reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-27 17:23:19 +00:00

Author	SHA1	Message	Date
joesharratt1229	d0ef136d5b	Feat/intragen experiments (#414) * added curriculum * readapted readme * corrected small errors * Delete eval/eval/r1/algorithmic/word_sorting.json * removed redundant argument * added spell * removed duplicated fit * changed config * added composite changes * added composite changes * updated yaml * added spell backward * updated read me * added qwen2.5 * added * Add files via upload * updated missing trainer func * updated curr * updated spell back * updated correctness score func * updated configs * added local evals * added updates * updated datasets * added fsdp to hf utility * added algorithmic qwen 3b yaml * updated read me * updated configs * added preappend token * updated with thinking token * updated test score board * resolved comments * added evaluation scripts * removed results from pr * added config * added partial reward scoring * added evaluation composites * added training configs * added games eval * added rubriks cube * resolved merge cinflicts * added games config * added latest eval configs * updated strucutre * Delete training/evaluations/eval_graphs_composite.yaml --------- Co-authored-by: joesharratt1229 <joesharrat1229@gmail.com>	2025-04-16 08:04:52 +02:00
Oliver Stanley	224532f12a	first inter-domain generalisation experiments (#412) * tweak len reward * first inter-generalisation experiment config * update inter algorithmic config * default to empty config * fix typo * change config to match experiment script * long prompt fixes * algorithmic training config tweaks * imports * update algorithmic training cfgs * first logic composite config * fix dset name * tweaks * fix syllogisms dataset * rm temp print * initial algebra config * algebra cfg tweaks * add gc * add initial games cfg * rename games cfg * fix dset name * fix sokoban metadata * remove boxnet * games cfg tweak	2025-04-14 21:06:40 +01:00
Oliver Stanley	ff5407f766	fix boxnet scoring function (#420) * avoid error print on invalid move format in tower of hanoi * avoid boxnet error on hallucinated starting positions	2025-04-14 20:44:24 +01:00
Zafir Stojanovski	290bfc4fdd	(evals): Medium configs (#415) * updated medium configs * fix problematic curriculum values / small issues causing exceptions to be raised * optimus alpha config * all configs so far * fix tests	2025-04-14 08:25:31 +02:00
Zafir Stojanovski	dced3bfc45	fix(curriculum): Make boundaries in curriculum more sensible (#407) * init * fix tests * unify codeio * filtered for libraries not present in reasoning-gym * fix more bounds * puzzle24 * knight swap curriculum * fix number sorting * fix attributes * add validation of config in creation of dataset * dry run for instantiating and validating the datasets * remove unused imports * fix curriculum tests to reference newly updated attribute names	2025-04-04 20:24:14 +02:00
Zafir Stojanovski	ce0a6c4878	fix(envs): Add source dataset and index to metadata (#388) * add source dataset and index to metadata * fix typo * fix coach class and its test	2025-03-20 11:12:14 +00:00
Oliver Stanley	7475a20700	include ranges rather than sampled values in difficulty metadata dicts (#387) * update difficulty metadata for logic datasets * update difficulty metadata for graph datasets * update difficulty metadata for geometry datasets * update difficulty metadata for games datasets * update difficulty metadata for cognition datasets * update difficulty metadata for arithmetic datasets * update difficulty metadata for arc datasets * update difficulty metadata for algorithmic datasets * update difficulty metadata for algebra datasets * use tuples * update tests * update tests	2025-03-20 10:27:03 +01:00
Andreas Köpf	d2c895f1d3	Refactor Curriculum Attributes (#335) * remove min_value from AttributeDefinition * remove type from AttributeDefinition * Add CurriculumContext * add ensure_interval option for RangeAttributes * docs: Add legend explaining curriculum indicators in dataset gallery * update GALLERY.md	2025-03-16 15:40:28 +01:00
Andreas Köpf	d6f399b8e4	Add eval configs, small fixes to eval script & rush-hour score_answer	2025-03-16 09:18:05 +01:00
Oliver Stanley	a71994ad03	add rush hour curriculum (#362) * add rush hour curriculum * add to __init__.py	2025-03-14 16:11:23 +01:00
Oliver Stanley	8f8bd9d756	add sudoku curriculum (#360) * sudoku curriculum * add test * add to __init__.py	2025-03-14 16:09:49 +01:00
joesharratt1229	fa7d8e66b3	added boxnet generator and curr (#364)	2025-03-14 08:48:57 +01:00
Zafir Stojanovski	c81036004f	maze curriculum (#343)	2025-03-13 21:01:48 +01:00
Rich Jones	fe0f9ae114	Fix Rush Hour (#336) * think it fixes rush hour * explain walls	2025-03-13 20:42:40 +01:00
joesharratt1229	3c39cbda40	added sokoban dataset (#325)	2025-03-11 00:21:03 +01:00
joesharratt1229	e9944149bd	added tsumego curric (#323)	2025-03-11 00:19:55 +01:00
joesharratt1229	e01910254d	added futoshiki and tower hanou (#316) * added futoshiki and tower hanou * corrected failed unit tests	2025-03-11 00:12:32 +01:00
joesharratt1229	30f5d823da	Curriculum/emoji mystery (#315) * added emoji curriculum * updated metadata * added curriculum to register	2025-03-11 00:11:27 +01:00
Zafir Stojanovski	e4b13bf51f	mini sudoku curriculum (#311)	2025-03-10 00:29:53 +01:00
Andreas Koepf	0a35e608ec	register MahjongPuzzleCurriculum	2025-03-07 19:17:04 +01:00
Zafir Stojanovski	b915565c0d	add difficulty where possible (#274)	2025-03-07 19:01:26 +01:00
Andreas Köpf	c69bc5d4e6	Basic curriculum (#198) * feat: Add optional curriculum support to dataset registration and creation * docs: Add docstrings to create_curriculum() and register_dataset() * feat: Add curriculum configuration classes for CurriculumExperiment * feat: Add weight parameter to CurriculumAttributeConfig and use in DatasetSpec * refactor: Simplify CurriculumAttributeConfig with "" attribute level support test: Add unit tests for CurriculumExperiment class * feat: Add from_yaml() method to CurriculumExperimentConfig with unit test	2025-03-07 11:22:12 +01:00
joesharratt1229	f3ee9a91a2	Added puzzle24 closes #208 (#268) * added puzzle24	2025-03-05 22:36:37 +01:00
joesharratt1229	e30be066ec	Fixed `countdown` `score_answer` (#265) * fixed countdown score ans * checked solution uses all numbers	2025-03-05 22:30:12 +01:00
Zafir Stojanovski	d0a42116fb	feat(env): Mahjong Puzzle Curriculum (#263) * mahjong curriculum * typo * update levels	2025-03-05 22:28:02 +01:00
Zafir Stojanovski	8ecc723607	feat(env): NQueens Curriculum (#262) * curriculum & tests	2025-03-05 15:05:17 +01:00
Andreas Köpf	5d7fbac0ad	Minor question template & score_answer improvements (#261) * math prompt improvements * ignore brackets in complex_arithmetic results * improve additional instruction in prompt of polynomial_equations * more strict tests for score_answer in polynomial_equations * simplify special reward handling * fix test_intermediate_integration * fix sokoban dataset * add common dataset score_answer consistency test	2025-03-04 21:55:09 +01:00
Zafir Stojanovski	01e1c8f9af	fix: Unify Prompts (#254) * remove cot * fix prompt template * fix pool matrix * spiral matrix fixed	2025-03-03 21:55:53 +01:00
Zafir Stojanovski	9c581f1be1	Mahjong Puzzle (#241) * mahjong	2025-03-01 16:27:26 +01:00
Andreas Koepf	6511725711	add markdown tripple backticks around tsumego board	2025-02-26 19:39:05 +01:00
Andreas Koepf	f97bf94caa	fix & simplify score_answer() of TsumegoDataset	2025-02-26 19:04:30 +01:00
Andreas Koepf	eeb9fa31d5	more native type hints	2025-02-21 21:23:14 +01:00
Andreas Koepf	51808210aa	add markdown tripple backtick code block for emoji_mystry hint	2025-02-21 21:06:07 +01:00
Andreas Köpf	c56045b9a7	Merge branch 'main' into feat/emoji-mystery	2025-02-21 20:58:39 +01:00
joesharratt1229	1fb73011f8	added answer format spec in prompt	2025-02-21 18:03:05 +00:00
joesharratt1229	5e64d1c24c	added emoji dataset	2025-02-21 17:57:41 +00:00
Andreas Köpf	b59ccdefa2	Merge pull request #178 from olliestanley/feature/unsloth-train Add minimal working GRPO training example with Unsloth	2025-02-21 15:37:24 +01:00
Andreas Koepf	3e7ff3b084	use native types List->list, Dict->dict, Set->set, Tuple->tuple	2025-02-21 15:15:38 +01:00
Oliver	e26161713e	Answer scoring fixes to address edge cases	2025-02-20 22:04:01 +00:00
Andreas Köpf	f2e86795eb	Merge pull request #161 from olliestanley/fix/sudoku-unique Fix Sudoku generator for uniqueness, implement scoring	2025-02-18 22:55:43 +01:00
Oliver	75802c6a1c	Add docstring	2025-02-18 21:38:46 +00:00
Oliver	b95ed7fbad	Remove now redundant is_valid function	2025-02-18 21:37:37 +00:00
Oliver	5003357425	Remove comment	2025-02-18 21:32:15 +00:00
Oliver	56da1f5dcf	Optimise Sudoku uniqueness checks	2025-02-18 21:30:59 +00:00
Oliver	7d8fe8b5d9	Fix Sudoku generator uniqueness and scoring	2025-02-18 21:02:49 +00:00
Oliver	473368bbea	Tweak mini sudoku config	2025-02-18 18:46:14 +00:00
Oliver	fa773115b9	Tweak mini sudoku config	2025-02-18 18:43:19 +00:00
Oliver	32928bd77b	Tweak num_empty logic	2025-02-18 18:36:12 +00:00
Oliver	9e3f4e806a	Ensure unique mini sudokus	2025-02-18 18:31:30 +00:00
Oliver	c512d86a86	Cleanup question & add scoring for mini sudoku	2025-02-17 18:37:07 +00:00

131 commits