reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-24 17:05:03 +00:00

Author	SHA1	Message	Date
joesharratt1229	e304b20e24	added Decimal curriculum (#280) * added decimal curricula * added chain sum decimal curriculum * register DecimalArithmeticCurriculum & DecimalChainSumCurriculum --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 23:02:57 +01:00
Zafir Stojanovski	dc657b5ed4	feat(env): Binary Matrix Curriculum (#279) * binary matrix curriculum * register BinaryMatrixCurriculum --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 22:58:47 +01:00
joesharratt1229	98def56bb4	added basic arith curricula (#276) * added basic arith curricula * register BasicArithmeticCurriculum --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 22:54:49 +01:00
Zafir Stojanovski	dfc28c94d6	feat(env): Binary Alternation Curriculum (#278) * binary alternation --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 22:44:32 +01:00
Zafir Stojanovski	0fb90ce8c4	feat(env): Leg Counting Curriculum (#275) * leg counting curriculum --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 19:15:18 +01:00
Andreas Köpf	c2263979bc	Basic curriculum (#198) * feat: Add optional curriculum support to dataset registration and creation * docs: Add docstrings to create_curriculum() and register_dataset() * feat: Add curriculum configuration classes for CurriculumExperiment * feat: Add weight parameter to CurriculumAttributeConfig and use in DatasetSpec * refactor: Simplify CurriculumAttributeConfig with "" attribute level support test: Add unit tests for CurriculumExperiment class * feat: Add from_yaml() method to CurriculumExperimentConfig with unit test	2025-03-07 11:22:12 +01:00
Rich Jones	34889d0517	Add Modulo Grid Task (#273) * add modulo_grid dataset * ensure the pattern is mathematical, not just spatial --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 11:11:41 +01:00
Rich Jones	11c9790a25	[Env] Game of Life Halting Prediction (#272) This is a variant of the Game of Life task which, rather than trying to test the algorithmic simulation, tests the ability of the model to do explanatory reasoning about the board. The idea is that a model with good explanatory reasoning will be able to see that a game will not halt without simulating it into the future. The task presents a GoL board, and the model is asked to predict if the board will halt (die, all cells zero) after n steps. Sometimes, the board will be made up of 'oscillators', isolated structures which never die. Other times, it is filled with non-oscillators, structures which will always die after a few steps. The model should deduce which case the presented board falls into.	2025-03-07 10:05:12 +01:00
joesharratt1229	1893691c57	updated algorithmics dataset (#269) * updated algorithmic datasets * added changes to symbolic and power * updated power function test	2025-03-05 23:32:53 +01:00
Zafir Stojanovski	f843ac1b82	shortest path curriculum (#271)	2025-03-05 22:46:10 +01:00
Zafir Stojanovski	a048084009	largest island curriculum (#270)	2025-03-05 22:45:35 +01:00
Zafir Stojanovski	3d9bb382aa	feat(env): Count Bits Curriculum (#267) * add min n * count bits	2025-03-05 22:44:04 +01:00
Zafir Stojanovski	84158df1c7	feat(env): Course Schedule Curriculum (#266) * course schedule curriculum * update levels * update comments * lint	2025-03-05 22:42:46 +01:00
joesharratt1229	2c524c0c6f	Added puzzle24 closes #208 (#268) * added puzzle24	2025-03-05 22:36:37 +01:00
Oliver Stanley	3286a68361	First version of CodeI/O reasoning data (#264) * notebook for prepping first set of raw code files * updated codeio processing notebook for repo-level processing * fix for edge case in codeio scoring * Add reformat notebook * filtering pass * add non-determinism filtering * Tweak CodeIODataset & include first real data * add basic codeio test, metadata	2025-03-05 22:34:11 +01:00
joesharratt1229	7458dbc95d	Fixed `countdown` `score_answer` (#265) * fixed countdown score ans * checked solution uses all numbers	2025-03-05 22:30:12 +01:00
Zafir Stojanovski	3c544aba20	feat(env): Mahjong Puzzle Curriculum (#263) * mahjong curriculum * typo * update levels	2025-03-05 22:28:02 +01:00
Zafir Stojanovski	19ca54da72	feat(env): NQueens Curriculum (#262) * curriculum & tests	2025-03-05 15:05:17 +01:00
Andreas Köpf	b2904ccab9	Minor question template & score_answer improvements (#261) * math prompt improvements * ignore brackets in complex_arithmetic results * improve additional instruction in prompt of polynomial_equations * more strict tests for score_answer in polynomial_equations * simplify special reward handling * fix test_intermediate_integration * fix sokoban dataset * add common dataset score_answer consistency test	2025-03-04 21:55:09 +01:00
Rich Jones	e3b7365f50	Game of Life partial scoring and rule-clarification (#258) * partial scoring and rule clarification * better ql scoring * word seq reverse typos	2025-03-03 22:22:39 +01:00
vncntt	8992037ecc	fixed problems in knights_knaves (#251) * remove unnecessary variables * added depth logic * add depth tests	2025-03-02 08:47:54 +01:00
Andreas Köpf	ece6990709	Remove strip from ProceduralDataset::core score_answer() (#250) * remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True) * test: Move test_extract_answer() from test_dataset.py to test_utils.py * refactor: Improve decimal reward computation with more flexible comparison * fix: Implement rounding for format_number when round_if_needed is True * test: Add test case for compute_decimal_reward with sign and zeros	2025-03-02 08:46:36 +01:00
Zafir Stojanovski	1bc9f6f09f	fix manipulate matrix (#247)	2025-03-01 23:00:29 +01:00
Rich Jones	80aafda8e5	more dynamic scoring for jumble (#246)	2025-03-01 18:50:59 +01:00
Zafir Stojanovski	78c92d7056	Mahjong Puzzle (#241) * mahjong	2025-03-01 16:27:26 +01:00
Andreas Köpf	ed90fff3fa	Merge pull request #220 from open-thought/rich/cubeinstructions Make Rubiks Cube Output Format More Explicit	2025-02-27 12:16:09 +01:00
Rich Jones	b2b2311329	seed test config	2025-02-27 10:44:28 +01:00
Rich Jones	9daaccc208	expand more	2025-02-27 10:41:30 +01:00
Andreas Koepf (aider)	a92dcd4a75	feat: Add comprehensive unit tests for parse_string_to_complex() method	2025-02-26 21:44:32 +01:00
Andreas Koepf	2ddcb7c3c7	fix & simplify score_answer() of TsumegoDataset	2025-02-26 19:04:30 +01:00
Andreas Köpf	7c4ab296fd	Merge pull request #231 from AhmedSaif2/count-primes Fix primes representation in count_primes dataset metadata	2025-02-26 17:49:50 +01:00
Andreas Köpf	42d42aae89	Merge pull request #219 from open-thought/rich/fix_ccc Fix Cube Rotation Scoring	2025-02-26 17:41:18 +01:00
AhmedSaif2	e9e36f3a23	Fix primes representation in count_primes dataset metadata	2025-02-26 14:58:21 +02:00
Rich Jones	f2479fcacc	fix CCC scoring	2025-02-26 12:54:40 +01:00
Andreas Köpf	6b923d5ea0	Fix PoolMatrixConfigs::score_answer(), add unit tests (#215)	2025-02-26 00:43:18 +01:00
vncntt	465db5c5c7	Add KnightsKnavesDataset (knights_knaves) Adapted code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py --------- Co-authored-by: Andreas Koepf (aider) <andreas.koepf@provisio.com>	2025-02-25 20:15:38 +01:00
Andreas Köpf	8b0a3e2c95	Merge pull request #191 from zafstojano/env/shortest-path feat(env): Shortest Path	2025-02-23 22:28:43 +01:00
Zafir Stojanovski	915a0f1f51	predict actual path	2025-02-23 18:24:23 +01:00
Andreas Koepf	0a487030ec	minor arc_1d tweaks	2025-02-23 16:37:40 +01:00
Andreas Koepf	f600c7eb30	add arc_1d size range test	2025-02-23 12:58:51 +01:00
Andreas Koepf	e444bbf7a1	fix index out of range of arc_1d dataset (#190)	2025-02-23 12:51:41 +01:00
Zafir Stojanovski	df914dfb49	shortest path	2025-02-23 11:25:00 +01:00
Andreas Köpf	e41b86ec36	Merge branch 'main' into feat/emoji-mystery	2025-02-21 20:58:39 +01:00
joesharratt1229	650387f748	added tests	2025-02-21 17:58:13 +00:00
Andreas Köpf	82839dec96	Merge pull request #181 from open-thought/rich/bitwise Add Bitwise Arithmetic	2025-02-21 17:27:45 +01:00
Andreas Köpf	de362fb76f	Merge pull request #182 from zafstojano/env/binary-alternation feat(env): Binary Alternation	2025-02-21 17:27:16 +01:00
Andreas Koepf (aider)	5fb26fc709	fix: Handle negative hex number prefix variations in bitwise arithmetic test	2025-02-21 17:23:50 +01:00
Andreas Koepf (aider)	2abe783be4	test: Update bitwise arithmetic difficulty levels to [1, 2, 3]	2025-02-21 17:22:36 +01:00
Andreas Koepf (aider)	5b233ce9cc	test: Add comprehensive unit tests for BitwiseArithmeticDataset	2025-02-21 17:21:00 +01:00
Andreas Köpf	32d319e291	Merge pull request #180 from Adefioye/list-functions Add induction-based tasks for list functions	2025-02-21 16:20:49 +01:00
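
The halting question described in commit 11c9790a25 (does a Game of Life board reach the all-zero state within n steps?) can be illustrated with a brute-force check. This is a minimal sketch, not the repository's implementation: the names `step` and `halts_within` are hypothetical, and it assumes the standard B3/S23 rules on a bounded grid whose edges do not wrap, which may differ from what the dataset actually uses.

```python
from typing import List

Board = List[List[int]]  # 1 = live cell, 0 = dead cell


def step(board: Board) -> Board:
    """Advance one generation (assumed: B3/S23 rules, edges do not wrap)."""
    rows, cols = len(board), len(board[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbours inside the grid, excluding the cell itself.
            n = sum(
                board[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            # Birth on exactly 3 neighbours; survival on 2 or 3.
            nxt[r][c] = 1 if n == 3 or (board[r][c] == 1 and n == 2) else 0
    return nxt


def halts_within(board: Board, max_steps: int) -> bool:
    """True if all cells are dead after at most max_steps generations."""
    for _ in range(max_steps + 1):
        if not any(any(row) for row in board):
            return True
        board = step(board)
    return False


# A blinker is an oscillator and never dies; a lone cell dies in one step.
blinker = [[0] * 5 for _ in range(5)]
blinker[2][1] = blinker[2][2] = blinker[2][3] = 1
lone = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(halts_within(blinker, 20), halts_within(lone, 1))
```

Simulation like this gives the ground-truth label; the task as described asks the model to reach the same answer by recognising oscillators rather than by stepping the board forward.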

423 commits