reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-25 17:10:51 +00:00

Author	SHA1	Message	Date
joesharratt1229	e304b20e24	added Decimal curriculum (#280 ) * added decimal curricula * added chain sum decimal curriculum * register DecimalArithmeticCurriculum & DecimalChainSumCurriculum --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 23:02:57 +01:00
Zafir Stojanovski	dc657b5ed4	feat(env): Binary Matrix Curriculum (#279 ) * binary matrix curriculum * register BinaryMatrixCurriculum --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 22:58:47 +01:00
joesharratt1229	98def56bb4	added basic arith curricula (#276 ) * added basic arith curricula * register BasicArithmeticCurriculum --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 22:54:49 +01:00
Oliver Stanley	35c32cd5e7	Tolerant scoring for CodeI/O based on edit distances (#277 ) * add zss dep * codeio edit distance-based scoring * edit distance tweaks	2025-03-07 22:49:35 +01:00
Zafir Stojanovski	dfc28c94d6	feat(env): Binary Alternation Curriculum (#278 ) * binary alternation --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 22:44:32 +01:00
Andreas Koepf	ce55d528ad	register MahjongPuzzleCurriculum	2025-03-07 19:17:04 +01:00
Zafir Stojanovski	0fb90ce8c4	feat(env): Leg Counting Curriculum (#275 ) * leg counting curriculum --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 19:15:18 +01:00
Zafir Stojanovski	a48ff14507	add difficulty where possible (#274 )	2025-03-07 19:01:26 +01:00
Andreas Koepf	8790c6be00	update gallery	2025-03-07 16:24:47 +01:00
Andreas Koepf	178b0bd22e	remove data/ from main .gitignore	2025-03-07 16:16:40 +01:00
Andreas Koepf	2b1f7ce5ee	use relative import for reasoning_gym.data	2025-03-07 15:56:45 +01:00
Andreas Köpf	c2263979bc	Basic curriculum (#198 ) * feat: Add optional curriculum support to dataset registration and creation * docs: Add docstrings to create_curriculum() and register_dataset() * feat: Add curriculum configuration classes for CurriculumExperiment * feat: Add weight parameter to CurriculumAttributeConfig and use in DatasetSpec * refactor: Simplify CurriculumAttributeConfig with "" attribute level support test: Add unit tests for CurriculumExperiment class * feat: Add from_yaml() method to CurriculumExperimentConfig with unit test	2025-03-07 11:22:12 +01:00
Rich Jones	34889d0517	Add Modulo Grid Task (#273 ) * add modulo_grid dataset * ensure the pattern is mathematical, not just spatial --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-07 11:11:41 +01:00
Rich Jones	11c9790a25	[Env] Game of Life Halting Prediction (#272 ) This is a variant of the Game of Life task which, rather than testing algorithmic simulation, tests the model's ability to do explanatory reasoning about the board. The idea is that a model with good explanatory reasoning will be able to see that a game will not halt without simulating it into the future. The task presents a GoL board, and the model is asked to predict whether the board will halt (die, all cells zero) after n steps. Sometimes the board is made up of 'oscillators', isolated structures which never die. Other times it is filled with non-oscillators, structures which will always die after a few steps. The model should deduce which case applies to the presented board.	2025-03-07 10:05:12 +01:00
Andreas Koepf	fa1bf7910a	update gallery, pypi release, bump version	2025-03-05 23:45:45 +01:00
joesharratt1229	1893691c57	updated algorithmics dataset (#269 ) * updated algorithmic datasets * added changes to symbolic and power * updated power function test	2025-03-05 23:32:53 +01:00
Zafir Stojanovski	f843ac1b82	shortest path curriculum (#271 )	2025-03-05 22:46:10 +01:00
Zafir Stojanovski	a048084009	largest island curriculum (#270 )	2025-03-05 22:45:35 +01:00
Zafir Stojanovski	3d9bb382aa	feat(env): Count Bits Curriculum (#267 ) * add min n * count bits	2025-03-05 22:44:04 +01:00
Zafir Stojanovski	84158df1c7	feat(env): Course Schedule Curriculum (#266 ) * course schedule curriculum * update levels * update comments * lint	2025-03-05 22:42:46 +01:00
joesharratt1229	2c524c0c6f	Added puzzle24 closes #208 (#268 ) * added puzzle24	2025-03-05 22:36:37 +01:00
Oliver Stanley	3286a68361	First version of CodeI/O reasoning data (#264 ) * notebook for prepping first set of raw code files * updated codeio processing notebook for repo-level processing * fix for edge case in codeio scoring * Add reformat notebook * filtering pass * add non-determinism filtering * Tweak CodeIODataset & include first real data * add basic codeio test, metadata	2025-03-05 22:34:11 +01:00
joesharratt1229	7458dbc95d	Fixed `countdown` `score_answer` (#265 ) * fixed countdown score ans * checked solution uses all numbers	2025-03-05 22:30:12 +01:00
Zafir Stojanovski	3c544aba20	feat(env): Mahjong Puzzle Curriculum (#263 ) * mahjong curriculum * typo * update levels	2025-03-05 22:28:02 +01:00
Zafir Stojanovski	19ca54da72	feat(env): NQueens Curriculum (#262 ) * curriculum & tests	2025-03-05 15:05:17 +01:00
Andreas Köpf	b2904ccab9	Minor question template & score_answer improvements (#261 ) * math prompt improvements * ignore brackets in complex_arithmetic results * improve additional instruction in prompt of polynomial_equations * more strict tests for score_answer in polynomial_equations * simplify special reward handling * fix test_intermediate_integration * fix sokoban dataset * add common dataset score_answer consistency test	2025-03-04 21:55:09 +01:00
joesharratt1229	bf24999bb0	implemented family_relationships score ans (#260 )	2025-03-04 21:37:57 +01:00
vncntt	478646622e	should exit if API key isn't defined (#259 ) * should exit if open-router and no api key	2025-03-04 09:45:36 +01:00
Rich Jones	e3b7365f50	Game of Life partial scoring and rule-clarification (#258 ) * partial scoring and rule clarification * better ql scoring * word seq reverse typos	2025-03-03 22:22:39 +01:00
joesharratt1229	340d6a7ab9	updated for config by dataset (#257 ) * updated for config by dataset * updated read me	2025-03-03 21:58:32 +01:00
Andreas Köpf	07388767a2	Reduce precision from 28 to 6 in DecimalArithmeticDataset (#256 )	2025-03-03 21:57:08 +01:00
Andreas Köpf	17f87476a3	add Chain of Draft and direct system prompt styles (#255 )	2025-03-03 21:56:31 +01:00
Zafir Stojanovski	2f9d94c1e7	fix: Unify Prompts (#254 ) * remove cot * fix prompt template * fix pool matrix * spiral matrix fixed	2025-03-03 21:55:53 +01:00
joesharratt1229	976e1710a6	small change to word sequence reversal prompt (#252 ) corrected answer format	2025-03-02 17:34:35 +01:00
vncntt	8992037ecc	fixed problems in knights_knaves (#251 ) * remove unnecessary variables * added depth logic * add depth tests	2025-03-02 08:47:54 +01:00
Andreas Köpf	ece6990709	Remove strip from ProceduralDataset::core score_answer() (#250 ) * remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True) * test: Move test_extract_answer() from test_dataset.py to test_utils.py * refactor: Improve decimal reward computation with more flexible comparison * fix: Implement rounding for format_number when round_if_needed is True * test: Add test case for compute_decimal_reward with sign and zeros	2025-03-02 08:46:36 +01:00
Andreas Köpf	16a4ea1193	Revert "log error message on bad api response (#243 )" (#249 ) This reverts commit `27e66ba6dd`.	2025-03-01 23:56:42 +01:00
Andreas Köpf	1b1c04bb70	feat: Add `category` property to `ProceduralDataset` to extract category name (#248 )	2025-03-01 23:11:40 +01:00
Zafir Stojanovski	1bc9f6f09f	fix manipulate matrix (#247 )	2025-03-01 23:00:29 +01:00
Rich Jones	80aafda8e5	more dynamic scoring for jumble (#246 )	2025-03-01 18:50:59 +01:00
Zafir Stojanovski	78c92d7056	Mahjong Puzzle (#241 ) * mahjong	2025-03-01 16:27:26 +01:00
Andreas Köpf	dbd2ac723e	Add base_url and api_key command line args for eval.py script (#244 ) * feat: Add base URL command line parameter to eval.py script * feat: Add API key parameter and CLI option to AsyncModelEvaluator	2025-02-28 18:32:58 +01:00
Rich Jones	27e66ba6dd	log error message on bad api response (#243 )	2025-02-28 15:32:27 +01:00
Andreas Köpf	59922486c6	Eval sampling settings for generation (temperature, top-p, max_tokens) (#242 ) * feat: Add sampling parameters to eval configuration and API call * feat: Add support for system_prompt_id and optional system_prompt configuration	2025-02-28 11:48:37 +01:00
Andreas Koepf	d83e53115a	fix prompt for arc_1d	2025-02-28 08:07:59 +01:00
Andreas Koepf (aider)	82e79d672e	feat: Add system prompt to dataset results and summary output	2025-02-28 00:26:06 +01:00
Andreas Köpf	0b108efac1	Generate eval config tool (#240 ) * feat: Add generate_config.py script to create eval configurations	2025-02-27 21:40:53 +01:00
Andreas Köpf	1ea9a657a7	Eval script consolidation (#238 ) The script now supports: - YAML and JSON configurations - Dataset-specific parameters - Overriding configuration via command line - Detailed logging and error handling	2025-02-27 17:39:14 +01:00
Andreas Köpf	bd745ae959	Merge pull request #237 from open-thought/rich/richmorevalfixes2 Fix graph color example template	2025-02-27 16:08:23 +01:00
Rich Jones	ca5372dcc1	rm typo	2025-02-27 13:44:33 +01:00

1131 commits