reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-19 12:58:07 +00:00

Author	SHA1	Message	Date
Ritvik Rastogi	49b07130b3	feat: add scoring cascade for reducing false negatives (#526) * feat: add scoring cascade for reducing false negatives in answer verification * style: fix black and isort formatting Run black and isort to satisfy pre-commit checks. Made-with: Cursor * docs: add scoring cascade example to Quickstart section Mention the experimental scoring cascade feature at the end of the Quickstart section with a disclaimer and complete usage examples showing both the dataset method and standalone function. Made-with: Cursor * docs: shorten scoring cascade section in README Trim to a concise standalone example per review feedback. Made-with: Cursor * docs: simplify scoring cascade description in README Made-with: Cursor * update readme --------- Co-authored-by: Zafir Stojanovski <zaf.stojano@gmail.com>	2026-04-17 21:39:15 +02:00
Andreas Köpf	437e0b49c4	bump version to v0.1.26.dev0 (#525)	2026-03-28 14:55:43 +01:00
Oliver Stanley	21e6d2a9a5	add path-star task environment (#499) * draft path-star task * typos * fix for paper spec * rm teacherless mode * add imports * fixes * validation tweak * test tweak	2026-03-28 01:07:49 +01:00
Zafir Stojanovski	d26663fb3f	Fix impossible_ratio not being respected in knight_swap (#521) (#524) Move make_impossible decision before the retry loop so it's fixed per item instead of re-rolled on every attempt, which skewed the actual ratio of impossible puzzles well above the configured value.	2026-03-27 15:18:08 +00:00
Zafir Stojanovski	49b1dbbcce	Fix misleading instruction in shortest_path asking for "length" instead of path (#523) The prompt asked to "find the length of the shortest path" but the expected answer is a sequence of directions. This caused models to answer with a number instead of directions, degrading evaluation results. Closes #522 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 13:02:23 +01:00
Zafir Stojanovski	9a91d92ca6	Update README.md with new project (#519) Add Apple's Multilingual Reasoning Gym paper to the projects list. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 17:13:56 +01:00
Zafir Stojanovski	51bbe8c62b	Update README.md with new project (#518) Add NVIDIA Nemotron 3 Super to the list of projects using Reasoning Gym. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 13:53:52 +01:00
theblackcat102	235b5629f7	Fix/cryptarithm multiple solutions (#517) * [fix] issue #516 of `cryptarithm` validation issue --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2026-03-15 13:53:26 +01:00
Gjorgji Noveski	5dcca08309	Add assertion for maze constraints and limit _random_floor_cell attempts (#515) * Added assertion and infinite loop fix for maze environment * Fixed maze grid size validation formula * Removed assertion check because it did not work for all maze configurations	2026-01-16 09:56:39 +01:00
SII-Whereby	7d68a6cc70	Fix(reasoning_gym/games/countdown): Resolve SymPy parsing conflict for 10+ input numbers (#514) * Refactor expression generation and substitution logic Updated symbol naming and added safe replacement for expressions. * Add expr_str to return values in countdown.py Modified return statement to include the modified expression string. * Implement test for min_numbers exceeding 10 Add test for CountdownDataset with more than 10 numbers * Remove trailing-whitespace * Improve readability of CountdownDataset initialization Refactor CountdownDataset initialization for readability.	2025-12-15 11:05:38 +00:00
Ramiro R. C.	de2e89d21d	Codeio prompt fix (#513) * prompt fix to request more specific JSON responses * corrected gallery examples too	2025-11-13 11:48:20 +01:00
Zafir Stojanovski	2c4e45d9a9	Update spiral_matrix.py (#511) * Improve spiral matrix instructions with clearer movement description and hint --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-10-06 13:02:32 +02:00
Zafir Stojanovski	bcc68c5fee	Update README.md with new project (#510)	2025-10-02 17:43:32 +02:00
Zafir Stojanovski	15d7f027e4	add mila projects (#508)	2025-09-29 14:37:13 +01:00
Andreas Köpf	dd3117bbaf	bump version to v0.1.25.dev0 (#509)	2025-09-29 14:36:30 +01:00
Zafir Stojanovski	2f9eaee32a	fix: Register missing `coin_flip` (#507) * register missing coin_flip * lint	2025-09-15 14:23:30 +02:00
Zafir Stojanovski	3fcb8642c6	(README): add gensyn paper (#506) * gensyn paper	2025-09-11 17:11:04 +02:00
Kumar Anant	b0815043a2	Add probability dataset (initial: Coin Flip dataset + curriculum) (#505)	2025-09-06 15:59:23 +01:00
Rich Jones	b399c658ca	Add OptimalThinkingBench to Projects Using RG (#503) * Add OptimalThinkingBench to Projects Using RG * Update README.md	2025-08-24 21:36:11 +02:00
Denini Gabriel	02b7fac863	fix encoding to be able to run on win (#502)	2025-08-18 09:19:45 +01:00
Zafir Stojanovski	b8aa55704b	add discord link (#500)	2025-08-05 09:57:46 +01:00
Zafir Stojanovski	86c4f8552f	add GEM to projects using RG (#498)	2025-08-02 09:09:53 +01:00
Zafir Stojanovski	0e4582f83b	fix(evaluation): Add instructions for running on MMLU Pro (#497) * add instructions for mmlu pro, format instructions for math benchmarks * lint * remove `--fewshot_as_multiturn`	2025-08-01 16:27:56 +02:00
Zafir Stojanovski	a969d8ef05	feat(curriculum): Knights and Knaves configs (#488) * configs * reduce complexity of curriculum * update lower bound * add failure threshold * update last_k * update thresholds for success and failure * update curriculum file as well * update run name for noncurriculum * lint * dtype model eval * return binary scoring * set eval repeats to 3 * fix tests	2025-07-31 10:18:05 +02:00
Szymon Ożóg	cf99528dbe	Run categories in parallel (#492)	2025-07-30 18:11:27 +01:00
Szymon Ożóg	b29093e2ee	Add option to increase timeout (#493)	2025-07-28 06:26:09 +02:00
Zafir Stojanovski	0f5352e5cd	fix: Training README.md (#491) * Update README.md in `training` * add pip install for verl	2025-07-27 11:56:00 +02:00
joesharratt1229	4b60c32978	Curr exp (#487) * began curr exp * added holdout words * updated config * added context * updated base curriculum * updated * updated curriculum * updated * updated * updated automatic flag * updated ray trainer * update	2025-07-25 20:38:47 +01:00
theblackcat102	2d19f13e0f	[fix #484] resolve basic_arithmetic fails when size is large (#485) * [fix] resolve basic_arithmetic fails when size is large by replacing zero divisor with 1	2025-07-07 09:46:23 +01:00
Zafir Stojanovski	bf451d5197	Update README.md (#483)	2025-07-05 01:57:21 +02:00
joesharratt1229	1c98584f28	Feat/unsloth example (#482) * cleaned up examples * updated failing hooks * updated readme * corrected linting checks	2025-06-28 17:04:38 +01:00
Rich Jones	d9cd20c174	Update README.md (RLSwarm GenRL) (#480)	2025-06-26 10:20:45 +01:00
Oliver Stanley	1c9ed2e0eb	better usage demo in readme (#477) * better usage demo in readme * example of non-default configs	2025-06-25 13:38:25 -07:00
joesharratt1229	876e0aa440	corrected countdown issue (#479)	2025-06-25 13:37:04 -07:00
Zafir Stojanovski	c2ac6fae32	Update README.md (#475)	2025-06-24 14:19:11 +01:00
Zafir Stojanovski	56ce2e79a7	tutorial(training): Add a minimal example with `trl` (#473) * v0 * 2 gpu setup * improve parsing from yaml * update yaml dataset example * remove restriction on flash attn * more comments * first version of the readme * pin torch * simplify requirements * just flash attn * use set env instead * simpler set env * readme * add wandb project to setup * update template * update model id * post init to capture the config and weight * extract metadata * update config * update dataset config * move env for wandb project * pre-commit * remove qwen-math from training * more instructions * unused import * remove trl old * warmup ratio * warmup ratio * change model id * change model_id * add info about CUDA_VISIBLE_DEVICES	2025-06-21 00:01:31 +02:00
Oliver Stanley	49f3821098	add minimal verifiers example (#472)	2025-06-20 16:31:02 +01:00
Adefioye	9e79fc84b6	fix: Rounding issues in score_answer and add unit tests (#462)	2025-06-09 19:18:11 +01:00
joesharratt1229	51c2afc1fc	Fix/verl example (#465) * updated verl ex * updated script * removed curriculum verl and updated * updated linting errors * renamed * updated config	2025-06-09 09:53:43 +01:00
Oliver Stanley	5726034a26	fix color_cubes answer strings, update gallery with latest envs (#464) * update gallery with latest envs * fix regression where answer str is wrong in color_cubes * re-update gallery	2025-06-08 13:16:54 +02:00
Oliver Stanley	602e4be0a2	add survo env (#461) * add survo env * add survo curriculum * add survo tests	2025-06-08 11:56:33 +01:00
Zafir Stojanovski	0159b1b571	Update README.md - Star History (#463)	2025-06-08 11:51:43 +01:00
Oliver Stanley	c2fdb11980	add kakurasu env (#460) * add kakurasu env * add kakurasu curriculum * add kakurasu tests	2025-06-08 09:20:53 +01:00
Andreas Köpf	be2babea9c	Use raw URLs for images in README.md (#459) On pypi images were not correctly rendered because the old img src urls in README.md pointed to files on github with UI.	2025-06-06 21:23:59 +01:00
Oliver Stanley	1232a7d1e5	simplify training setup instructions (#454) * simplify training setup instructions * tweaks * update cfgs * readme update * readme update	2025-06-06 09:51:29 +01:00
Zafir Stojanovski	0ebabf709b	Update README.md with Atropos (#458)	2025-06-06 09:24:25 +01:00
Zafir Stojanovski	0699e2f507	Update README.md (#451)	2025-06-04 12:45:23 +02:00
Oliver Stanley	1a727ecf4e	support python 3.10 (#450) * support python 3.10 * add 3.10 to tests * new StrEnum	2025-06-04 10:34:01 +01:00
Zafir Stojanovski	84958baa69	abs path for images (#449) * abs path for images * width and height outside of style	2025-06-04 10:33:13 +02:00
Oliver Stanley	2a57a95ca2	add minimal example for building training datasets (#448)	2025-06-03 19:28:45 +01:00

1336 commits