reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-19 12:58:07 +00:00

Author	SHA1	Message	Date
Zafir Stojanovski	15d7f027e4	add mila projects (#508)	2025-09-29 14:37:13 +01:00
Andreas Köpf	dd3117bbaf	bump version to v0.1.25.dev0 (#509)	2025-09-29 14:36:30 +01:00
Zafir Stojanovski	2f9eaee32a	fix: Register missing `coin_flip` (#507) * register missing coin_flip * lint	2025-09-15 14:23:30 +02:00
Zafir Stojanovski	3fcb8642c6	(README): add gensyn paper (#506) * gensyn paper	2025-09-11 17:11:04 +02:00
Kumar Anant	b0815043a2	Add probability dataset (initial: Coin Flip dataset + curriculum) (#505)	2025-09-06 15:59:23 +01:00
Rich Jones	b399c658ca	Add OptimalThinkingBench to Projects Using RG (#503) * Add OptimalThinkingBench to Projects Using RG * Update README.md	2025-08-24 21:36:11 +02:00
Denini Gabriel	02b7fac863	fix encoding to be able to run on win (#502)	2025-08-18 09:19:45 +01:00
Zafir Stojanovski	b8aa55704b	add discord link (#500)	2025-08-05 09:57:46 +01:00
Zafir Stojanovski	86c4f8552f	add GEM to projects using RG (#498)	2025-08-02 09:09:53 +01:00
Zafir Stojanovski	0e4582f83b	fix(evaluation): Add instructions for running on MMLU Pro (#497) * add instructions for mmlu pro, format instructions for math benchmarks * lint * remove `--fewshot_as_multiturn`	2025-08-01 16:27:56 +02:00
Zafir Stojanovski	a969d8ef05	feat(curriculum): Knights and Knaves configs (#488) * configs * reduce complexity of curriculum * update lower bound * add failure threshold * update last_k * update thresholds for success and failure * update curriculum file as well * update run name for noncurriculum * lint * dtype model eval * return binary scoring * set eval repeats to 3 * fix tests	2025-07-31 10:18:05 +02:00
Szymon Ożóg	cf99528dbe	Run categories in parallel (#492)	2025-07-30 18:11:27 +01:00
Szymon Ożóg	b29093e2ee	Add option to increase timeout (#493)	2025-07-28 06:26:09 +02:00
Zafir Stojanovski	0f5352e5cd	fix: Training README.md (#491) * Update README.md in `training` * add pip install for verl	2025-07-27 11:56:00 +02:00
joesharratt1229	4b60c32978	Curr exp (#487) * began curr exp * added holdout words * updated config * added context * updated base curriculum * updaed * updated curriculum * updated * updated * updated automatic flag * updated ray trainer * update	2025-07-25 20:38:47 +01:00
theblackcat102	2d19f13e0f	[fix #484] resolve basic_arithmetic fails when size is large (#485) * [fix] resolve basic_arithmetic fails when size is large by replacing zero divisor with 1	2025-07-07 09:46:23 +01:00
Zafir Stojanovski	bf451d5197	Update README.md (#483)	2025-07-05 01:57:21 +02:00
joesharratt1229	1c98584f28	Feat/unsloth example (#482) * cleaned up examples * updated failing hooks * updated readme * corrected linting checks	2025-06-28 17:04:38 +01:00
Rich Jones	d9cd20c174	Update README.md (RLSwarm GenRL) (#480)	2025-06-26 10:20:45 +01:00
Oliver Stanley	1c9ed2e0eb	better usage demo in readme (#477) * better usage demo in readme * example of non-default configs	2025-06-25 13:38:25 -07:00
joesharratt1229	876e0aa440	corrected countdown issue (#479)	2025-06-25 13:37:04 -07:00
Zafir Stojanovski	c2ac6fae32	Update README.md (#475)	2025-06-24 14:19:11 +01:00
Zafir Stojanovski	56ce2e79a7	tutorial(training): Add a minimal example with `trl` (#473) * v0 * 2 gpu setup * improve parsing from yaml * update yaml dataset example * remove restriction on flash attn * more comments * first version of the readme * pin torch * simplify requirements * just flash attn * use set env instead * simpler set env * readme * add wandb project to setup * update template * update model id * post init to capture the config and weight * extract metadata * update config * update dataset config * move env for wandb project * pre-commit * remove qwen-math from training * more instructions * unused import * remove trl old * warmup ratio * warmup ratio * change model id * change model_id * add info about CUDA_VISIBLE_DEVICES	2025-06-21 00:01:31 +02:00
Oliver Stanley	49f3821098	add minimal verifiers example (#472)	2025-06-20 16:31:02 +01:00
Adefioye	9e79fc84b6	fix: Rounding issues in score_answer and add unit tests (#462)	2025-06-09 19:18:11 +01:00
joesharratt1229	51c2afc1fc	Fix/verl example (#465) * updated verl ex * updated script * removed curriculum verl and updated * updatied linting errors * renamed * updated config	2025-06-09 09:53:43 +01:00
Oliver Stanley	5726034a26	fix color_cubes answer strings, update gallery with latest envs (#464) * update gallery with latest envs * fix regression where answer str is wrong in color_cubes * re-update gallery	2025-06-08 13:16:54 +02:00
Oliver Stanley	602e4be0a2	add survo env (#461) * add survo env * add survo curriculum * add survo tests	2025-06-08 11:56:33 +01:00
Zafir Stojanovski	0159b1b571	Update README.md - Star History (#463)	2025-06-08 11:51:43 +01:00
Oliver Stanley	c2fdb11980	add kakurasu env (#460) * add kakurasu env * add kakurasu curriculum * add kakurasu tests	2025-06-08 09:20:53 +01:00
Andreas Köpf	be2babea9c	Use raw URLs for images in README.md (#459) On pypi images were not correctly rendered because the old img src urls in README.md pointed to files on github with UI.	2025-06-06 21:23:59 +01:00
Oliver Stanley	1232a7d1e5	simplify training setup instructions (#454) * simplify training setup instructions * tweaks * update cfgs * readme update * readme update	2025-06-06 09:51:29 +01:00
Zafir Stojanovski	0ebabf709b	Update README.md with Atropos (#458)	2025-06-06 09:24:25 +01:00
Zafir Stojanovski	0699e2f507	Update README.md (#451)	2025-06-04 12:45:23 +02:00
Oliver Stanley	1a727ecf4e	support python 3.10 (#450) * support python 3.10 * add 3.10 to tests * new StrEnum	2025-06-04 10:34:01 +01:00
Zafir Stojanovski	84958baa69	abs path for images (#449) * abs path for images * width and height outside of style	2025-06-04 10:33:13 +02:00
Oliver Stanley	2a57a95ca2	add minimal example for building training datasets (#448)	2025-06-03 19:28:45 +01:00
Zafir Stojanovski	b3f81a6609	fix(README): Arxiv link (#447)	2025-06-02 12:20:38 +02:00
Zafir Stojanovski	17a8431013	rename to easy and hard (#445)	2025-06-02 10:34:05 +02:00
Zafir Stojanovski	af2548f8f2	Add README assets (#446) * add assets * pre-commit * remove bg	2025-06-02 10:33:54 +02:00
Adefioye	9053009dbe	Fix bug in normalize_answer method (#444)	2025-06-02 08:58:54 +02:00
Oliver Stanley	c0e98f93b4	make task entries json serializable (#443) * make sympy-based task entries json serializable * remove datetime objs from time_intervals metadata * make adv geometry json serializable * make futoshiki metadata json serializable * fixes * futoshiki tweaks * fix adv geometry * deal with fractions in str representations * fix * restore start_time, end_time as str	2025-06-02 08:57:15 +02:00
Zafir Stojanovski	6614338ecc	add numbers to performance heatmap (#442)	2025-05-30 18:39:13 +02:00
Zafir Stojanovski	b843f33b1d	fix(eval): comparison plot (#441) * heatmap * filter comparison plots * latex style * curriculum heatmap * pre-commit * update figsize * large y-ticks * larger font * thinner * include 50	2025-05-29 12:31:07 +02:00
Oliver Stanley	f51769927e	add .gitattributes for correct repo classification (#439)	2025-05-19 21:41:07 +03:00
Zafir Stojanovski	93e731c29c	heatmap (#438)	2025-05-19 09:07:45 +01:00
Oliver Stanley	add527ada1	update training dir with external eval details (#437) * added games * added llama 3b training conf * update readme with details of external evals * readme update --------- Co-authored-by: joesharratt1229 <joesharratt1229@gmail.com>	2025-05-19 00:35:41 +02:00
Zafir Stojanovski	5961a10145	comparison plot (#436)	2025-05-18 22:57:49 +01:00
Zafir Stojanovski	0cda6b1205	qwen math training code (#435) * qwen math training code * pre-commit	2025-05-16 13:19:19 +02:00
Oliver Stanley	47303211b3	update gallery (#434)	2025-05-15 22:41:43 +02:00

1323 commits