reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-28 17:29:39 +00:00

Author	SHA1	Message	Date
joesharratt1229	bd2788ad2a	updated test score board	2025-04-01 16:46:09 +00:00
joesharratt1229	b6b33a1d04	updated with thinking token	2025-04-01 16:29:19 +00:00
joesharratt1229	8d0e7db204	added preappend token	2025-04-01 16:28:04 +00:00
joesharratt1229	4b9c155cef	Merge remote-tracking branch 'origin/main' into feat/curr-adj	2025-04-01 16:17:31 +00:00
joesharratt1229	ba999153bf	updated configs	2025-04-01 16:13:15 +00:00
joesharratt1229	ec388ffc7c	updated read me	2025-04-01 16:12:56 +00:00
joesharratt1229	a957bab805	added algorithmic qwen 3b yaml	2025-04-01 16:12:32 +00:00
joesharratt1229	9aa17f64fa	added fsdp to hf utility	2025-04-01 16:12:00 +00:00
joesharratt1229	37bbd97191	updated datasets	2025-04-01 16:11:31 +00:00
Zafir Stojanovski	50846c3534	fix(env): ARC 1D curriculum (#402) * Add arc_1d curriculum * Add difficulty to metadata * use range attribute instead of scalar --------- Co-authored-by: abdulhakeem <abdulhakeemadefioye@gmail.com> Co-authored-by: Oliver Stanley <olivergestanley@gmail.com>	2025-04-01 13:01:15 +02:00
vncntt	cd85c2d632	add knights knaves curriculum (#401) * add knights knaves curriculum * add metadata + width constraints	2025-04-01 12:20:58 +02:00
Oliver Stanley	ea10a0f932	update task count in readme (#400) * update task count in readme * fix link	2025-04-01 10:51:36 +02:00
Zafir Stojanovski	8c45571a48	visualize heatmap sorted by overall performance (#397)	2025-04-01 00:08:39 +02:00
joesharratt1229	9f9f816902	added updates	2025-03-29 08:07:57 +00:00
Zafir Stojanovski	c6663cdb81	fix(training): Prepend `<think>` token in format reward (#396) * prepend think token in format reward * pre commit + fix some default vals * add checkpoint config	2025-03-28 09:45:17 +01:00
joesharratt1229	774d23664d	added local evals	2025-03-28 05:00:30 +00:00
joesharratt1229	7368d6d313	updated configs	2025-03-28 00:05:58 +00:00
joesharratt1229	cc0bacd8e1	updated correctness score func	2025-03-26 19:54:37 +00:00
joesharratt1229	f50c7221ac	updated spell back	2025-03-26 19:53:27 +00:00
joesharratt1229	16919223be	Merge branch 'feat/curr-adj' of https://github.com/open-thought/reasoning-gym into feat/curr-adj	2025-03-26 19:52:56 +00:00
joesharratt1229	74eca6c45b	updated curr	2025-03-26 19:52:32 +00:00
joesharratt1229	c952f31a61	updated missing trainer func	2025-03-26 19:51:45 +00:00
joesharratt1229	765d0aa3fe	Add files via upload	2025-03-26 08:06:27 +00:00
joesharratt1229	d4ef7be4e9	Merge branch 'feat/curr-adj' of https://github.com/open-thought/reasoning-gym into feat/curr-adj	2025-03-25 16:49:25 +00:00
joesharratt1229	cd5bd7de5c	added	2025-03-25 08:29:21 +00:00
joesharratt1229	5e41e61058	added qwen2.5	2025-03-25 07:44:00 +00:00
joesharratt1229	d9075a2806	updated read me	2025-03-25 06:39:36 +00:00
joesharratt1229	73f7cc7a66	added spell backward	2025-03-25 06:17:54 +00:00
joesharratt1229	ed31964b06	updated yaml	2025-03-25 06:13:41 +00:00
joesharratt1229	fee4e37ae4	added composite changes	2025-03-25 05:38:07 +00:00
joesharratt1229	a8b1408967	added composite changes	2025-03-25 05:28:44 +00:00
joesharratt1229	4a37dbb5c1	changed config	2025-03-25 05:13:06 +00:00
joesharratt1229	b8a2ac6ba3	removed duplicated fit	2025-03-25 04:07:59 +00:00
joesharratt1229	06dab0e41d	added spell	2025-03-25 04:01:20 +00:00
joesharratt1229	47a2a7eab7	removed redundant argument	2025-03-25 03:54:32 +00:00
joesharratt1229	7f82aae67c	Merge remote-tracking branch 'origin/main' into feat/curr-adj	2025-03-25 03:50:41 +00:00
joesharratt1229	e368058ef1	Delete eval/eval/r1/algorithmic/word_sorting.json	2025-03-25 03:47:19 +00:00
joesharratt1229	9335b56252	corrected small errors	2025-03-25 03:45:18 +00:00
Tiago Serafim	7ae2942c34	Fix small bit of old code (#386) Removed the `sat_utils.` that was left there from the original zebra package. This allows the zebra generator to be used with the full set of available clues.	2025-03-24 09:00:28 +00:00
Zafir Stojanovski	2a970390e8	remove coach and update tests (#393)	2025-03-24 00:10:01 +01:00
joesharratt1229	84f3fa731e	readapted readme	2025-03-23 20:30:16 +00:00
joesharratt1229	6fa76f11b5	added curriculum	2025-03-23 20:25:42 +00:00
Oliver Stanley	eb69916c1b	initial verl training codebase (#389) * fixes for latest verl * composite dataset training experiment * use stateful dataloaders to match verl changes * training readme * add formatting reward * length reward impl * standalone reasoning_gym config section * curriculum learning, new length reward, more config	2025-03-20 15:04:57 +00:00
Zafir Stojanovski	ce0a6c4878	fix(envs): Add source dataset and index to metadata (#388) * add source dataset and index to metadata * fix typo * fix coach class and its test	2025-03-20 11:12:14 +00:00
Oliver Stanley	7475a20700	include ranges rather than sampled values in difficulty metadata dicts (#387) * update difficulty metadata for logic datasets * update difficulty metadata for graph datasets * update difficulty metadata for geometry datasets * update difficulty metadata for games datasets * update difficulty metadata for cognition datasets * update difficulty metadata for arithmetic datasets * update difficulty metadata for arc datasets * update difficulty metadata for algorithmic datasets * update difficulty metadata for algebra datasets * use tuples * update tests * update tests	2025-03-20 10:27:03 +01:00
Rich Jones	b69c35818a	fix figlet font curr imports (#383)	2025-03-18 23:38:17 +01:00
Zafir Stojanovski	814da6e08a	add difficulty metadata	2025-03-18 13:51:43 +01:00
Zafir Stojanovski	29bf78293f	figlet font curriculum	2025-03-18 13:51:43 +01:00
joesharratt1229	9234aa77bf	Feat/open instruct example (#381) * added open-instruct * fixed hooks * GRPO --------- Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>	2025-03-17 23:20:11 +01:00
Andreas Koepf	1511c5e301	don't pass answer value to eval	2025-03-17 23:13:53 +01:00

1283 commits