reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-29 17:35:16 +00:00

Author	SHA1	Message	Date
joesharratt1229	8d0e7db204	added preappend token	2025-04-01 16:28:04 +00:00
joesharratt1229	ba999153bf	updated configs	2025-04-01 16:13:15 +00:00
joesharratt1229	a957bab805	added algorithmic qwen 3b yaml	2025-04-01 16:12:32 +00:00
joesharratt1229	9f9f816902	added updates	2025-03-29 08:07:57 +00:00
joesharratt1229	7368d6d313	updated configs	2025-03-28 00:05:58 +00:00
joesharratt1229	74eca6c45b	updated curr	2025-03-26 19:52:32 +00:00
joesharratt1229	d4ef7be4e9	Merge branch 'feat/curr-adj' of https://github.com/open-thought/reasoning-gym into feat/curr-adj	2025-03-25 16:49:25 +00:00
joesharratt1229	cd5bd7de5c	added	2025-03-25 08:29:21 +00:00
joesharratt1229	5e41e61058	added qwen2.5	2025-03-25 07:44:00 +00:00
joesharratt1229	73f7cc7a66	added spell backward	2025-03-25 06:17:54 +00:00
joesharratt1229	ed31964b06	updated yaml	2025-03-25 06:13:41 +00:00
joesharratt1229	fee4e37ae4	added composite changes	2025-03-25 05:38:07 +00:00
joesharratt1229	a8b1408967	added composite changes	2025-03-25 05:28:44 +00:00
joesharratt1229	4a37dbb5c1	changed config	2025-03-25 05:13:06 +00:00
joesharratt1229	9335b56252	corrected small errors	2025-03-25 03:45:18 +00:00
joesharratt1229	6fa76f11b5	added curriculum	2025-03-23 20:25:42 +00:00
Oliver Stanley	eb69916c1b	initial verl training codebase (#389) * fixes for latest verl * composite dataset training experiment * use stateful dataloaders to match verl changes * training readme * add formatting reward * length reward impl * standalone reasoning_gym config section * curriculum learning, new length reward, more config	2025-03-20 15:04:57 +00:00

17 commits