Commit graph

17 commits

Author           SHA1        Date                        Message
joesharratt1229  8d0e7db204  2025-04-01 16:28:04 +00:00  added preappend token
joesharratt1229  ba999153bf  2025-04-01 16:13:15 +00:00  updated configs
joesharratt1229  a957bab805  2025-04-01 16:12:32 +00:00  added algorithmic qwen 3b yaml
joesharratt1229  9f9f816902  2025-03-29 08:07:57 +00:00  added updates
joesharratt1229  7368d6d313  2025-03-28 00:05:58 +00:00  updated configs
joesharratt1229  74eca6c45b  2025-03-26 19:52:32 +00:00  updated curr
joesharratt1229  d4ef7be4e9  2025-03-25 16:49:25 +00:00  Merge branch 'feat/curr-adj' of https://github.com/open-thought/reasoning-gym into feat/curr-adj
joesharratt1229  cd5bd7de5c  2025-03-25 08:29:21 +00:00  added
joesharratt1229  5e41e61058  2025-03-25 07:44:00 +00:00  added qwen2.5
joesharratt1229  73f7cc7a66  2025-03-25 06:17:54 +00:00  added spell backward
joesharratt1229  ed31964b06  2025-03-25 06:13:41 +00:00  updated yaml
joesharratt1229  fee4e37ae4  2025-03-25 05:38:07 +00:00  added composite changes
joesharratt1229  a8b1408967  2025-03-25 05:28:44 +00:00  added composite changes
joesharratt1229  4a37dbb5c1  2025-03-25 05:13:06 +00:00  changed config
joesharratt1229  9335b56252  2025-03-25 03:45:18 +00:00  corrected small errors
joesharratt1229  6fa76f11b5  2025-03-23 20:25:42 +00:00  added curriculum
Oliver Stanley   eb69916c1b  2025-03-20 15:04:57 +00:00  initial verl training codebase (#389)
                                                         * fixes for latest verl
                                                         * composite dataset training experiment
                                                         * use stateful dataloaders to match verl changes
                                                         * training readme
                                                         * add formatting reward
                                                         * length reward impl
                                                         * standalone reasoning_gym config section
                                                         * curriculum learning, new length reward, more config