Commit graph

9 commits

Author           SHA1        Message                                Date
joesharratt1229  fee4e37ae4  added composite changes                2025-03-25 05:38:07 +00:00
joesharratt1229  a8b1408967  added composite changes                2025-03-25 05:28:44 +00:00
joesharratt1229  4a37dbb5c1  changed config                         2025-03-25 05:13:06 +00:00
joesharratt1229  b8a2ac6ba3  removed duplicated fit                 2025-03-25 04:07:59 +00:00
joesharratt1229  47a2a7eab7  removed redundant argument             2025-03-25 03:54:32 +00:00
joesharratt1229  9335b56252  corrected small errors                 2025-03-25 03:45:18 +00:00
joesharratt1229  84f3fa731e  readapted readme                       2025-03-23 20:30:16 +00:00
joesharratt1229  6fa76f11b5  added curriculum                       2025-03-23 20:25:42 +00:00
Oliver Stanley   eb69916c1b  initial verl training codebase (#389)  2025-03-20 15:04:57 +00:00
    * fixes for latest verl
    * composite dataset training experiment
    * use stateful dataloaders to match verl changes
    * training readme
    * add formatting reward
    * length reward impl
    * standalone reasoning_gym config section
    * curriculum learning, new length reward, more config