joesharratt1229 | d9075a2806 | updated read me | 2025-03-25 06:39:36 +00:00
joesharratt1229 | ed31964b06 | updated yaml | 2025-03-25 06:13:41 +00:00
joesharratt1229 | fee4e37ae4 | added composite changes | 2025-03-25 05:38:07 +00:00
joesharratt1229 | a8b1408967 | added composite changes | 2025-03-25 05:28:44 +00:00
joesharratt1229 | 4a37dbb5c1 | changed config | 2025-03-25 05:13:06 +00:00
joesharratt1229 | b8a2ac6ba3 | removed duplicated fit | 2025-03-25 04:07:59 +00:00
joesharratt1229 | 47a2a7eab7 | removed redundant argument | 2025-03-25 03:54:32 +00:00
joesharratt1229 | 9335b56252 | corrected small errors | 2025-03-25 03:45:18 +00:00
joesharratt1229 | 84f3fa731e | readapted readme | 2025-03-23 20:30:16 +00:00
joesharratt1229 | 6fa76f11b5 | added curriculum | 2025-03-23 20:25:42 +00:00
Oliver Stanley | eb69916c1b | initial verl training codebase (#389) | 2025-03-20 15:04:57 +00:00
  * fixes for latest verl
  * composite dataset training experiment
  * use stateful dataloaders to match verl changes
  * training readme
  * add formatting reward
  * length reward impl
  * standalone reasoning_gym config section
  * curriculum learning, new length reward, more config