joesharratt1229 | 8d0e7db204 | added preappend token | 2025-04-01 16:28:04 +00:00
joesharratt1229 | ba999153bf | updated configs | 2025-04-01 16:13:15 +00:00
joesharratt1229 | a957bab805 | added algorithmic qwen 3b yaml | 2025-04-01 16:12:32 +00:00
joesharratt1229 | 9f9f816902 | added updates | 2025-03-29 08:07:57 +00:00
joesharratt1229 | 7368d6d313 | updated configs | 2025-03-28 00:05:58 +00:00
joesharratt1229 | 74eca6c45b | updated curr | 2025-03-26 19:52:32 +00:00
joesharratt1229 | d4ef7be4e9 | Merge branch 'feat/curr-adj' of https://github.com/open-thought/reasoning-gym into feat/curr-adj | 2025-03-25 16:49:25 +00:00
joesharratt1229 | cd5bd7de5c | added | 2025-03-25 08:29:21 +00:00
joesharratt1229 | 5e41e61058 | added qwen2.5 | 2025-03-25 07:44:00 +00:00
joesharratt1229 | 73f7cc7a66 | added spell backward | 2025-03-25 06:17:54 +00:00
joesharratt1229 | ed31964b06 | updated yaml | 2025-03-25 06:13:41 +00:00
joesharratt1229 | fee4e37ae4 | added composite changes | 2025-03-25 05:38:07 +00:00
joesharratt1229 | a8b1408967 | added composite changes | 2025-03-25 05:28:44 +00:00
joesharratt1229 | 4a37dbb5c1 | changed config | 2025-03-25 05:13:06 +00:00
joesharratt1229 | 9335b56252 | corrected small errors | 2025-03-25 03:45:18 +00:00
joesharratt1229 | 6fa76f11b5 | added curriculum | 2025-03-23 20:25:42 +00:00
Oliver Stanley | eb69916c1b | initial verl training codebase (#389): fixes for latest verl; composite dataset training experiment; use stateful dataloaders to match verl changes; training readme; add formatting reward; length reward impl; standalone reasoning_gym config section; curriculum learning, new length reward, more config | 2025-03-20 15:04:57 +00:00