joesharratt1229 | ba999153bf | updated configs | 2025-04-01 16:13:15 +00:00 |
joesharratt1229 | ec388ffc7c | updated read me | 2025-04-01 16:12:56 +00:00 |
joesharratt1229 | a957bab805 | added algorithmic qwen 3b yaml | 2025-04-01 16:12:32 +00:00 |
joesharratt1229 | 9aa17f64fa | added fsdp to hf utility | 2025-04-01 16:12:00 +00:00 |
joesharratt1229 | 9f9f816902 | added updates | 2025-03-29 08:07:57 +00:00 |
joesharratt1229 | 7368d6d313 | updated configs | 2025-03-28 00:05:58 +00:00 |
joesharratt1229 | cc0bacd8e1 | updated correctness score func | 2025-03-26 19:54:37 +00:00 |
joesharratt1229 | 74eca6c45b | updated curr | 2025-03-26 19:52:32 +00:00 |
joesharratt1229 | c952f31a61 | updated missing trainer func | 2025-03-26 19:51:45 +00:00 |
joesharratt1229 | d4ef7be4e9 | Merge branch 'feat/curr-adj' of https://github.com/open-thought/reasoning-gym into feat/curr-adj | 2025-03-25 16:49:25 +00:00 |
joesharratt1229 | cd5bd7de5c | added | 2025-03-25 08:29:21 +00:00 |
joesharratt1229 | 5e41e61058 | added qwen2.5 | 2025-03-25 07:44:00 +00:00 |
joesharratt1229 | d9075a2806 | updated read me | 2025-03-25 06:39:36 +00:00 |
joesharratt1229 | 73f7cc7a66 | added spell backward | 2025-03-25 06:17:54 +00:00 |
joesharratt1229 | ed31964b06 | updated yaml | 2025-03-25 06:13:41 +00:00 |
joesharratt1229 | fee4e37ae4 | added composite changes | 2025-03-25 05:38:07 +00:00 |
joesharratt1229 | a8b1408967 | added composite changes | 2025-03-25 05:28:44 +00:00 |
joesharratt1229 | 4a37dbb5c1 | changed config | 2025-03-25 05:13:06 +00:00 |
joesharratt1229 | b8a2ac6ba3 | removed duplicated fit | 2025-03-25 04:07:59 +00:00 |
joesharratt1229 | 47a2a7eab7 | removed redundant argument | 2025-03-25 03:54:32 +00:00 |
joesharratt1229 | 9335b56252 | corrected small errors | 2025-03-25 03:45:18 +00:00 |
joesharratt1229 | 84f3fa731e | readapted readme | 2025-03-23 20:30:16 +00:00 |
joesharratt1229 | 6fa76f11b5 | added curriculum | 2025-03-23 20:25:42 +00:00 |
Oliver Stanley | eb69916c1b | initial verl training codebase (#389): fixes for latest verl; composite dataset training experiment; use stateful dataloaders to match verl changes; training readme; add formatting reward; length reward impl; standalone reasoning_gym config section; curriculum learning, new length reward, more config | 2025-03-20 15:04:57 +00:00 |