Oliver | f57b5adcb0 | 2025-04-28 22:08:26 +01:00
cfg updates

Oliver | 830ac3e10a | 2025-04-24 19:36:30 +01:00
impl conditional reward

Oliver Stanley | 224532f12a | 2025-04-14 21:06:40 +01:00
first inter-domain generalisation experiments (#412)
* tweak len reward
* first inter-generalisation experiment config
* update inter algorithmic config
* default to empty config
* fix typo
* change config to match experiment script
* long prompt fixes
* algorithmic training config tweaks
* imports
* update algorithmic training cfgs
* first logic composite config
* fix dset name
* tweaks
* fix syllogisms dataset
* rm temp print
* initial algebra config
* algebra cfg tweaks
* add gc
* add initial games cfg
* rename games cfg
* fix dset name
* fix sokoban metadata
* remove boxnet
* games cfg tweak

joesharratt1229 | 43c739cb3e | 2025-04-02 06:39:14 +01:00
Feat/curr adj (#394)

Zafir Stojanovski | c6663cdb81 | 2025-03-28 09:45:17 +01:00
fix(training): Prepend <think> token in format reward (#396)
* prepend think token in format reward
* pre commit + fix some default vals
* add checkpoint config

Oliver Stanley | eb69916c1b | 2025-03-20 15:04:57 +00:00
initial verl training codebase (#389)
* fixes for latest verl
* composite dataset training experiment
* use stateful dataloaders to match verl changes
* training readme
* add formatting reward
* length reward impl
* standalone reasoning_gym config section
* curriculum learning, new length reward, more config