Zafir Stojanovski
|
a969d8ef05
|
feat(curriculum): Knights and Knaves configs (#488)
* configs
* reduce complexity of curriculum
* update lower bound
* add failure threshold
* update last_k
* update thresholds for success and failure
* update curriculum file as well
* update run name for noncurriculum
* lint
* dtype model eval
* return binary scoring
* set eval repeats to 3
* fix tests
|
2025-07-31 10:18:05 +02:00 |
|
joesharratt1229
|
4b60c32978
|
Curr exp (#487)
* began curr exp
* added holdout words
* updated config
* added context
* updated base curriculum
* updated
* updated curriculum
* updated
* updated
* updated automatic flag
* updated ray trainer
* update
|
2025-07-25 20:38:47 +01:00 |
|
Oliver Stanley
|
224532f12a
|
first inter-domain generalisation experiments (#412)
* tweak len reward
* first inter-generalisation experiment config
* update inter algorithmic config
* default to empty config
* fix typo
* change config to match experiment script
* long prompt fixes
* algorithmic training config tweaks
* imports
* update algorithmic training cfgs
* first logic composite config
* fix dset name
* tweaks
* fix syllogisms dataset
* rm temp print
* initial algebra config
* algebra cfg tweaks
* add gc
* add initial games cfg
* rename games cfg
* fix dset name
* fix sokoban metadata
* remove boxnet
* games cfg tweak
|
2025-04-14 21:06:40 +01:00 |
|
joesharratt1229
|
43c739cb3e
|
Feat/curr adj (#394)
|
2025-04-02 06:39:14 +01:00 |
|
Zafir Stojanovski
|
c6663cdb81
|
fix(training): Prepend <think> token in format reward (#396)
* prepend think token in format reward
* pre commit + fix some default vals
* add checkpoint config
|
2025-03-28 09:45:17 +01:00 |
|
Oliver Stanley
|
eb69916c1b
|
initial verl training codebase (#389)
* fixes for latest verl
* composite dataset training experiment
* use stateful dataloaders to match verl changes
* training readme
* add formatting reward
* length reward impl
* standalone reasoning_gym config section
* curriculum learning, new length reward, more config
|
2025-03-20 15:04:57 +00:00 |
|