Commit graph

20 commits

Author SHA1 Message Date
Oliver d4e19056ea cfg 2025-04-28 21:27:00 +01:00
Oliver 55c9810113 training updates 2025-04-28 20:56:23 +01:00
Oliver e58f3c1a35 cfg 2025-04-24 21:11:16 +01:00
Oliver 37b88d194b add loss_agg_mode 2025-04-24 20:46:37 +01:00
Oliver 1ee3b0bbb8 add use kl param 2025-04-24 20:42:57 +01:00
Oliver e39b6b5f27 cfg change 2025-04-24 20:16:52 +01:00
Oliver 68ef3fa249 rm original cfg 2025-04-24 19:38:24 +01:00
Oliver 830ac3e10a impl conditional reward 2025-04-24 19:36:30 +01:00
Oliver 450d3dcfa4 cfg tweak 2025-04-24 17:29:51 +01:00
Oliver 897e618bfa cfg 2025-04-22 21:15:08 +01:00
Oliver 1343bcf63e cfg 2025-04-22 20:34:37 +01:00
Oliver 1ccd62bc1a cfg 2025-04-22 20:33:04 +01:00
Oliver e372224ee1 cfgs 2025-04-22 20:32:17 +01:00
Oliver 4aeffa8182 cfgs 2025-04-22 20:30:13 +01:00
Oliver 4c2f83da5f add math config 2025-04-22 20:20:29 +01:00
joesharratt1229
d0ef136d5b
Feat/intragen experiments (#414)
* added curriculum

* readapted readme

* corrected small errors

* Delete eval/eval/r1/algorithmic/word_sorting.json

* removed redundant argument

* added spell

* removed duplicated fit

* changed config

* added composite changes

* added composite changes

* updated yaml

* added spell backward

* updated read me

* added qwen2.5

* added

* Add files via upload

* updated missing trainer func

* updated curr

* updated spell back

* updated correctness score func

* updated configs

* added local evals

* added updates

* updated datasets

* added fsdp to hf utility

* added algorithmic qwen 3b yaml

* updated read me

* updated configs

* added prepend token

* updated with thinking token

* updated test score board

* resolved comments

* added evaluation scripts

* removed results from pr

* added config

* added partial reward scoring

* added evaluation composites

* added training configs

* added games eval

* added rubiks cube

* resolved merge conflicts

* added games config

* added latest eval configs

* updated structure

* Delete training/evaluations/eval_graphs_composite.yaml

---------

Co-authored-by: joesharratt1229 <joesharrat1229@gmail.com>
2025-04-16 08:04:52 +02:00
Oliver Stanley
224532f12a
first inter-domain generalisation experiments (#412)
* tweak len reward

* first inter-generalisation experiment config

* update inter algorithmic config

* default to empty config

* fix typo

* change config to match experiment script

* long prompt fixes

* algorithmic training config tweaks

* imports

* update algorithmic training cfgs

* first logic composite config

* fix dset name

* tweaks

* fix syllogisms dataset

* rm temp print

* initial algebra config

* algebra cfg tweaks

* add gc

* add initial games cfg

* rename games cfg

* fix dset name

* fix sokoban metadata

* remove boxnet

* games cfg tweak
2025-04-14 21:06:40 +01:00
joesharratt1229
43c739cb3e
Feat/curr adj (#394) 2025-04-02 06:39:14 +01:00
Zafir Stojanovski
c6663cdb81
fix(training): Prepend <think> token in format reward (#396)
* prepend think token in format reward

* pre commit + fix some default vals

* add checkpoint config
2025-03-28 09:45:17 +01:00
Oliver Stanley
eb69916c1b
initial verl training codebase (#389)
* fixes for latest verl
* composite dataset training experiment
* use stateful dataloaders to match verl changes
* training readme
* add formatting reward
* length reward impl
* standalone reasoning_gym config section
* curriculum learning, new length reward, more config
2025-03-20 15:04:57 +00:00