9 commits

Author SHA1 Message Date
Zafir Stojanovski
0e4582f83b
fix(evaluation): Add instructions for running on MMLU Pro (#497)
* add instructions for mmlu pro, format instructions for math benchmarks

* lint

* remove `--fewshot_as_multiturn`
2025-08-01 16:27:56 +02:00
Zafir Stojanovski
0f5352e5cd
fix: Training README.md (#491)
* Update README.md in `training`

* add pip install for verl
2025-07-27 11:56:00 +02:00
Zafir Stojanovski
56ce2e79a7
tutorial(training): Add a minimal example with trl (#473)
* v0

* 2 gpu setup

* improve parsing from yaml

* update yaml dataset example

* remove restriction on flash attn

* more comments

* first version of the readme

* pin torch

* simplify requirements

* just flash attn

* use set env instead

* simpler set env

* readme

* add wandb project to setup

* update template

* update model id

* post init to capture the config and weight

* extract metadata

* update config

* update dataset config

* move env for wandb project

* pre-commit

* remove qwen-math from training

* more instructions

* unused import

* remove trl old

* warmup ratio

* warmup ratio

* change model id

* change model_id

* add info about CUDA_VISIBLE_DEVICES
2025-06-21 00:01:31 +02:00
Oliver Stanley
1232a7d1e5
simplify training setup instructions (#454)
* simplify training setup instructions

* tweaks

* update cfgs

* readme update

* readme update
2025-06-06 09:51:29 +01:00
Oliver Stanley
add527ada1
update training dir with external eval details (#437)
* added games

* added llama 3b training conf

* update readme with details of external evals

* readme update

---------

Co-authored-by: joesharratt1229 <joesharratt1229@gmail.com>
2025-05-19 00:35:41 +02:00
Oliver Stanley
10863ea12b
inter-domain generalisation evaluation configs (#424)
* add inter-domain generalisation eval config for algebra

* add algorithmic eval cfg

* vllm infer

* add arithmetic eval cfg

* add geometry eval cfg

* add arc cfg

* add games eval cfg

* add cognition eval cfg

* add graphs eval cfg
2025-04-22 17:32:35 +01:00
joesharratt1229
d0ef136d5b
Feat/intragen experiments (#414)
* added curriculum

* readapted readme

* corrected small errors

* Delete eval/eval/r1/algorithmic/word_sorting.json

* removed redundant argument

* added spell

* removed duplicated fit

* changed config

* added composite changes

* added composite changes

* updated yaml

* added spell backward

* updated read me

* added qwen2.5

* added

* Add files via upload

* updated missing trainer func

* updated curr

* updated spell back

* updated correctness score func

* updated configs

* added local evals

* added updates

* updated datasets

* added fsdp to hf utility

* added algorithmic qwen 3b yaml

* updated read me

* updated configs

* added preappend token

* updated with thinking token

* updated test score board

* resolved comments

* added evaluation scripts

* removed results from pr

* added config

* added partial reward scoring

* added evaluation composites

* added training configs

* added games eval

* added Rubik's cube

* resolved merge conflicts

* added games config

* added latest eval configs

* updated structure

* Delete training/evaluations/eval_graphs_composite.yaml

---------

Co-authored-by: joesharratt1229 <joesharrat1229@gmail.com>
2025-04-16 08:04:52 +02:00
joesharratt1229
43c739cb3e
Feat/curr adj (#394)
2025-04-02 06:39:14 +01:00
Oliver Stanley
eb69916c1b
initial verl training codebase (#389)
* fixes for latest verl
* composite dataset training experiment
* use stateful dataloaders to match verl changes
* training readme
* add formatting reward
* length reward impl
* standalone reasoning_gym config section
* curriculum learning, new length reward, more config
2025-03-20 15:04:57 +00:00