reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-19 12:58:07 +00:00

Author	SHA1	Message	Date
Zafir Stojanovski	0e4582f83b	fix(evaluation): Add instructions for running on MMLU Pro (#497) * add instructions for mmlu pro, format instructions for math benchmarks * lint * remove `--fewshot_as_multiturn`	2025-08-01 16:27:56 +02:00
Zafir Stojanovski	0f5352e5cd	fix: Training README.md (#491) * Update README.md in `training` * add pip install for verl	2025-07-27 11:56:00 +02:00
Zafir Stojanovski	56ce2e79a7	tutorial(training): Add a minimal example with `trl` (#473) * v0 * 2 gpu setup * improve parsing from yaml * update yaml dataset example * remove restriction on flash attn * more comments * first version of the readme * pin torch * simplify requirements * just flash attn * use set env instead * simpler set env * readme * add wandb project to setup * update template * update model id * post init to capture the config and weight * extract metadata * update config * update dataset config * move env for wandb project * pre-commit * remove qwen-math from training * more instructions * unused import * remove trl old * warmup ratio * warmup ratio * change model id * change model_id * add info about CUDA_VISIBLE_DEVICES	2025-06-21 00:01:31 +02:00
Oliver Stanley	1232a7d1e5	simplify training setup instructions (#454) * simplify training setup instructions * tweaks * update cfgs * readme update * readme update	2025-06-06 09:51:29 +01:00
Oliver Stanley	add527ada1	update training dir with external eval details (#437) * added games * added llama 3b training conf * update readme with details of external evals * readme update --------- Co-authored-by: joesharratt1229 <joesharratt1229@gmail.com>	2025-05-19 00:35:41 +02:00
Oliver Stanley	10863ea12b	inter-domain generalisation evaluation configs (#424) * add inter-domain generalisation eval config for algebra * add algorithmic eval cfg * vllm infer * add arithmetic eval cfg * add geometry eval cfg * add arc cfg * add games eval cfg * add cognition eval cfg * add graphs eval cfg	2025-04-22 17:32:35 +01:00
joesharratt1229	d0ef136d5b	Feat/intragen experiments (#414 ) * added curriculum * readapted readme * corrected small errors * Delete eval/eval/r1/algorithmic/word_sorting.json * removed redundant argument * added spell * removed duplicated fit * changed config * added composite changes * added composite changes * updated yaml * added spell backward * updated read me * added qwen2.5 * added * Add files via upload * updated missing trainer func * updated curr * updated spell back * updated correctness score func * updated configs * added local evals * added updates * updated datasets * added fsdp to hf utility * added algorithmic qwen 3b yaml * updated read me * updated configs * added preappend token * updated with thinking token * updated test score board * resolved comments * added evaluation scripts * removed results from pr * added config * added partial reward scoring * added evaluation composites * added training configs * added games eval * added rubriks cube * resolved merge cinflicts * added games config * added latest eval configs * updated strucutre * Delete training/evaluations/eval_graphs_composite.yaml --------- Co-authored-by: joesharratt1229 <joesharrat1229@gmail.com>	2025-04-16 08:04:52 +02:00
joesharratt1229	43c739cb3e	Feat/curr adj (#394)	2025-04-02 06:39:14 +01:00
Oliver Stanley	eb69916c1b	initial verl training codebase (#389) * fixes for latest verl * composite dataset training experiment * use stateful dataloaders to match verl changes * training readme * add formatting reward * length reward impl * standalone reasoning_gym config section * curriculum learning, new length reward, more config	2025-03-20 15:04:57 +00:00

9 commits