mirror of
https://github.com/open-thought/reasoning-gym.git
synced 2026-04-27 17:23:19 +00:00
tutorial(training): Add a minimal example with trl (#473)
* v0
* 2 gpu setup
* improve parsing from yaml
* update yaml dataset example
* remove restriction on flash attn
* more comments
* first version of the readme
* pin torch
* simplify requirements
* just flash attn
* use set env instead
* simpler set env
* readme
* add wandb project to setup
* update template
* update model id
* post init to capture the config and weight
* extract metadata
* update config
* update dataset config
* move env for wandb project
* pre-commit
* remove qwen-math from training
* more instructions
* unused import
* remove trl old
* warmup ratio
* warmup ratio
* change model id
* change model_id
* add info about CUDA_VISIBLE_DEVICES
parent 49f3821098
commit 56ce2e79a7
59 changed files with 382 additions and 155340 deletions
@@ -2,8 +2,6 @@
Training codebase for training LLMs using Reasoning Gym procedural dataset generators.
**Note**: `qwen-math/` directory contains the code from the Tina project, used for the Qwen2.5 3B RG-Math training. This is separate from the rest of our training/evaluation codebase.
This readme documents:
- Training environment setup and usage example