reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-25 17:10:51 +00:00

Zafir Stojanovski 56ce2e79a7	2025-06-21 00:01:31 +02:00

tutorial(training): Add a minimal example with `trl` (#473)

* v0
* 2 gpu setup
* improve parsing from yaml
* update yaml dataset example
* remove restriction on flash attn
* more comments
* first version of the readme
* pin torch
* simplify requirements
* just flash attn
* use set env instead
* simpler set env
* readme
* add wandb project to setup
* update template
* update model id
* post init to capture the config and weight
* extract metadata
* update config
* update dataset config
* move env for wandb project
* pre-commit
* remove qwen-math from training
* more instructions
* unused import
* remove trl old
* warmup ratio
* warmup ratio
* change model id
* change model_id
* add info about CUDA_VISIBLE_DEVICES

ds_zero2.yaml	tutorial(training): Add a minimal example with `trl` (#473)	2025-06-21 00:01:31 +02:00
grpo.yaml	tutorial(training): Add a minimal example with `trl` (#473)	2025-06-21 00:01:31 +02:00
