Mirror of https://github.com/open-thought/reasoning-gym.git, synced 2026-04-19 12:58:07 +00:00
Latest commit:

* prepend think token in format reward
* pre commit + fix some default vals
* add checkpoint config

| File |
|---|
| llama3.1_1b_grpo.yaml |
| qwen2.5_1.5b_grpo.yaml |