reasoning-gym/training/configs
Zafir Stojanovski c6663cdb81
fix(training): Prepend <think> token in format reward (#396)
* prepend think token in format reward

* pre commit + fix some default vals

* add checkpoint config
2025-03-28 09:45:17 +01:00
llama3.1_1b_grpo.yaml fix(training): Prepend <think> token in format reward (#396) 2025-03-28 09:45:17 +01:00
qwen2.5_1.5b_grpo.yaml fix(training): Prepend <think> token in format reward (#396) 2025-03-28 09:45:17 +01:00