reasoning-gym/training/qwen-math/recipes/DeepSeek-R1-Distill-Qwen-1.5B
Zafir Stojanovski 0cda6b1205
qwen math training code (#435)
* qwen math training code

* pre-commit
2025-05-16 13:19:19 +02:00
grpo qwen math training code (#435) 2025-05-16 13:19:19 +02:00