reasoning-gym/training/qwen-math/recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo
Zafir Stojanovski 0cda6b1205
qwen math training code (#435)
* qwen math training code

* pre-commit
2025-05-16 13:19:19 +02:00
model_curated_deepscaler.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_limr.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_limr_large_lr_ablation.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_limr_large_rank_ablation.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_limr_medium_rank_ablation.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_limr_small_lr_ablation.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_limr_small_rank_ablation.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_limr_tiny_rank_ablation.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_open_r1.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_open_rs1.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_open_rs2.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_open_rs3.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_open_rs3_drgrpo_ablation.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_rg_math.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_still.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00
model_curated_thoughts.yaml qwen math training code (#435) 2025-05-16 13:19:19 +02:00