reasoning-gym/examples
Andreas Köpf 28dc0932c4 Merge pull request #178 from olliestanley/feature/unsloth-train
Add minimal working GRPO training example with Unsloth
2025-02-21 15:37:24 +01:00
..
OpenRLHF use native types List->list, Dict->dict, Set->set, Tuple->tuple 2025-02-21 15:15:38 +01:00
trl docs: Update TRL README with GRPO example details and usage instructions (#76) 2025-02-07 07:56:22 +01:00
unsloth Better progress tracking 2025-02-20 23:32:54 +00:00
veRL use native types List->list, Dict->dict, Set->set, Tuple->tuple 2025-02-21 15:15:38 +01:00
word_ladder use native types List->list, Dict->dict, Set->set, Tuple->tuple 2025-02-21 15:15:38 +01:00