mirror of
https://github.com/open-thought/reasoning-gym.git
synced 2026-04-24 17:05:03 +00:00
* first trl grpo implementation
* added config yaml file
* added read me and dependencies
* updated reward format func

| File |
|---|
| grpo.yaml |