reasoning-gym/examples/OpenRLHF
2025-01-28 22:20:22 +00:00
.gitignore             extract answer from last answer tag               2025-01-28 16:37:19 +00:00
custom_reward.py       extract answer from last answer tag               2025-01-28 16:37:19 +00:00
custom_reward_ppo.sh   use more realistic hparams for OpenRLHF example   2025-01-28 22:20:22 +00:00