reasoning-gym/examples/OpenRLHF
2025-01-30 00:58:34 +01:00
.gitignore            extract answer from last answer tag              2025-01-28 16:37:19 +00:00
custom_reward.py      lint, seed & size for figlet                     2025-01-30 00:58:34 +01:00
custom_reward_ppo.sh  use more realistic hparams for OpenRLHF example  2025-01-28 22:20:22 +00:00