reasoning-gym/examples/OpenRLHF
2025-02-21 15:15:38 +01:00
.gitignore            extract answer from last answer tag                              2025-01-28 16:37:19 +00:00
custom_reward.py      use native types List->list, Dict->dict, Set->set, Tuple->tuple  2025-02-21 15:15:38 +01:00
custom_reward_ppo.sh  use more realistic hparams for OpenRLHF example                  2025-01-28 22:20:22 +00:00