reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-19 12:58:07 +00:00

Andreas Koepf 3e7ff3b084 use native types List->list, Dict->dict, Set->set, Tuple->tuple	2025-02-21 15:15:38 +01:00
.gitignore	extract answer from last answer tag	2025-01-28 16:37:19 +00:00
custom_reward.py	use native types List->list, Dict->dict, Set->set, Tuple->tuple	2025-02-21 15:15:38 +01:00
custom_reward_ppo.sh	use more realistic hparams for OpenRLHF example	2025-01-28 22:20:22 +00:00