reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-27 17:23:19 +00:00

Author	SHA1	Message	Date
Oliver	661cbc4c71	rename	2025-03-26 21:06:06 +00:00
Oliver	678eaa5b81	allow longer outputs	2025-03-26 21:05:37 +00:00
Oliver	165d1de86e	add puzzles config for qwen 3b	2025-03-26 21:04:09 +00:00
Oliver Stanley	eb69916c1b	initial verl training codebase (#389): fixes for latest verl; composite dataset training experiment; use stateful dataloaders to match verl changes; training readme; add formatting reward; length reward impl; standalone reasoning_gym config section; curriculum learning, new length reward, more config	2025-03-20 15:04:57 +00:00