reasoning-gym/eval/yaml
Andreas Köpf bfa5f8078b
Eval N completions per prompt (#374)
* feat: Add support for generating multiple completions per prompt
* feat: Track best and mean scores for multiple completions per prompt
* feat: Add checkpoint and resume functionality to evaluation script
2025-03-15 16:39:36 +01:00
claude-3.5-sonnet.yaml           update eval yaml config files         2025-03-10 00:48:32 +01:00
claude-3.7-sonnet_thinking.yaml  Eval N completions per prompt (#374)  2025-03-15 16:39:36 +01:00
deepseek-r1.yaml                 Eval N completions per prompt (#374)  2025-03-15 16:39:36 +01:00
llama-3.3-70b-instruct.yaml      Eval N completions per prompt (#374)  2025-03-15 16:39:36 +01:00
openai-o1.yaml                   Eval N completions per prompt (#374)  2025-03-15 16:39:36 +01:00
openai-o3-mini.yaml              Eval N completions per prompt (#374)  2025-03-15 16:39:36 +01:00