reasoning-gym/training/evaluations
Oliver Stanley 10863ea12b
inter-domain generalisation evaluation configs (#424)
* add inter-domain generalisation eval config for algebra
* add algorithmic eval cfg
* vllm infer
* add arithmetic eval cfg
* add geometry eval cfg
* add arc cfg
* add games eval cfg
* add cognition eval cfg
* add graphs eval cfg
2025-04-22 17:32:35 +01:00
Name                             Last commit                                              Date
inter_generalisation             inter-domain generalisation evaluation configs (#424)    2025-04-22 17:32:35 +01:00
eval_algebraic_composite.yaml    Feat/intragen experiments (#414)                         2025-04-16 08:04:52 +02:00
eval_algorithmic_composite.yaml  Feat/intragen experiments (#414)                         2025-04-16 08:04:52 +02:00
eval_arithmetic_composite.yaml   Feat/intragen experiments (#414)                         2025-04-16 08:04:52 +02:00
eval_cognition_composite.yaml    Feat/intragen experiments (#414)                         2025-04-16 08:04:52 +02:00
eval_games_composite.yaml        Feat/intragen experiments (#414)                         2025-04-16 08:04:52 +02:00
eval_qwen_3b.yaml                Feat/intragen experiments (#414)                         2025-04-16 08:04:52 +02:00
evaluate_model.py                inter-domain generalisation evaluation configs (#424)    2025-04-22 17:32:35 +01:00