reasoning-gym/training/evaluations
Zafir Stojanovski a969d8ef05
feat(curriculum): Knights and Knaves configs (#488)
* configs

* reduce complexity of curriculum

* update lower bound

* add failure threshold

* update last_k

* update thresholds for success and failure

* update curriculum file as well

* update run name for noncurriculum

* lint

* dtype model eval

* return binary scoring

* set eval repeats to 3

* fix tests
2025-07-31 10:18:05 +02:00
| Name | Last commit | Date |
|---|---|---|
| curriculum | feat(curriculum): Knights and Knaves configs (#488) | 2025-07-31 10:18:05 +02:00 |
| inter_generalisation | updated inter-domain generalisation eval configs (#432) | 2025-05-15 09:08:16 +02:00 |
| intra-generalisation | Added games training and evaluation configuration (#426) | 2025-04-26 19:45:32 +01:00 |
| lmeh | update training dir with external eval details (#437) | 2025-05-19 00:35:41 +02:00 |
| evaluate_model.py | feat(curriculum): Knights and Knaves configs (#488) | 2025-07-31 10:18:05 +02:00 |