reasoning-gym/training/evaluations/curriculum/knights_knaves.yaml
Zafir Stojanovski a969d8ef05
feat(curriculum): Knights and Knaves configs (#488)
* configs

* reduce complexity of curriculum

* update lower bound

* add failure threshold

* update last_k

* update thresholds for success and failure

* update curriculum file as well

* update run name for noncurriculum

* lint

* dtype model eval

* return binary scoring

* set eval repeats to 3

* fix tests
2025-07-31 10:18:05 +02:00

# Config used for evaluating curriculum generalisation experiment models on Knights and Knaves
# Models evaluated on this config:
# Qwen/Qwen2.5-3B-Instruct (original model)
# qwen3b_knights-knaves_noncurriculum (original + 300 GRPO steps on non-curriculum Knights and Knaves data)
# qwen3b_knights-knaves_curriculum (original + 300 GRPO steps on curriculum Knights and Knaves data)
model_path: Qwen/Qwen2.5-3B-Instruct # Default model path
# model_path: /workspace/reasoning-gym/training/qwen3b_knights-knaves_noncurriculum
# model_path: /workspace/reasoning-gym/training/qwen3b_knights-knaves_curriculum
max_tokens: 2048 # From max_response_length in training config
top_p: 1.0
temperature: 1.0 # Unscaled sampling; lower this for more focused responses
dtype: bfloat16
developer_prompt: DeepSeekZero
developer_role: system
output_dir: results
save_metadata: true
save_full_results: true
eval_repeats: 3
categories:
  - category: logic
    datasets:
      - dataset: knights_knaves
        size: 100
        seed: 42
        params:
          n_people: 4
          depth_constraint: 3
          width_constraint: 3