diff --git a/training/README.md b/training/README.md
index 52eaa123..0caf3a97 100644
--- a/training/README.md
+++ b/training/README.md
@@ -38,7 +38,7 @@ First, activate the virtual environment you prepared.
 Example GRPO training usage:
 
 ```bash
-python3 -u train_grpo.py --config-path configs/external_generalisation --config-name math_curriculum_qwen_7b $@ 2>&1 | tee verl_output.log
+python3 -u train_grpo.py --config-path configs/external_generalisation --config-name math_qwen_3b $@ 2>&1 | tee verl_output.log
 ```
 
 Then, having saved this as a bash script such as `train.sh`, run it:
diff --git a/training/configs/external_generalisation/math_qwen_7b.yaml b/training/configs/external_generalisation/math_qwen_3b.yaml
similarity index 99%
rename from training/configs/external_generalisation/math_qwen_7b.yaml
rename to training/configs/external_generalisation/math_qwen_3b.yaml
index 1599848f..d8cf78c2 100644
--- a/training/configs/external_generalisation/math_qwen_7b.yaml
+++ b/training/configs/external_generalisation/math_qwen_3b.yaml
@@ -54,7 +54,7 @@ data:
 actor_rollout_ref:
   hybrid_engine: True
   model:
-    path: Qwen/Qwen2.5-7B-Instruct
+    path: Qwen/Qwen2.5-3B-Instruct
     external_lib: null
     override_config: { }
     enable_gradient_checkpointing: True