diff --git a/eval/README.md b/eval/README.md
index d8bbe3a0..d8c178c0 100644
--- a/eval/README.md
+++ b/eval/README.md
@@ -18,17 +18,22 @@ This framework provides tools to evaluate language models on the reasoning_gym d
 
 ## Setup
 
-1. Install the required dependencies:
+1. Install reasoning-gym in development mode:
 ```bash
-pip install -r requirements.txt
+pip install -e ..
 ```
 
-2. Set your OpenRouter API key as an environment variable:
+2. Install the additional dependencies required for evaluation:
+```bash
+pip install -r requirements-eval.txt
+```
+
+3. Set your OpenRouter API key as an environment variable:
 ```bash
 export OPENROUTER_API_KEY=your-api-key
 ```
 
-3. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
+4. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
 ```json
 [
   {
@@ -47,9 +52,11 @@ You can run evaluations in two ways:
 
 1. Using the provided bash script:
 ```bash
-./run_eval.sh
+./eval.sh
 ```
 
+Before running, you may want to edit the `eval.sh` script to configure which models to evaluate by modifying the `MODELS` array.
+
 2. Running the Python script directly:
 ```bash
 python eval.py --model "model-name" --config "eval_basic.json" --output-dir "results"
diff --git a/eval/requirements.txt b/eval/requirements-eval.txt
similarity index 100%
rename from eval/requirements.txt
rename to eval/requirements-eval.txt