From e48c1f82cdea0ebde03e1b509f110c2f4aefd4f5 Mon Sep 17 00:00:00 2001
From: "Andreas Koepf (aider)"
Date: Tue, 25 Feb 2025 15:28:12 +0100
Subject: [PATCH] docs: Update installation instructions in eval README

---
 eval/README.md                                  | 17 ++++++++++++-----
 .../{requirements.txt => requirements-eval.txt} |  0
 2 files changed, 12 insertions(+), 5 deletions(-)
 rename eval/{requirements.txt => requirements-eval.txt} (100%)

diff --git a/eval/README.md b/eval/README.md
index d8bbe3a0..d8c178c0 100644
--- a/eval/README.md
+++ b/eval/README.md
@@ -18,17 +18,22 @@ This framework provides tools to evaluate language models on the reasoning_gym d
 
 ## Setup
 
-1. Install the required dependencies:
+1. Install reasoning-gym in development mode:
 ```bash
-pip install -r requirements.txt
+pip install -e ..
 ```
 
-2. Set your OpenRouter API key as an environment variable:
+2. Install the additional dependencies required for evaluation:
+```bash
+pip install -r requirements-eval.txt
+```
+
+3. Set your OpenRouter API key as an environment variable:
 ```bash
 export OPENROUTER_API_KEY=your-api-key
 ```
 
-3. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
+4. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
 ```json
 [
   {
@@ -47,9 +52,11 @@ You can run evaluations in two ways:
 
 1. Using the provided bash script:
 ```bash
-./run_eval.sh
+./eval.sh
 ```
 
+   Before running, you may want to edit the `eval.sh` script to configure which models to evaluate by modifying the `MODELS` array.
+
 2. Running the Python script directly:
 ```bash
 python eval.py --model "model-name" --config "eval_basic.json" --output-dir "results"
diff --git a/eval/requirements.txt b/eval/requirements-eval.txt
similarity index 100%
rename from eval/requirements.txt
rename to eval/requirements-eval.txt