From e48c1f82cdea0ebde03e1b509f110c2f4aefd4f5 Mon Sep 17 00:00:00 2001
From: "Andreas Koepf (aider)"
Date: Tue, 25 Feb 2025 15:28:12 +0100
Subject: [PATCH] docs: Update installation instructions in eval README

---
 eval/README.md                                  | 17 ++++++++++++-----
 .../{requirements.txt => requirements-eval.txt} |  0
 2 files changed, 12 insertions(+), 5 deletions(-)
 rename eval/{requirements.txt => requirements-eval.txt} (100%)

diff --git a/eval/README.md b/eval/README.md
index d8bbe3a0..d8c178c0 100644
--- a/eval/README.md
+++ b/eval/README.md
@@ -18,17 +18,22 @@ This framework provides tools to evaluate language models on the reasoning_gym d
 
 ## Setup
 
-1. Install the required dependencies:
+1. Install reasoning-gym in development mode:
 ```bash
-pip install -r requirements.txt
+pip install -e ..
 ```
 
-2. Set your OpenRouter API key as an environment variable:
+2. Install the additional dependencies required for evaluation:
+```bash
+pip install -r requirements-eval.txt
+```
+
+3. Set your OpenRouter API key as an environment variable:
 ```bash
 export OPENROUTER_API_KEY=your-api-key
 ```
 
-3. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
+4. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
 ```json
 [
   {
@@ -47,9 +52,11 @@ You can run evaluations in two ways:
 
 1. Using the provided bash script:
 ```bash
-./run_eval.sh
+./eval.sh
 ```
 
+   Before running, you may want to edit the `eval.sh` script to configure which models to evaluate by modifying the `MODELS` array.
+
 2. Running the Python script directly:
 ```bash
 python eval.py --model "model-name" --config "eval_basic.json" --output-dir "results"
diff --git a/eval/requirements.txt b/eval/requirements-eval.txt
similarity index 100%
rename from eval/requirements.txt
rename to eval/requirements-eval.txt