Mirror of https://github.com/open-thought/reasoning-gym.git (synced 2026-04-19 12:58:07 +00:00)
docs: Update installation instructions in eval README
parent a1b0a0414e
commit e48c1f82cd
2 changed files with 12 additions and 5 deletions
@@ -18,17 +18,22 @@ This framework provides tools to evaluate language models on the reasoning_gym d
 ## Setup
 
-1. Install the required dependencies:
+1. Install reasoning-gym in development mode:
 ```bash
-pip install -r requirements.txt
+pip install -e ..
 ```
 
-2. Set your OpenRouter API key as an environment variable:
+2. Install the additional dependencies required for evaluation:
+```bash
+pip install -r requirements-eval.txt
+```
+
+3. Set your OpenRouter API key as an environment variable:
 ```bash
 export OPENROUTER_API_KEY=your-api-key
 ```
 
-3. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
+4. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
 ```json
 [
 {
@@ -47,9 +52,11 @@ You can run evaluations in two ways:
 1. Using the provided bash script:
 ```bash
-./run_eval.sh
+./eval.sh
 ```
 
+Before running, you may want to edit the `eval.sh` script to configure which models to evaluate by modifying the `MODELS` array.
+
 2. Running the Python script directly:
 ```bash
 python eval.py --model "model-name" --config "eval_basic.json" --output-dir "results"
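
Note on the `MODELS` array mentioned in the second hunk: the contents of `eval.sh` are not part of this diff, so the following is only a minimal sketch of how such a script might loop over a `MODELS` array. The model names and the `build_commands` helper are hypothetical placeholders; the `eval.py` flags are copied from the direct-invocation example above. The sketch prints the commands instead of running them, so it can be inspected safely.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only: the real eval.sh may be organized differently.
set -euo pipefail

# Models to evaluate. These names are placeholders, not real model IDs.
MODELS=("model-a" "model-b")

# Print (rather than run) one eval.py invocation per model.
# The flags mirror the direct-invocation example in the README diff.
build_commands() {
  local model
  for model in "${MODELS[@]}"; do
    printf 'python eval.py --model "%s" --config "eval_basic.json" --output-dir "results"\n' "$model"
  done
}

build_commands
```

To turn the dry run into a real one under these assumptions, the `printf` line would be replaced by the `python eval.py` call itself, with `OPENROUTER_API_KEY` exported beforehand as described in the setup steps.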