mirror of https://github.com/open-thought/reasoning-gym.git (synced 2026-04-24 17:05:03 +00:00)
docs: Update installation instructions in eval README
parent a1b0a0414e
commit e48c1f82cd
2 changed files with 12 additions and 5 deletions
@@ -18,17 +18,22 @@ This framework provides tools to evaluate language models on the reasoning_gym d
 
 ## Setup
 
-1. Install the required dependencies:
+1. Install reasoning-gym in development mode:
 ```bash
-pip install -r requirements.txt
+pip install -e ..
 ```
 
-2. Set your OpenRouter API key as an environment variable:
+2. Install the additional dependencies required for evaluation:
+```bash
+pip install -r requirements-eval.txt
+```
+
+3. Set your OpenRouter API key as an environment variable:
 ```bash
 export OPENROUTER_API_KEY=your-api-key
 ```
 
-3. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
+4. Prepare your dataset configuration in JSON format (e.g., `eval_basic.json`):
 ```json
 [
 {
@@ -47,9 +52,11 @@ You can run evaluations in two ways:
 
 1. Using the provided bash script:
 ```bash
-./run_eval.sh
+./eval.sh
 ```
 
+Before running, you may want to edit the `eval.sh` script to configure which models to evaluate by modifying the `MODELS` array.
+
 2. Running the Python script directly:
 ```bash
 python eval.py --model "model-name" --config "eval_basic.json" --output-dir "results"
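The contents of `eval.sh` are not part of this diff. As a rough illustration of what "modifying the `MODELS` array" could look like, here is a minimal sketch that builds one `eval.py` invocation per model, using only the flags shown in the README. The model names, the `CONFIG` and `OUTPUT_DIR` variables, and the `build_eval_cmd` helper are all hypothetical, and the sketch prints each command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; the real eval.sh in the repository may differ.
set -euo pipefail

# Placeholder model identifiers (assumption): edit this array to choose models.
MODELS=("model-name-a" "model-name-b")
CONFIG="eval_basic.json"   # dataset configuration file named in the README
OUTPUT_DIR="results"       # output directory used in the README example

# Print the eval.py command line for a single model (dry run, nothing is executed).
build_eval_cmd() {
  printf 'python eval.py --model "%s" --config "%s" --output-dir "%s"\n' \
    "$1" "$CONFIG" "$OUTPUT_DIR"
}

for model in "${MODELS[@]}"; do
  build_eval_cmd "$model"
done
```

Printing the commands first makes it easy to check the model list before any API calls are made with `OPENROUTER_API_KEY`.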