mirror of
https://github.com/open-thought/reasoning-gym.git
synced 2026-04-29 17:35:16 +00:00
docs: Add info about reasoning-gym-eval repository for evaluation results
This commit is contained in:
parent
1d25601f15
commit
a073a2792b
1 changed file with 17 additions and 1 deletion
@@ -2,6 +2,12 @@
A simple asynchronous framework for evaluating language models on reasoning tasks using the OpenRouter API.
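The asynchronous evaluation pattern described here can be sketched as follows. This is a minimal illustration, not the framework's actual API: the model call is stubbed out in place of a real OpenRouter request, and names like `evaluate_model` and `score_answer` are assumptions made for the example.

```python
import asyncio

async def query_model(question: str) -> str:
    # Stand-in for an OpenRouter chat-completion request;
    # the real framework sends this over HTTP asynchronously.
    await asyncio.sleep(0.01)
    return f"answer to: {question}"

def score_answer(answer: str, expected: str) -> float:
    # Toy exact-match scorer for illustration.
    return 1.0 if answer == expected else 0.0

async def evaluate_model(dataset: list[dict]) -> float:
    # Fire all model requests concurrently, then score the answers.
    answers = await asyncio.gather(
        *(query_model(item["question"]) for item in dataset)
    )
    scores = [
        score_answer(a, item["answer"])
        for a, item in zip(answers, dataset)
    ]
    return sum(scores) / len(scores)

dataset = [
    {"question": "2+2", "answer": "answer to: 2+2"},
    {"question": "3+3", "answer": "wrong"},
]
accuracy = asyncio.run(evaluate_model(dataset))
print(accuracy)  # 0.5
```

Running requests concurrently with `asyncio.gather` is what lets an evaluation over many dataset items overlap network latency instead of waiting on each request in turn.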
## Evaluation Results Repository
To avoid cluttering the main repo with evaluation traces from different models, we store all evaluation results in a separate repository: [reasoning-gym-eval](https://github.com/open-thought/reasoning-gym-eval).
If you run evaluations and want to contribute your results, please create a pull request in the [reasoning-gym-eval](https://github.com/open-thought/reasoning-gym-eval) repository, not in the main reasoning-gym repo.
## Overview
This framework provides tools to evaluate language models on the reasoning_gym datasets. It supports:
@@ -68,5 +74,15 @@ The framework generates two types of output files:
├── eval.py # Main evaluation script
├── run_eval.sh # Bash script for running evaluations
├── eval_basic.json # Dataset configuration file
-└── results/           # Output directory
+└── results/           # Output directory (for temporary results)
```
## Contributing Evaluation Results
After running evaluations:
1. Fork the [reasoning-gym-eval](https://github.com/open-thought/reasoning-gym-eval) repository
2. Add your evaluation results to the appropriate directory
3. Create a pull request with your results
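The three steps above might look like the following with the GitHub CLI. The repository path comes from the text; the results directory name and branch name are assumptions made for this sketch.

```shell
# 1. Fork the results repository and clone your fork locally.
gh repo fork open-thought/reasoning-gym-eval --clone
cd reasoning-gym-eval

# 2. Add your evaluation results to the appropriate directory
#    (directory layout here is hypothetical -- follow the repo's structure).
git checkout -b add-my-eval-results
cp -r /path/to/my/results ./results/my-model/
git add results/
git commit -m "Add evaluation results for my-model"

# 3. Push and open a pull request against reasoning-gym-eval.
git push -u origin add-my-eval-results
gh pr create --repo open-thought/reasoning-gym-eval \
  --title "Add evaluation results for my-model"
```

Note that the pull request targets reasoning-gym-eval, not the main reasoning-gym repo.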
This helps us maintain a clean separation between code and evaluation data while collecting comprehensive benchmarks across different models.