# Contributing to Reasoning Gym

Thank you for your interest in contributing to Reasoning Gym! This document provides guidelines and instructions for contributing to the project.

## Development Setup

1. Clone the repository and change into its directory:
   ```bash
   git clone https://github.com/open-thought/reasoning-gym.git
   cd reasoning-gym
   ```

2. Create a virtual environment (using conda):
   ```bash
   conda create --name reasoning_gym python=3.11 -y
   conda activate reasoning_gym
   ```

3. Install the package in editable mode:
   ```bash
   pip install -e .
   ```

4. Install development dependencies:
   ```bash
   pip install -r requirements-dev.txt
   ```
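
After the install steps, it can help to confirm that the active environment actually sees the package. The following is a minimal sketch, assuming the import name is `reasoning_gym` (taken from the repository paths referenced in this document); the helper name `is_importable` is hypothetical and not part of the project.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if the interpreter can locate the given top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Assumption: the package is imported as "reasoning_gym".
    name = "reasoning_gym"
    status = "found" if is_importable(name) else "NOT found"
    print(f"{name}: {status}")
```

If the package is reported as not found, the usual cause is that `pip install -e .` was run outside the activated conda environment or outside the repository directory.
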

## Creating Procedural Datasets

When creating new datasets, please follow these guidelines:

1. **Focus on Complex Problems**:
   - Prioritize problems where guessing has a low probability of success (e.g., number multiplication)
   - Avoid tasks with small answer sets (true/false, multiple-choice) as they create noisy rewards for RL

2. **Implementation Requirements**:
   - Create a configuration class
   - Derive your dataset class from `ProceduralDataset` (see [dataset.py](https://github.com/open-thought/reasoning-gym/blob/main/reasoning_gym/dataset.py))
   - Include comprehensive unit tests
   - Return dictionary items with keys: `"question"`, `"answer"`, and `"metadata"`
   - For datasets with multiple correct answers, override the `score_answer()` method (return value range: [0, 1])

3. **Getting Started**:
   - Review an example implementation:
     - Configuration & dataset class: [chain_sum.py](reasoning_gym/arithmetic/chain_sum.py)
     - Unit tests: [test_chain_sum.py](https://github.com/open-thought/reasoning-gym/blob/main/tests/test_chain_sum.py)
   - Write clear question prompts that an average human can understand and answer correctly
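
The implementation requirements above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the `ProceduralDataset` class below is a small stand-in so the example runs on its own, and its constructor and `score_answer()` signature are assumptions. `ProductConfig` and `ProductDataset` are hypothetical names. Consult `dataset.py` and `chain_sum.py` for the real interfaces.

```python
import random
from dataclasses import dataclass
from typing import Any, Optional


class ProceduralDataset:
    """Stand-in base class (an assumption, NOT the library's implementation)."""

    def __init__(self, config: Any, seed: int, size: int):
        self.config = config
        self.seed = seed
        self.size = size

    def __len__(self) -> int:
        return self.size

    def score_answer(self, answer: Optional[str], entry: dict) -> float:
        # Default behaviour in this sketch: exact string match.
        return 1.0 if answer is not None and answer == entry["answer"] else 0.0


@dataclass
class ProductConfig:
    """Hypothetical configuration class for a multiplication task."""

    min_value: int = 100
    max_value: int = 999
    seed: int = 42
    size: int = 10

    def validate(self) -> None:
        assert self.min_value <= self.max_value, "min_value must not exceed max_value"
        assert self.size > 0, "size must be positive"


class ProductDataset(ProceduralDataset):
    """Hypothetical dataset: multiply two integers."""

    def __init__(self, config: ProductConfig):
        config.validate()
        super().__init__(config=config, seed=config.seed, size=config.size)

    def __getitem__(self, idx: int) -> dict:
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        # Seeding per index makes every item reproducible.
        rng = random.Random(self.seed + idx)
        a = rng.randint(self.config.min_value, self.config.max_value)
        b = rng.randint(self.config.min_value, self.config.max_value)
        return {
            "question": f"What is {a} * {b}?",
            "answer": str(a * b),
            "metadata": {"a": a, "b": b},
        }

    def score_answer(self, answer: Optional[str], entry: dict) -> float:
        # Accept several spellings of the same integer, e.g. "1,234" or " 1234 ".
        if answer is None:
            return 0.0
        try:
            value = int(answer.replace(",", "").strip())
        except ValueError:
            return 0.0
        return 1.0 if value == int(entry["answer"]) else 0.0
```

A unit test for such a dataset would typically check that the same configuration yields identical items, that each item has the three required keys, and that `score_answer()` stays within [0, 1].
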

## Pull Request Process

1. **Fork and Clone**:
   - [Fork the repository](https://docs.github.com/en/get-started/quickstart/fork-a-repo)
   - Clone your fork locally
   - Read more about [forks](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/about-forks)

2. **Create a Feature Branch**:
   - Work on a [new branch](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-and-deleting-branches-within-your-repository)
   - Keep changes focused and minimal

3. **Code Quality**:
   - Install pre-commit hooks: `pre-commit install`
   - Run `pre-commit run -a` before committing
   - When using AI coding assistants (Cursor, Aider, etc.), make sure their output is properly formatted and passes the same pre-commit checks

4. **Submit Your PR**:
   - [Create a Pull Request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork)
   - [Request review](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/requesting-a-pull-request-review)
   - Do not include changes to `GALLERY.md` (it's updated automatically)
   - (Optional, but desirable) If you have an OpenRouter API key, please run DeepSeek R1 against 5-10 samples from your dataset to check for unexpected issues:
     1. Update the configuration file `eval/r1/yaml/test.yaml` with your dataset:
        ```yaml
        # test.yaml
        model: deepseek/deepseek-r1
        category: test
        datasets:
          - {YOUR_DATASET_NAME}
        eval_dir: eval/r1
        dataset_size: 10
        dataset_seed: 42
        developer_role: system
        ```
     2. Run the evaluation script:
        ```bash
        python eval/r1/eval.py --yaml "eval/r1/yaml/test.yaml"
        ```
     3. Review the results in `eval/r1/test/{YOUR_DATASET_NAME}.json` and make sure there are no unexpected issues with the dataset generation, the model's instruction following, or the scoring function.
     4. Include the results in your PR description.

5. **Review Process**:
   - Address reviewer feedback promptly
   - Keep discussions constructive
   - Once approved, your changes will be merged into `main`
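
For the optional evaluation in step 4, the layout of the result file is not described in this document, so the sketch below makes no assumption about its schema. It only lists the top-level keys of a JSON file and the type of each value, which is a quick way to get oriented before reading the file in detail. The function name `summarize_json` is hypothetical.

```python
import json
from pathlib import Path


def summarize_json(path) -> dict:
    """Map each top-level key of a JSON document to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (e.g. a list of samples) are reported under a single key.
    return {"<root>": type(data).__name__}
```

It could be called as `summarize_json("eval/r1/test/{YOUR_DATASET_NAME}.json")`, with the placeholder replaced by your dataset name.
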

## Need Help?

Join our community discussion in the `#reasoning-gym` channel on the [GPU-Mode Discord server](https://discord.gg/gpumode).