mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-19 12:58:07 +00:00

Reasoning Gym

We are building a Python library of procedural dataset generators and algorithmically verifiable reasoning environments for training reasoning models with reinforcement learning (RL).

The goal is to generate virtually infinite data with adjustable complexity.

Algorithmic verification makes it possible to train on tasks such as Rubik's Cube or Countdown, which have many correct solutions.
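To illustrate why a checker suits tasks with many correct solutions, here is a minimal sketch of a Countdown-style verifier. It is not the library's implementation: the function name `verify_countdown` and the rule that each given number may be used at most once are assumptions made for this example.

```python
import ast
import operator
from collections import Counter

# Binary operators permitted in a candidate expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate an arithmetic AST node and record every integer literal in `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def verify_countdown(expression, numbers, target):
    """Return 1.0 if `expression` reaches `target` using only the given numbers, else 0.0."""
    used = []
    try:
        value = _evaluate(ast.parse(expression, mode="eval").body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Assumed rule for this sketch: each given number may be used at most once.
    if Counter(used) - Counter(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0


numbers, target = [2, 3, 5, 10], 25
print(verify_countdown("5 * (2 + 3)", numbers, target))  # 1.0
print(verify_countdown("10 * 3 - 5", numbers, target))   # 1.0, a different but equally valid solution
print(verify_countdown("5 * 5", numbers, target))        # 0.0, uses 5 twice
```

Because the checker evaluates the candidate instead of comparing it with one stored string, any expression that satisfies the rules is accepted.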

Set up for development

Clone the project

git clone https://github.com/open-thought/reasoning-gym.git

Create a virtual environment (here we use conda)

conda create --name reasoning_gym python=3.11 -y
conda activate reasoning_gym

Link project and install dependencies

pip install -e .

Install development dependencies

pip install -r requirements-dev.txt

NOTE: To use the reasoning_gym APIs without a development setup, install the package from pip:

pip install reasoning-gym

How to instantiate a task dataset?

Example:

import reasoning_gym
data = reasoning_gym.create_dataset('leg_counting', size=10, seed=42)
for i, x in enumerate(data):
    print(f'{i}: q="{x["question"]}", a="{x["answer"]}"')
    print('metadata:', x['metadata'])
    # use the dataset's `score_answer` method for algorithmic verification
    assert data.score_answer(answer=x['answer'], entry=x) == 1.0

Output:

0: q="How many legs are there in total if you have 1 sea slug, 1 deer?", a="4"
metadata: {'animals': {'sea slug': 1, 'deer': 1}, 'total_legs': 4}
1: q="How many legs are there in total if you have 2 sheeps, 2 dogs?", a="16"
metadata: {'animals': {'sheep': 2, 'dog': 2}, 'total_legs': 16}
2: q="How many legs are there in total if you have 1 crab, 2 lobsters, 1 human, 1 cow, 1 bee?", a="42"
...

See the Dataset Gallery (GALLERY.md) for a complete list of available datasets with examples.
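The pattern used above (a seeded dataset that yields `question`, `answer` and `metadata` entries and offers a `score_answer` method) can be sketched without the library. `ToyLegCountingDataset`, its `max_animals` parameter and the `LEGS` table are hypothetical names for this sketch and do not reflect how reasoning_gym is implemented.

```python
import random

# Hypothetical animal table, used only by this sketch.
LEGS = {"deer": 4, "dog": 4, "human": 2, "bee": 6, "crab": 10, "sea slug": 0}


class ToyLegCountingDataset:
    """Toy procedural dataset: entry i depends only on (seed, i)."""

    def __init__(self, size=10, seed=42, max_animals=3):
        self.size = size
        self.seed = seed
        self.max_animals = max_animals  # complexity knob (at most len(LEGS))

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        # A per-entry generator keeps every item reproducible on its own.
        rng = random.Random(self.seed + idx)
        kinds = rng.sample(sorted(LEGS), rng.randint(1, self.max_animals))
        animals = {kind: rng.randint(1, 3) for kind in kinds}
        total = sum(LEGS[kind] * count for kind, count in animals.items())
        listing = ", ".join(f"{count} x {kind}" for kind, count in animals.items())
        return {
            "question": f"How many legs are there in total if you have {listing}?",
            "answer": str(total),
            "metadata": {"animals": animals, "total_legs": total},
        }

    def score_answer(self, answer, entry):
        """Return 1.0 for an exact match with the stored answer, else 0.0."""
        return 1.0 if answer is not None and answer.strip() == entry["answer"] else 0.0


data = ToyLegCountingDataset(size=3, seed=42)
for i, x in enumerate(data):
    print(i, x["question"], x["answer"])
    assert data.score_answer(answer=x["answer"], entry=x) == 1.0
```

Deriving each entry from `seed + idx` means the same index always produces the same problem, so the dataset never has to be stored.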

Task Overview

Algebra Tasks

SimpleEquationsDataset: Generate linear equations with one variable to solve (e.g. "3*x + 2 = 14")
PolynomialEquationsDataset: Generate polynomial equations with one variable to solve (e.g. "-6*h**4 + 4*h**2 - 5*h = 0")

Arithmetic Tasks

BasicArithmeticDataset: Generate arithmetic expressions with configurable complexity and operators (+, -, *, /)
ChainSum: Generate addition/subtraction chains with configurable length and digit counts
FractionSimplificationDataset: Generate fraction simplification tasks with configurable complexity
GCDDataset: Generate Greatest Common Divisor problems with configurable number of integers
LCMDataset: Generate Least Common Multiple problems with configurable number of integers
LegCountingDataset: Generate animal leg counting word problems with various animals
PrimeFactorizationDataset: Generate prime factorization tasks with configurable number ranges
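Reference answers for several of these tasks can be computed with the Python standard library alone. The helper names below (`gcd_many`, `lcm_many`, `prime_factors`) are hypothetical and are not part of the reasoning_gym API; they only sketch what a checker for the GCD, LCM and prime factorization tasks has to compute.

```python
from functools import reduce
from math import gcd


def gcd_many(numbers):
    """Greatest common divisor of a non-empty list of positive integers."""
    return reduce(gcd, numbers)


def lcm_many(numbers):
    """Least common multiple, built pairwise from lcm(a, b) = a * b // gcd(a, b)."""
    return reduce(lambda a, b: a * b // gcd(a, b), numbers)


def prime_factors(n):
    """Prime factors of n >= 2 in ascending order, found by trial division."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


print(gcd_many([12, 18, 30]))  # 6
print(lcm_many([4, 6, 10]))    # 60
print(prime_factors(84))       # [2, 2, 3, 7]
```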

Algorithmic Tasks

BaseConversionDataset: Convert numbers between different bases (binary, hex, etc.)
CaesarCipherDataset: Encrypt/decrypt text using Caesar cipher with configurable rotation
LetterCountingDataset: Count letter occurrences in text spans
NumberFilteringDataset: Filter numbers based on comparison with threshold
NumberSortingDataset: Sort lists of numbers in ascending or descending order
WordSortingDataset: Sort words in ascending or descending order using ASCII/Unicode ordering
LetterJumbleDataset: Unscramble words that have had their letters randomly jumbled
SentenceReorderingDataset: Reorder a sentence after its words have been randomly shuffled
SpellBackwardDataset: Spell individual words backward (e.g. "sun" -> "nus")
WordSequenceReversalDataset: Reverse word order in text spans
WordLadderDataset: Generate word ladder puzzles where one word is transformed into another by changing one letter at a time
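A word ladder can be solved with a breadth-first search over words that differ by one letter. The sketch below is one standard way to do this and is not necessarily how WordLadderDataset works internally; `word_ladder` and the small `WORDS` set are made up for the example.

```python
from collections import deque


def word_ladder(start, goal, words):
    """Return a shortest ladder from start to goal as a list of words, or None."""
    words = set(words) | {start, goal}
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            # Walk the parent links back to the start to rebuild the ladder.
            path = []
            while word is not None:
                path.append(word)
                word = parents[word]
            return path[::-1]
        for i in range(len(word)):
            for letter in "abcdefghijklmnopqrstuvwxyz":
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in words and candidate not in parents:
                    parents[candidate] = word
                    queue.append(candidate)
    return None


# Hypothetical word list for illustration.
WORDS = {"cold", "cord", "card", "ward", "warm", "word", "worm", "corm"}
ladder = word_ladder("cold", "warm", WORDS)
print(len(ladder) - 1)  # 4 single-letter changes
```

Several ladders of the same length exist for this word set, which is the kind of task where checking a proposed ladder step by step works better than comparing it with one stored answer.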

Cognition Tasks

NumberSequenceDataset: Generate number sequences with discoverable patterns
ColorCubeRotationDataset: Generate 3D spatial reasoning tasks with colored cube rotations and orientation tracking

Logic Tasks

PropositionalLogicDataset: Generate propositional logic reasoning problems

Graph Tasks

FamilyRelationshipsDataset: Generate family relationship reasoning tasks with family trees

Game Tasks

SudokuDataset: Generate 9x9 Sudoku puzzles with configurable number of empty cells
MiniSudokuDataset: Generate 4x4 Mini Sudoku puzzles with configurable difficulty
MazeDataset: Generate a maze with a start and a goal
CountdownDataset: Generate number game tasks where numbers and operators must be combined to reach a target value

Available Generators

PolynomialEquations

Generate polynomial equations with configurable complexity:

from reasoning_gym.algebra import PolynomialEquationsConfig, PolynomialEquationsDataset

config = PolynomialEquationsConfig(
    min_terms=3,
    max_terms=4,
    min_degree=4,
    max_degree=4,
    min_value=1,
    max_value=5,
    size=3,
    seed=123,
)

dataset = PolynomialEquationsDataset(config)
for item in dataset:
    print(item)

Example output:

{'question': 'Find the real value(s) of b in the equation: b**4 - b**3 - 5*b**2 = 0', 'answer': '[-1.79128784747792, 0.0, 2.79128784747792]', 'metadata': {'polynomial_expr': 'b**4 - b**3 - 5*b**2', 'variable': 'b', 'degree': 4, 'real_solutions': [-1.79128784747792, 0.0, 2.79128784747792]}}
{'question': 'Solve the polynomial equation for real i:\n3*i**4 + 4*i**3 - 1 = 0', 'answer': '[]', 'metadata': {'polynomial_expr': '3*i**4 + 4*i**3 - 1', 'variable': 'i', 'degree': 4, 'real_solutions': []}}
{'question': 'Solve the polynomial equation for real h:\n7*h**4 - 2*h**2 + h = 0', 'answer': '[-0.6998793469266564, 0.0]', 'metadata': {'polynomial_expr': '7*h**4 - 2*h**2 + h', 'variable': 'h', 'degree': 4, 'real_solutions': [-0.6998793469266564, 0.0]}}
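A reported solution can be checked independently by substituting it back into the polynomial. The sketch below does this for the first example item; `eval_poly` is a hypothetical helper and the tolerance of 1e-9 is an arbitrary choice for this illustration.

```python
def eval_poly(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coeffs run from highest degree to constant."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


# b**4 - b**3 - 5*b**2, taken from the first example item
coeffs = [1, -1, -5, 0, 0]
reported_roots = [-1.79128784747792, 0.0, 2.79128784747792]

# Each residual should be zero up to floating-point error.
residuals = [eval_poly(coeffs, r) for r in reported_roots]
all_close = all(abs(v) < 1e-9 for v in residuals)
print(all_close)  # True
```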

Basic Arithmetic

Generate arithmetic problems with configurable complexity:

from reasoning_gym.arithmetic import BasicArithmeticDataset, BasicArithmeticDatasetConfig

config = BasicArithmeticDatasetConfig(
    min_terms=2,        # Minimum number of terms in expression
    max_terms=4,        # Maximum number of terms
    min_digits=1,       # Minimum digits per number
    max_digits=2,       # Maximum digits per number
    allow_parentheses=True,  # Include nested expressions
    size=5,            # Number of problems to generate
    seed=42            # For reproducibility
)

dataset = BasicArithmeticDataset(config)
for item in dataset:
    print(item)

Example output:

{'question': '-1 + -5   * 8 + -8 =', 'answer': '-49', 'metadata': {'num_terms': 4, 'num_digits': 1, 'expression': '-1 + -5   * 8 + -8'}}
{'question': '19 - 17 =', 'answer': '2', 'metadata': {'num_terms': 2, 'num_digits': 2, 'expression': '19 - 17'}}
{'question': '3 + -6 * -9 =', 'answer': '57', 'metadata': {'num_terms': 3, 'num_digits': 1, 'expression': '3 + -6 * -9'}}
{'question': '-22 - -94 + -97 =', 'answer': '-25', 'metadata': {'num_terms': 3, 'num_digits': 2, 'expression': '-22 - -94 + -97'}}
{'question': '51 * 63 =', 'answer': '3213', 'metadata': {'num_terms': 2, 'num_digits': 2, 'expression': '51 * 63'}}

Chain Sum

Generate addition/subtraction problems with configurable complexity:

from reasoning_gym.arithmetic import ChainSum, ChainSumConfig

config = ChainSumConfig(
    min_terms=2,        # Minimum numbers to add/subtract
    max_terms=6,        # Maximum numbers
    min_digits=1,       # Minimum digits per number
    max_digits=4,       # Maximum digits per number
    allow_negation=True, # Allow negative numbers
    size=5,             # Number of problems
    seed=42             # For reproducibility
)

dataset = ChainSum(config)
for item in dataset:
    print(item)

Example data:

{
    "question": "426 + 562 =",
    "answer": "988",
    "metadata": { "num_terms": 2, "num_digits": 3, "expression": "426 + 562" },
}
{
    "question": "426 + 562 =",
    "answer": "988",
    "metadata": { "num_terms": 2, "num_digits": 3, "expression": "426 + 562" }
}

Sequence Completion

Generate number sequence completion tasks with dynamic pattern generation:

from reasoning_gym.cognition import NumberSequenceDataset, NumberSequenceConfig

config = NumberSequenceConfig(
    min_terms=4,        # Minimum visible terms
    max_terms=8,        # Maximum visible terms
    min_value=-100,     # Minimum allowed number
    max_value=100,      # Maximum allowed number
    max_complexity=3,   # Maximum operations to combine
    size=5,            # Number of sequences
    seed=42            # For reproducibility
)

dataset = NumberSequenceDataset(config)
for item in dataset:
    print(item)

Example data:

{
    "question": "3, 6, 12, 24, 48, 96, 192, 384, ?",
    "answer": "768",
    "metadata": {"rule": "double", "complexity": 3, "sequence": [3, 6, 12, 24, 48, 96, 192, 384, 768]},
}
{
    "question": "8, 14, 20, 26, 32, 38, 44, ?",
    "answer": "50",
    "metadata": {"rule": "add 6", "complexity": 1, "sequence": [8, 14, 20, 26, 32, 38, 44, 50]},
}
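The two rules shown in the example data ("double" and "add 6") are instances of a constant ratio and a constant difference. The following sketch predicts the next term for just these two families; `next_term` is a hypothetical helper, and the patterns generated by the dataset can be more varied than this covers.

```python
def next_term(sequence):
    """Predict the next term for a constant-difference or constant-ratio sequence."""
    pairs = list(zip(sequence, sequence[1:]))
    diffs = {b - a for a, b in pairs}
    if len(diffs) == 1:
        return sequence[-1] + diffs.pop()
    # Only try integer ratios when every division is exact.
    if all(a != 0 and b % a == 0 for a, b in pairs):
        ratios = {b // a for a, b in pairs}
        if len(ratios) == 1:
            return sequence[-1] * ratios.pop()
    return None  # pattern not covered by this sketch


print(next_term([3, 6, 12, 24, 48, 96, 192, 384]))  # 768
print(next_term([8, 14, 20, 26, 32, 38, 44]))       # 50
```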

Color Cube Rotation

Generate 3D spatial reasoning tasks with cube rotations and color tracking:

from reasoning_gym.cognition import ColorCubeRotationDataset, ColorCubeRotationConfig

config = ColorCubeRotationConfig(
    min_rotations=1,     # Minimum number of rotations
    max_rotations=3,     # Maximum number of rotations
    size=5,             # Number of problems to generate
    seed=42             # For reproducibility
)

dataset = ColorCubeRotationDataset(config)
for item in dataset:
    print(item)

Example data:

{
    "question": "A cube has:\n- a red top side\n- a blue right side\n- a green front side\n- a yellow left side\n- a white back side\n- an orange bottom side\n\nThe cube is rotated so that the side which was before at the front is now at the top.\nThe cube is rotated so that the side which was before at the right is now at the top.\n\nWhat is now the color of the bottom side of the cube?",
    "answer": "yellow",
    "metadata": {
        "initial_state": {"top": "red", "right": "blue", "front": "green", "left": "yellow", "back": "white", "bottom": "orange"},
        "rotations": ["front", "right"],
        "target_side": "bottom",
        "num_rotations": 2
    }
}
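The answer in the example can be reproduced by simulating the rotations. This is a minimal sketch under the assumption that "rotated so that side X is now at the top" moves X to the top, the old top to the side opposite X, and so on around the cube. `rotate_to_top` and `CYCLES` are hypothetical names, and only the four side faces are handled.

```python
# For each rotation, the face at cycle[i] moves to position cycle[i + 1] (wrapping around).
CYCLES = {
    "front": ["front", "top", "back", "bottom"],
    "back": ["back", "top", "front", "bottom"],
    "right": ["right", "top", "left", "bottom"],
    "left": ["left", "top", "right", "bottom"],
}


def rotate_to_top(state, side):
    """Return the face colours after the given side face is rotated to the top."""
    if side not in CYCLES:
        raise ValueError(f"unsupported side for this sketch: {side}")
    cycle = CYCLES[side]
    new_state = dict(state)
    for i, source in enumerate(cycle):
        target = cycle[(i + 1) % len(cycle)]
        new_state[target] = state[source]
    return new_state


initial_state = {
    "top": "red", "right": "blue", "front": "green",
    "left": "yellow", "back": "white", "bottom": "orange",
}
state = initial_state
for side in ["front", "right"]:
    state = rotate_to_top(state, side)
print(state["bottom"])  # yellow
```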

Propositional Logic

Generate logical reasoning tasks with configurable complexity:

from reasoning_gym.logic import PropositionalLogicDataset, PropositionalLogicConfig

config = PropositionalLogicConfig(
    min_vars=2,         # Minimum number of variables
    max_vars=4,         # Maximum number of variables
    min_statements=2,   # Minimum number of given statements
    max_statements=4,   # Maximum number of statements
    max_complexity=3,   # Maximum operator depth
    size=5,            # Number of problems to generate
    seed=42            # For reproducibility
)

dataset = PropositionalLogicDataset(config)
for item in dataset:
    print(item)

Example data:

{
    "question": "Given:\n1. R\n2. Q\nWhat can we conclude?",
    "answer": "(P ∨ Q)",
    "metadata": {"premises": ["R", "Q"], "variables": ["P", "Q", "R", "S"], "complexity": 3},
}
{
    "question": "Given:\n1. ((Q → P) ∨ (Q → P))\n2. ((Q ↔ Q) → (P → P))\n3. P\nWhat can we conclude?",
    "answer": "(P → P)",
    "metadata": {
        "premises": ["((Q → P) ∨ (Q → P))", "((Q ↔ Q) → (P → P))", "P"],
        "variables": ["P", "Q"],
        "complexity": 3,
    },
}
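Whether a conclusion follows from the premises can be checked with a truth table. The sketch below encodes formulas as Python functions over a variable assignment; `entails` and `implies` are hypothetical helpers for illustration and say nothing about how the dataset scores answers.

```python
from itertools import product


def implies(a, b):
    """Material implication a → b."""
    return (not a) or b


def entails(premises, conclusion, variables):
    """True if the conclusion holds in every assignment that satisfies all premises."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False
    return True


# First example: premises R and Q, conclusion (P ∨ Q)
premises_1 = [lambda e: e["R"], lambda e: e["Q"]]
result_1 = entails(premises_1, lambda e: e["P"] or e["Q"], ["P", "Q", "R", "S"])

# Second example: conclusion (P → P)
premises_2 = [
    lambda e: implies(e["Q"], e["P"]) or implies(e["Q"], e["P"]),
    lambda e: implies(e["Q"] == e["Q"], implies(e["P"], e["P"])),
    lambda e: e["P"],
]
result_2 = entails(premises_2, lambda e: implies(e["P"], e["P"]), ["P", "Q"])

print(result_1, result_2)  # True True
```

Many different conclusions are entailed by the same premises, so a semantic check of this kind accepts any of them.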

Maze

Generate a maze with configurable difficulty:

from reasoning_gym.games import MazeConfig, MazeDataset

config = MazeConfig(
    min_dist=3,
    max_dist=5,
    min_grid_size=5,
    max_grid_size=5,
    size=2,
    seed=4,
)

dataset = MazeDataset(config)

for item in dataset:
    print()
    print(item["question"])
    print(item)

Example data:

Navigate from 'd' (start) to '}' (goal):

uuuuu
uCCdu
uCCCu
uu}Cu
uuuuu
Legend: 'u' = Wall, 'C' = Path

{'question': "Navigate from 'd' (start) to '}' (goal):\n\nuuuuu\nuCCdu\nuCCCu\nuu}Cu\nuuuuu\nLegend: 'u' = Wall, 'C' = Path\n", 'answer': '3', 'metadata': {'grid_size': 5, 'grid': ['uuuuu', 'uCCdu', 'uCCCu', 'uu}Cu', 'uuuuu'], 'shortest_path_length': 3, 'start': 'd', 'goal': '}', 'wall': 'u', 'path': 'C'}}

Navigate from 'J' (start) to '_' (goal):

<<<<<
<<J<<
<www<
<<w_<
<<<<<
Legend: '<' = Wall, 'w' = Path

{'question': "Navigate from 'J' (start) to '_' (goal):\n\n<<<<<\n<<J<<\n<www<\n<<w_<\n<<<<<\nLegend: '<' = Wall, 'w' = Path\n", 'answer': '3', 'metadata': {'grid_size': 5, 'grid': ['<<<<<', '<<J<<', '<www<', '<<w_<', '<<<<<'], 'shortest_path_length': 3, 'start': 'J', 'goal': '_', 'wall': '<', 'path': 'w'}}
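The `shortest_path_length` values in the metadata can be reproduced with a breadth-first search over the grid. This sketch assumes moves in the four cardinal directions and counts steps from start to goal; the function name mirrors the metadata key but is otherwise hypothetical.

```python
from collections import deque


def shortest_path_length(grid, start, goal, wall):
    """Breadth-first search over a list of strings; returns the step count or -1 if unreachable."""
    rows, cols = len(grid), len(grid[0])
    start_pos = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == start)
    queue = deque([(start_pos, 0)])
    seen = {start_pos}
    while queue:
        (r, c), dist = queue.popleft()
        if grid[r][c] == goal:
            return dist
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and grid[nr][nc] != wall:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return -1


grid_1 = ["uuuuu", "uCCdu", "uCCCu", "uu}Cu", "uuuuu"]
grid_2 = ["<<<<<", "<<J<<", "<www<", "<<w_<", "<<<<<"]
print(shortest_path_length(grid_1, "d", "}", "u"))  # 3
print(shortest_path_length(grid_2, "J", "_", "<"))  # 3
```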

Future Generator Ideas

More complex math tasks (algebra, geometry)
Algorithmic tasks (counting, sorting, re-ordering)
Logic riddles
Logic inductive programming tasks
ARC-AGI synthetic riddles

Call for Contributions

If you have ideas for additional procedural dataset generators, please create an issue on GitHub or contact us in the #arc-agi-2 channel of the GPU-Mode Discord server.
