# Reasoning Gym We are building a python library of procedural dataset generators and algorithmically verifiable reasoning environments for training Reasoning Models with reinforcement learning (RL). The goal is to generate virtually infinite data with adjustable complexity. Algorithmic verification allows to train on tasks like Rubik‘s cube or [Countdown]() which have many correct solutions. ## Set up for development 1. Clone the project ``` git clone https://github.com/open-thought/reasoning-gym.git ``` 2. Create a virtual environment (here we use conda) ``` conda create --name reasoning_gym python=3.11 -y conda activate reasoning_gym ``` 3. Link project and install dependencies ``` pip install -e . ``` 4. Install development dependencies ``` pip install -r requirements-dev.txt ``` > NOTE: To consume the APIs in reasoning_gym, just install from pip using the following ``` pip install reasoning-gym ``` ## How to instantiate a task dataset? Example: ```python import reasoning_gym data = reasoning_gym.create_dataset('leg_counting', size=10, seed=42) for i, x in enumerate(data): print(f'{i}: q="{x['question']}", a="{x['answer']}"') print('metadata:', x['metadata']) # use the dataset's `score_answer` method for algorithmic verification assert data.score_answer(answer=x['answer'], entry=x) == 1.0 ``` Output: ``` 0: q="How many legs are there in total if you have 1 sea slug, 1 deer?", a="4" metadata: {'animals': {'sea slug': 1, 'deer': 1}, 'total_legs': 4} 1: q="How many legs are there in total if you have 2 sheeps, 2 dogs?", a="16" metadata: {'animals': {'sheep': 2, 'dog': 2}, 'total_legs': 16} 2: q="How many legs are there in total if you have 1 crab, 2 lobsters, 1 human, 1 cow, 1 bee?", a="42" ... ``` See the [Dataset Gallery](GALLERY.md) for a complete list of available datasets with examples. ## Task Overview ### Algebra Tasks - `SimpleEquationsDataset`: Generate linear equations with one variable to solve (e.g. "3\*x + 2 = 14") - `PolynomialEquationsDataset`: Generate polynomial equations with one variable to solve (e.g. "-6*h\*\*4 + 4*h\**2 - 5*h = 0") ### Arithmetic Tasks - `BasicArithmeticDataset`: Generate arithmetic expressions with configurable complexity and operators (+, -, \*, /) - `ChainSum`: Generate addition/subtraction chains with configurable length and digit counts - `FractionSimplificationDataset`: Generate fraction simplification tasks with configurable complexity - `GCDDataset`: Generate Greatest Common Divisor problems with configurable number of integers - `LCMDataset`: Generate Least Common Multiple problems with configurable number of integers - `LegCountingDataset`: Generate animal leg counting word problems with various animals - `PrimeFactorizationDataset`: Generate prime factorization tasks with configurable number ranges ### Algorithmic Tasks - `BaseConversionDataset`: Convert numbers between different bases (binary, hex, etc.) - `CaesarCipherDataset`: Encrypt/decrypt text using Caesar cipher with configurable rotation - `LetterCountingDataset`: Count letter occurrences in text spans - `NumberFilteringDataset`: Filter numbers based on comparison with threshold - `NumberSortingDataset`: Sort lists of numbers in ascending or descending order - `WordSortingDataset`: Sort words in ascending or descending order using ASCII/Unicode ordering - `LetterJumbleDataset`: Unscramble words that have had their letters randomly jumbled - `SentenceReorderingDataset`: Reorder sentence after words in it have been randomly shuffled - `SpellBackwardDataset`: Spell individual words backward (e.g. "sun" -> "nus") - `WordSequenceReversalDataset`: Reverse word order in text spans ### Code Tasks - `BFDataset`: Generates BF programs of various difficult, from simple string printing to loops and conditional logic ### Cognition Tasks - `NumberSequenceDataset`: Generate number sequences with discoverable patterns - `ColorCubeRotationDataset`: Generate 3D spatial reasoning tasks with colored cube rotations and orientation tracking - `RubiksCubeDataset`: Generate Rubik's Cube configurations and check correct solutions - `FigletFontDataset`: Generate random words in different "Figlet" fonts for reasoning about the structure of letters ### Logic Tasks - `PropositionalLogicDataset`: Generate propositional logic reasoning problems ### Graph Tasks - `FamilyRelationshipsDataset`: Generate family relationship reasoning tasks with family trees - `QuantumLockDataset`: Generates puzzles which involve stateful arithmetic and a correct sequence of operations ### Game Tasks - `SudokuDataset`: Generate 9x9 Sudoku puzzles with configurable number of empty cells - `MiniSudokuDataset`: Generate 4x4 Mini Sudoku puzzles with configurable difficulty - `MazeDataset`: Generate a maze with a start and a goal - `CountdownDataset`: Generate number game tasks where numbers and operators must be combined to reach a target value ## Available Generators

PolynomialEquations

Generate polynomial equations with configurable complexity:
```python from reasoning_gym.algebra import PolynomialEquationsConfig, PolynomialEquationsConfig config = PolynomialEquationsConfig( min_terms=3, max_terms=4, min_degree=4, max_degree=4, min_value=1, max_value=5, size=3, seed=123, ) dataset = PolynomialEquationsDataset(config) for item in dataset: print(item) ``` Example output: ``` {'question': 'Find the real value(s) of b in the equation: b**4 - b**3 - 5*b**2 = 0', 'answer': '[-1.79128784747792, 0.0, 2.79128784747792]', 'metadata': {'polynomial_expr': 'b**4 - b**3 - 5*b**2', 'variable': 'b', 'degree': 4, 'real_solutions': [-1.79128784747792, 0.0, 2.79128784747792]}} {'question': 'Solve the polynomial equation for real i:\n3*i**4 + 4*i**3 - 1 = 0', 'answer': '[]', 'metadata': {'polynomial_expr': '3*i**4 + 4*i**3 - 1', 'variable': 'i', 'degree': 4, 'real_solutions': []}} {'question': 'Solve the polynomial equation for real h:\n7*h**4 - 2*h**2 + h = 0', 'answer': '[-0.6998793469266564, 0.0]', 'metadata': {'polynomial_expr': '7*h**4 - 2*h**2 + h', 'variable': 'h', 'degree': 4, 'real_solutions': [-0.6998793469266564, 0.0]}} ```

Basic Arithmetic

Generate arithmetic problems with configurable complexity:
```python from reasoning_gym.arithmetic import BasicArithmeticDataset, BasicArithmeticDatasetConfig config = BasicArithmeticDatasetConfig( min_terms=2, # Minimum number of terms in expression max_terms=4, # Maximum number of terms min_digits=1, # Minimum digits per number max_digits=2, # Maximum digits per number allow_parentheses=True, # Include nested expressions size=5, # Number of problems to generate seed=42 # For reproducibility ) dataset = BasicArithmeticDataset(config) for item in dataset: print(item) ``` Example output: ``` {'question': '-1 + -5 * 8 + -8 =', 'answer': '-49', 'metadata': {'num_terms': 4, 'num_digits': 1, 'expression': '-1 + -5 * 8 + -8'}} {'question': '19 - 17 =', 'answer': '2', 'metadata': {'num_terms': 2, 'num_digits': 2, 'expression': '19 - 17'}} {'question': '3 + -6 * -9 =', 'answer': '57', 'metadata': {'num_terms': 3, 'num_digits': 1, 'expression': '3 + -6 * -9'}} {'question': '-22 - -94 + -97 =', 'answer': '-25', 'metadata': {'num_terms': 3, 'num_digits': 2, 'expression': '-22 - -94 + -97'}} {'question': '51 * 63 =', 'answer': '3213', 'metadata': {'num_terms': 2, 'num_digits': 2, 'expression': '51 * 63'}} ```

Chain Sum

Generate addition/subtraction problems with configurable complexity:
```python from reasoning_gym.arithmetic import ChainSum, ChainSumConfig config = ChainSumConfig( min_terms=2, # Minimum numbers to add/subtract max_terms=6, # Maximum numbers min_digits=1, # Minimum digits per number max_digits=4, # Maximum digits per number allow_negation=True, # Allow negative numbers size=5, # Number of problems seed=42 # For reproducibility ) dataset = ChainSum(config) for item in dataset: print(item) ``` Example data: ``` { "question": "426 + 562 =", "answer": "988", "metadata": { "num_terms": 2, "num_digits": 3, "expression": "426 + 562" }, } { "question": "426 + 562 =", "answer": "988", "metadata": { "num_terms": 2, "num_digits": 3, "expression": "426 + 562" } } ```

Sequence Completion

Generate number sequence completion tasks with dynamic pattern generation:
```python from reasoning_gym.cognition import NumberSequenceDataset, NumberSequenceConfig config = NumberSequenceConfig( min_terms=4, # Minimum visible terms max_terms=8, # Maximum visible terms min_value=-100, # Minimum allowed number max_value=100, # Maximum allowed number max_complexity=3, # Maximum operations to combine size=5, # Number of sequences seed=42 # For reproducibility ) dataset = NumberSequenceDataset(config) for item in dataset: print(item) ``` Example data: ``` { "question": "3, 6, 12, 24, 48, 96, 192, 384, ?", "answer": "768", "metadata": {"rule": "double", "complexity": 3, "sequence": [3, 6, 12, 24, 48, 96, 192, 384, 768]}, } { "question": "8, 14, 20, 26, 32, 38, 44, ?", "answer": "50", "metadata": {"rule": "add 6", "complexity": 1, "sequence": [8, 14, 20, 26, 32, 38, 44, 50]}, } ```

Color Cube Rotation

Generate 3D spatial reasoning tasks with cube rotations and color tracking:
```python from reasoning_gym.cognition import ColorCubeRotationDataset, ColorCubeRotationConfig config = ColorCubeRotationConfig( min_rotations=1, # Minimum number of rotations max_rotations=3, # Maximum number of rotations size=5, # Number of problems to generate seed=42 # For reproducibility ) dataset = ColorCubeRotationDataset(config) for item in dataset: print(item) ``` Example data: ``` { "question": "A cube has:\n- a red top side\n- a blue right side\n- a green front side\n- a yellow left side\n- a white back side\n- an orange bottom side\n\nThe cube is rotated so that the side which was before at the front is now at the top.\nThe cube is rotated so that the side which was before at the right is now at the top.\n\nWhat is now the color of the bottom side of the cube?", "answer": "yellow", "metadata": { "initial_state": {"top": "red", "right": "blue", "front": "green", "left": "yellow", "back": "white", "bottom": "orange"}, "rotations": ["front", "right"], "target_side": "bottom", "num_rotations": 2 } } ```

Propositional Logic

Generate logical reasoning tasks with configurable complexity:
```python from reasoning_gym.logic import PropositionalLogicDataset, PropositionalLogicConfig config = PropositionalLogicConfig( min_vars=2, # Minimum number of variables max_vars=4, # Maximum number of variables min_statements=2, # Minimum number of given statements max_statements=4, # Maximum number of statements max_complexity=3, # Maximum operator depth size=5, # Number of problems to generate seed=42 # For reproducibility ) dataset = PropositionalLogicDataset(config) for item in dataset: print(item) ``` Example data: ``` { "question": "Given:\n1. R\n2. Q\nWhat can we conclude?", "answer": "(P ∨ Q)", "metadata": {"premises": ["R", "Q"], "variables": ["P", "Q", "R", "S"], "complexity": 3}, } { "question": "Given:\n1. ((Q → P) ∨ (Q → P))\n2. ((Q ↔ Q) → (P → P))\n3. P\nWhat can we conclude?", "answer": "(P → P)", "metadata": { "premises": ["((Q → P) ∨ (Q → P))", "((Q ↔ Q) → (P → P))", "P"], "variables": ["P", "Q"], "complexity": 3, }, } ```

Maze

Generate a maze with configurable difficulty:
```python from reasoning_gym.games import MazeConfig, MazeDataset config = MazeConfig( min_dist=3, max_dist=5, min_grid_size=5, max_grid_size=5, size=2, seed=4, ) dataset = MazeDataset(config) for item in dataset: print() print(item["question"]) print(item) ``` Example data: ``` Navigate from 'd' (start) to '}' (goal): uuuuu uCCdu uCCCu uu}Cu uuuuu Legend: 'u' = Wall, 'C' = Path {'question': "Navigate from 'd' (start) to '}' (goal):\n\nuuuuu\nuCCdu\nuCCCu\nuu}Cu\nuuuuu\nLegend: 'u' = Wall, 'C' = Path\n", 'answer': '3', 'metadata': {'grid_size': 5, 'grid': ['uuuuu', 'uCCdu', 'uCCCu', 'uu}Cu', 'uuuuu'], 'shortest_path_length': 3, 'start': 'd', 'goal': '}', 'wall': 'u', 'path': 'C'}} Navigate from 'J' (start) to '_' (goal): <<<<< < ## Future Generator Ideas - More complex math tasks (algebra, geometry) - Algorithmic tasks (counting, sorting, re-ordering) - Logic riddles - Logic inductive programming tasks - ARC-AGI synthetic riddles ## Call for Contributions If you have ideas for additional procedural dataset generators please create an issue here or contact us in the `#arc-agi-2` channel of the [GPU-Mode discord server](https://discord.gg/gpumode).