InternBootcamp RL Training Environment

Overview

The InternBootcamp RL Training Environment trains large reasoning models with reinforcement learning on verifiable reasoning tasks. It is built on the InternBootcamp library and connects that library's collection of reasoning tasks to the Atropos RL training infrastructure.

How InternBootcamp Works

InternBootcamp is a library that provides:

  1. Standardized Task Interface: Each task (called a "bootcamp") implements three core methods:

    • case_generator(): Generates problem instances with controllable difficulty
    • prompt_func(): Converts problem instances into natural language prompts
    • verify_score(): Verifies and scores model responses
  2. Diverse Task Coverage: Over 1,000 verifiable reasoning tasks including:

    • Logic puzzles (e.g., Game24, Sudoku, N-Queens)
    • Mathematical problems (algebra, geometry, calculus)
    • Algorithm challenges (sorting, searching, optimization)
    • Game-based reasoning (chess, Go, strategic games)
    • Pattern recognition and sequence problems
  3. Automatic Task Generation: Tasks can generate unlimited problem instances with:

    • Controllable difficulty parameters
    • Consistent verification methods
    • Scalable complexity
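
To make the three-method interface concrete, here is a minimal sketch of a task that follows the same shape. ToyAdditionBootcamp, its max_value parameter, and the exact method signatures are assumptions made for illustration; they are not taken from the InternBootcamp library, whose real bootcamps may pass arguments differently.

```python
import random
import re


class ToyAdditionBootcamp:
    """Hypothetical toy task; not part of the InternBootcamp library."""

    def __init__(self, max_value=10, seed=None):
        # max_value is the difficulty knob: larger operands, harder sums.
        self.max_value = max_value
        self.rng = random.Random(seed)

    def case_generator(self):
        # Produce one problem instance as a plain dict.
        return {
            "a": self.rng.randint(0, self.max_value),
            "b": self.rng.randint(0, self.max_value),
        }

    def prompt_func(self, case):
        # Turn the problem instance into a natural language prompt.
        return (
            f"What is {case['a']} + {case['b']}? "
            "Give the final answer as \\boxed{answer}."
        )

    def verify_score(self, response, case):
        # Score 1.0 if the last \boxed{...} integer equals the true sum, else 0.0.
        matches = re.findall(r"\\boxed\{(-?\d+)\}", response)
        if not matches:
            return 0.0
        return 1.0 if int(matches[-1]) == case["a"] + case["b"] else 0.0


task = ToyAdditionBootcamp(max_value=20, seed=0)
case = task.case_generator()
prompt = task.prompt_func(case)
score = task.verify_score(f"The sum is \\boxed{{{case['a'] + case['b']}}}", case)
print(prompt, score)
```

Because the generator is seeded and the verifier is deterministic, the same case can be regenerated and rescored, which is what makes a task of this kind verifiable.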

Architecture

InternBootcamp RL Environment
├── Task Selection Layer
│   ├── Single Task Mode (train on one specific bootcamp)
│   ├── Multi-Task Mode (train on multiple bootcamps - TBD)
│   └── Curriculum Mode (progressive difficulty - TBD)
│
├── InternBootcamp Integration
│   ├── Bootcamp Registry (dynamic task discovery)
│   ├── Bootcamp Instance Management
│   ├── Problem Generation Pipeline
│   └── Response Verification System
│
├── RL Training Loop
│   ├── Trajectory Collection
│   ├── Reward Calculation
│   └── Policy Updates
│
└── Atropos Base Environment
    ├── Server Management
    ├── Batch Processing
    └── Wandb Logging

Key Features

1. Dynamic Task Discovery

The environment automatically discovers all available bootcamp tasks (over 1,000) without manual imports:

from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps

# List all available tasks
tasks = get_available_bootcamps()
print(f"Found {len(tasks)} bootcamp tasks")
# Output: Found 1069 bootcamp tasks
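
One way discovery without manual imports could work is to scan a module for classes that follow a naming convention. The sketch below is an assumption about the general technique, not a description of how bootcamp_registry.py is implemented; discover_bootcamps and the stand-in module demo_tasks are hypothetical names.

```python
import inspect
import types


def discover_bootcamps(module, suffix="bootcamp"):
    """Map class names ending in `suffix` to the classes found in `module`."""
    found = {}
    for name, obj in inspect.getmembers(module, inspect.isclass):
        if name.endswith(suffix):
            found[name] = obj
    return found


# Stand-in module with two hypothetical task classes and one helper class.
demo = types.ModuleType("demo_tasks")
demo.Game24bootcamp = type("Game24bootcamp", (), {})
demo.Sudokubootcamp = type("Sudokubootcamp", (), {})
demo.Helper = type("Helper", (), {})

registry = discover_bootcamps(demo)
print(sorted(registry))  # prints ['Game24bootcamp', 'Sudokubootcamp']
```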

2. Simple Task Selection

Train on any available bootcamp task by name:

# Train on Game24
env = InternBootcampEnv(task_name="Game24bootcamp", task_params={"num_numbers": 4})

# Train on Sudoku
env = InternBootcampEnv(task_name="Sudokubootcamp")

# Train on Maze solving
env = InternBootcampEnv(task_name="Mazebootcamp")

3. Automatic Problem Generation

Each training step:

  1. Instantiates the selected bootcamp with specified parameters
  2. Generates a new problem instance using case_generator()
  3. Converts it to a natural language prompt via prompt_func()
  4. Collects model responses
  5. Verifies correctness using verify_score()
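
The five steps above can be sketched as a single function. This is an illustration under assumptions: run_step, EchoTask, and the generate callable are hypothetical, and the real environment collects responses through the Atropos server rather than a local function.

```python
def run_step(bootcamp, generate, group_size=4):
    """Run one step: build a problem, sample responses, score each one."""
    case = bootcamp.case_generator()                               # step 2
    prompt = bootcamp.prompt_func(case)                            # step 3
    responses = [generate(prompt) for _ in range(group_size)]      # step 4
    scores = [bootcamp.verify_score(r, case) for r in responses]   # step 5
    return prompt, list(zip(responses, scores))


class EchoTask:
    """Hypothetical stand-in task: the correct answer is always 'yes'."""

    def case_generator(self):
        return {"answer": "yes"}

    def prompt_func(self, case):
        return "Reply with the word yes."

    def verify_score(self, response, case):
        return 1.0 if response.strip().lower() == case["answer"] else 0.0


# Step 1 is the instantiation of the task; a canned iterator plays the model.
replies = iter(["yes", "no", "Yes ", "maybe"])
prompt, scored = run_step(EchoTask(), lambda p: next(replies), group_size=4)
print(scored)
```

Each problem yields a group of scored responses, which matches the group_size setting used in the usage examples.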

4. Flexible Reward System

  • Base rewards: Correct/incorrect responses (configurable)
  • Format bonuses: Proper answer formatting (e.g., \boxed{} for math)
  • Reasoning bonuses: Quality of step-by-step explanations
  • Task-specific scoring: Each bootcamp can define its own scoring logic
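
As a rough illustration of how a base reward and a format bonus might combine, consider the sketch below. The parameter names mirror the correct_reward, incorrect_reward, and format_bonus settings documented in this README, but compute_reward itself is hypothetical, and whether the environment grants the format bonus on incorrect answers is an assumption. The reasoning bonus is left out.

```python
import re


def compute_reward(response, is_correct,
                   correct_reward=1.0, incorrect_reward=-0.5, format_bonus=0.2):
    """Base reward for correctness plus a bonus when a boxed answer is present."""
    reward = correct_reward if is_correct else incorrect_reward
    # Assumption: the bonus depends only on formatting, not on correctness.
    if re.search(r"\\boxed\{[^{}]+\}", response):
        reward += format_bonus
    return reward


print(compute_reward("So the answer is \\boxed{24}", is_correct=True))
print(compute_reward("I think it is 24", is_correct=False))
```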

Installation

  1. Clone the repository and navigate to the environment:

    cd environments/intern_bootcamp

  2. Install InternBootcamp (already included as a submodule):

    cd internbootcamp_lib && uv pip install -e .

Usage Examples

1. Single Task Training

Train on Game24 puzzles with specific difficulty:

python -m environments.intern_bootcamp serve \
    --env--task_name "Game24bootcamp" \
    --env--task_params '{"num_numbers": 4, "range_max": 100}' \
    --env--group_size 8 \
    --env--total_steps 10000

2. Exploring Available Tasks

List all available bootcamp tasks:

from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps

tasks = get_available_bootcamps()
for task in tasks[:20]:  # Show first 20
    print(task)

3. Custom Configuration File

Use a YAML configuration for training:

# config/intern_bootcamp_game24.yaml
env:
  task_name: "Game24bootcamp"
  task_params:
    num_numbers: 4
    range_max: 50
    target_max: 50

  correct_reward: 1.0
  incorrect_reward: -0.5
  format_bonus: 0.2

  group_size: 8
  total_steps: 10000
  steps_per_eval: 100

openai:
  model_name: "gpt-4"
  temperature: 0.7
  max_tokens: 2048

Run with config:

python -m environments.intern_bootcamp serve --config config/intern_bootcamp_game24.yaml

Available Bootcamp Tasks

The environment supports over 1,000 bootcamp tasks. Some examples include:

  • Math & Logic: Game24bootcamp, Sudokubootcamp, Kakurobootcamp
  • Algorithms: Mazebootcamp, Slitherlinkbootcamp, Bridgesbootcamp
  • Games: InternGObootcamp, Chessbootcamp
  • Pattern Recognition: Arcbootcamp, Nonogramsbootcamp
  • Code Generation: CodeIObootcamp, BigCodeBenchbootcamp
  • Language Tasks: Cipherbootcamp, WordSortingbootcamp

Use get_available_bootcamps() to see the full list.

Implementation Details

Environment Configuration

class InternBootcampEnvConfig(BaseEnvConfig):
    # Task selection
    task_name: str = "Game24bootcamp"  # Bootcamp task name
    task_params: Dict[str, Any] = {}   # Task-specific parameters

    # Reward configuration
    correct_reward: float = 1.0
    incorrect_reward: float = -0.5
    format_bonus: float = 0.2

    # Training parameters
    require_reasoning: bool = True
    min_reasoning_length: int = 50
    temperature: float = 0.7
    top_p: float = 0.9

Bootcamp Registry

The environment uses a dynamic registry system to discover and manage bootcamp tasks:

from environments.intern_bootcamp.bootcamp_registry import (
    create_bootcamp,
    get_available_bootcamps,
    bootcamp_registry
)

# Create a bootcamp instance
bootcamp = create_bootcamp("Game24bootcamp", num_numbers=4, range_max=50)

# Get information about a bootcamp
info = bootcamp_registry.get_bootcamp_info("Game24bootcamp")
print(info["parameters"])  # Shows accepted parameters

Evaluation and Metrics

The environment tracks two groups of metrics:

Performance Metrics

  • Task accuracy: Success rate on the specific bootcamp task
  • Format compliance: Rate of properly formatted responses
  • Reasoning quality: Length and coherence of explanations

Training Metrics

  • Reward statistics: Mean, std, min, max rewards
  • Problem diversity: Variety of generated problems
  • Learning progress: Improvement over time
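
The reward statistics and task accuracy listed above are simple aggregates over a batch of scored responses. The following sketch shows one way to compute them with the standard library; reward_stats and accuracy are hypothetical helpers, and the use of the population standard deviation is an assumption, since the README does not say which variant is logged.

```python
from statistics import mean, pstdev


def reward_stats(rewards):
    """Summarize a batch of rewards with mean, std, min, and max."""
    return {
        "mean": mean(rewards),
        "std": pstdev(rewards),  # population standard deviation
        "min": min(rewards),
        "max": max(rewards),
    }


def accuracy(scores, threshold=1.0):
    """Fraction of responses whose score reaches `threshold`."""
    return sum(s >= threshold for s in scores) / len(scores)


batch = [1.2, -0.5, 1.0, -0.5]
print(reward_stats(batch))
print(accuracy([1.0, 0.0, 1.0, 0.0]))
```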

Troubleshooting

Common Issues

  1. Task Not Found

    ValueError: Unknown bootcamp: XYZBootcamp
    

    Solution: Check available tasks with get_available_bootcamps()

  2. Import Errors

    ImportError: No module named 'internbootcamp'
    

    Solution: Install InternBootcamp: cd internbootcamp_lib && pip install -e .

  3. Parameter Errors

    TypeError: __init__() got an unexpected keyword argument
    

    Solution: Check accepted parameters with bootcamp_registry.get_bootcamp_info(task_name)
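
Parameter errors of this kind can also be caught before a task is instantiated by comparing the supplied keys against the constructor signature. The sketch below uses inspect.signature from the standard library; check_params and DemoBootcamp are hypothetical and only illustrate the idea, they are not part of the registry API.

```python
import inspect


def check_params(cls, params):
    """Return the keys in `params` that cls.__init__ does not accept."""
    sig = inspect.signature(cls.__init__)
    # If __init__ takes **kwargs, any keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()):
        return []
    accepted = set(sig.parameters) - {"self"}
    return sorted(k for k in params if k not in accepted)


class DemoBootcamp:
    """Hypothetical task class used only for this illustration."""

    def __init__(self, num_numbers=4, range_max=100):
        self.num_numbers = num_numbers
        self.range_max = range_max


# The misspelled key is reported instead of raising a TypeError later.
print(check_params(DemoBootcamp, {"num_numbers": 4, "rang_max": 50}))  # prints ['rang_max']
```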

Future Enhancements

  1. Multi-Task Training: Train on multiple bootcamps simultaneously
  2. Curriculum Learning: Progressive difficulty advancement
  3. Task Composition: Combine multiple bootcamps into complex reasoning chains
  4. Custom Bootcamps: Easy integration of new reasoning tasks

Contributing

To add new features or improvements:

  1. Fork the repository
  2. Create a feature branch
  3. Implement your changes following the existing patterns
  4. Add tests for new functionality
  5. Submit a pull request with a clear description

License

This environment follows the same license as the Atropos framework and InternBootcamp library.