InternBootcamp RL Training Environment
Overview
The InternBootcamp RL Training Environment is a framework for training large reasoning models with reinforcement learning on verifiable reasoning tasks. Built on the InternBootcamp library, it connects InternBootcamp's collection of reasoning tasks to the Atropos RL training infrastructure.
How InternBootcamp Works
InternBootcamp is a library that provides:
- Standardized Task Interface: Each task (called a "bootcamp") implements three core methods:
  - `case_generator()`: Generates problem instances with controllable difficulty
  - `prompt_func()`: Converts problem instances into natural language prompts
  - `verify_score()`: Verifies and scores model responses
- Diverse Task Coverage: Over 1,000 verifiable reasoning tasks, including:
  - Logic puzzles (e.g., Game24, Sudoku, N-Queens)
  - Mathematical problems (algebra, geometry, calculus)
  - Algorithm challenges (sorting, searching, optimization)
  - Game-based reasoning (chess, Go, strategic games)
  - Pattern recognition and sequence problems
- Automatic Task Generation: Tasks can generate unlimited problem instances with:
  - Controllable difficulty parameters
  - Consistent verification methods
  - Scalable complexity
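To make the three-method contract concrete, here is a minimal sketch of a bootcamp-style class. `ToyAdditionBootcamp` and its constructor parameters are hypothetical; only the method names `case_generator`, `prompt_func`, and `verify_score` come from the description above, and their exact signatures in InternBootcamp may differ.

```python
import random
import re
from typing import Any, Dict, Optional


class ToyAdditionBootcamp:
    """Hypothetical bootcamp showing the three-method interface.

    The method names come from this README; the signatures and return
    types are assumptions, not InternBootcamp's actual base class.
    """

    def __init__(self, num_terms: int = 2, max_value: int = 9, seed: Optional[int] = None):
        self.num_terms = num_terms  # more terms -> harder problem
        self.max_value = max_value  # larger operands -> harder problem
        self.rng = random.Random(seed)

    def case_generator(self) -> Dict[str, Any]:
        # One problem instance, stored together with its ground-truth answer.
        terms = [self.rng.randint(0, self.max_value) for _ in range(self.num_terms)]
        return {"terms": terms, "answer": sum(terms)}

    def prompt_func(self, case: Dict[str, Any]) -> str:
        # Render the structured instance as a natural-language prompt.
        expr = " + ".join(str(t) for t in case["terms"])
        return f"Compute {expr}. Put your final answer in \\boxed{{}}."

    def verify_score(self, response: str, case: Dict[str, Any]) -> float:
        # Compare the last \boxed{...} integer in the response to the stored answer.
        matches = re.findall(r"\\boxed\{(-?\d+)\}", response)
        if not matches:
            return 0.0
        return 1.0 if int(matches[-1]) == case["answer"] else 0.0


bootcamp = ToyAdditionBootcamp(num_terms=3, seed=0)
case = bootcamp.case_generator()
print(bootcamp.prompt_func(case))
```

Because each generated instance carries its own answer, scoring is a programmatic check rather than a judgment call, which is what makes such tasks usable as RL reward sources.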
Architecture
```text
InternBootcamp RL Environment
├── Task Selection Layer
│   ├── Single Task Mode (train on one specific bootcamp)
│   ├── Multi-Task Mode (train on multiple bootcamps - TBD)
│   └── Curriculum Mode (progressive difficulty - TBD)
│
├── InternBootcamp Integration
│   ├── Bootcamp Registry (dynamic task discovery)
│   ├── Bootcamp Instance Management
│   ├── Problem Generation Pipeline
│   └── Response Verification System
│
├── RL Training Loop
│   ├── Trajectory Collection
│   ├── Reward Calculation
│   └── Policy Updates
│
└── Atropos Base Environment
    ├── Server Management
    ├── Batch Processing
    └── Wandb Logging
```
Key Features
1. Dynamic Task Discovery
The environment automatically discovers all available bootcamp tasks (1000+) without manual imports:
```python
from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps

# List all available tasks
tasks = get_available_bootcamps()
print(f"Found {len(tasks)} bootcamp tasks")
# Example output (the count depends on the installed InternBootcamp version):
# Found 1069 bootcamp tasks
```
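One plausible way to discover tasks without manual imports is to scan a module for classes that follow the `...bootcamp` naming convention seen in task names such as `Game24bootcamp`. The sketch below assumes that rule and uses a stand-in module so it runs without InternBootcamp installed; `discover_bootcamps` is a hypothetical helper, and the real registry may walk packages or filter differently.

```python
import inspect
import types
from typing import Dict, Type


def discover_bootcamps(module: types.ModuleType) -> Dict[str, Type]:
    """Collect every class in `module` whose name ends with 'bootcamp'.

    Hypothetical discovery rule; the actual registry may use another one.
    """
    found = {}
    for name, obj in inspect.getmembers(module, inspect.isclass):
        if name.lower().endswith("bootcamp"):
            found[name] = obj
    return dict(sorted(found.items()))


# Stand-in module: two bootcamp-like classes and one unrelated helper class.
fake = types.ModuleType("fake_internbootcamp")
for cls_name in ["Game24bootcamp", "Sudokubootcamp", "Helper"]:
    setattr(fake, cls_name, type(cls_name, (), {}))

print(list(discover_bootcamps(fake)))  # ['Game24bootcamp', 'Sudokubootcamp']
```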
2. Simple Task Selection
Train on any available bootcamp task by name:
```python
from environments.intern_bootcamp.intern_bootcamp_env import InternBootcampEnv

# Train on Game24
env = InternBootcampEnv(task_name="Game24bootcamp", task_params={"num_numbers": 4})

# Train on Sudoku
env = InternBootcampEnv(task_name="Sudokubootcamp")

# Train on Maze solving
env = InternBootcampEnv(task_name="Mazebootcamp")
```
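Selecting a task by name boils down to a lookup in a name-to-class table followed by instantiation with `task_params` as keyword arguments. The sketch below assumes that design; `REGISTRY`, `create_by_name`, and the stand-in `Game24bootcamp` constructor are hypothetical, although the error text mirrors the `Unknown bootcamp` message listed under Troubleshooting.

```python
import difflib
from typing import Any, Dict, Optional, Type


class Game24bootcamp:
    # Hypothetical stand-in for the real InternBootcamp class.
    def __init__(self, num_numbers: int = 4, range_max: int = 100):
        self.num_numbers = num_numbers
        self.range_max = range_max


# Hypothetical name -> class table; a real registry would fill this by discovery.
REGISTRY: Dict[str, Type] = {"Game24bootcamp": Game24bootcamp}


def create_by_name(task_name: str, task_params: Optional[Dict[str, Any]] = None) -> Any:
    """Resolve task_name to a class and instantiate it with task_params."""
    cls = REGISTRY.get(task_name)
    if cls is None:
        # Suggest near-misses such as wrong capitalization ("Game24Bootcamp").
        hints = difflib.get_close_matches(task_name, REGISTRY.keys(), n=3)
        raise ValueError(f"Unknown bootcamp: {task_name} (did you mean {hints}?)")
    return cls(**(task_params or {}))


task = create_by_name("Game24bootcamp", {"num_numbers": 4})
print(type(task).__name__, task.num_numbers)  # Game24bootcamp 4
```

Task names are case-sensitive in this sketch, so the close-match hint helps with the most common typo.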
3. Automatic Problem Generation
Each training step:
- Instantiates the selected bootcamp with the specified parameters
- Generates a new problem instance using `case_generator()`
- Converts it to a natural language prompt via `prompt_func()`
- Collects model responses
- Verifies correctness using `verify_score()`
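The per-step pipeline can be sketched as a single function. Everything here is hypothetical: `ToyBootcamp` stands in for a real bootcamp, `training_step` for the environment's internals, and `oracle` for the policy model, which in the real environment is served through Atropos rather than called as a Python function.

```python
import random
import re
from typing import Callable, Dict, List, Optional


class ToyBootcamp:
    """Hypothetical stand-in exposing the three methods named above."""

    def __init__(self, max_value: int = 20, seed: Optional[int] = None):
        self.max_value = max_value
        self.rng = random.Random(seed)

    def case_generator(self) -> Dict[str, int]:
        a = self.rng.randint(0, self.max_value)
        b = self.rng.randint(0, self.max_value)
        return {"a": a, "b": b, "answer": a + b}

    def prompt_func(self, case: Dict[str, int]) -> str:
        return f"What is {case['a']} + {case['b']}? Answer in \\boxed{{}}."

    def verify_score(self, response: str, case: Dict[str, int]) -> float:
        found = re.findall(r"\\boxed\{(-?\d+)\}", response)
        return 1.0 if found and int(found[-1]) == case["answer"] else 0.0


def training_step(
    bootcamp_cls: type,
    task_params: Dict,
    policy: Callable[[str], str],
    group_size: int,
) -> List[float]:
    """One pass through the steps listed above, with a stubbed policy."""
    bootcamp = bootcamp_cls(**task_params)  # instantiate with task_params
    case = bootcamp.case_generator()  # generate a fresh problem instance
    prompt = bootcamp.prompt_func(case)  # convert it to a natural-language prompt
    responses = [policy(prompt) for _ in range(group_size)]  # collect a group of responses
    return [bootcamp.verify_score(r, case) for r in responses]  # verify each response


def oracle(prompt: str) -> str:
    # Stand-in "model" that always answers correctly.
    a, b = map(int, re.findall(r"\d+", prompt))
    return f"\\boxed{{{a + b}}}"


print(training_step(ToyBootcamp, {"seed": 1}, oracle, group_size=4))  # [1.0, 1.0, 1.0, 1.0]
```

Sampling several responses per problem (the `group_size` setting) gives the trainer multiple scored attempts at the same prompt to compare.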
4. Flexible Reward System
- Base rewards: Configurable rewards for correct and incorrect responses
- Format bonuses: Proper answer formatting (e.g., `\boxed{}` for math)
- Reasoning bonuses: Quality of step-by-step explanations
- Task-specific scoring: Each bootcamp can define its own scoring logic
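A minimal sketch of how the base reward and the format bonus might combine, using the default values from `InternBootcampEnvConfig` in this README. The additive rule, the `RewardSettings` dataclass, and `compute_reward` are assumptions for illustration; reasoning bonuses and task-specific scoring are left out.

```python
import re
from dataclasses import dataclass

# Matches a \boxed{...} answer without nested braces.
BOXED = re.compile(r"\\boxed\{[^{}]*\}")


@dataclass
class RewardSettings:
    # Defaults mirror InternBootcampEnvConfig in this README.
    correct_reward: float = 1.0
    incorrect_reward: float = -0.5
    format_bonus: float = 0.2


def compute_reward(response: str, is_correct: bool, cfg: RewardSettings) -> float:
    """Base reward from verification plus an additive format bonus (assumed rule)."""
    reward = cfg.correct_reward if is_correct else cfg.incorrect_reward
    if BOXED.search(response):
        reward += cfg.format_bonus
    return reward


cfg = RewardSettings()
print(compute_reward(r"... so the answer is \boxed{24}", True, cfg))
print(compute_reward("I think it's 24", False, cfg))  # -0.5
```

Here the bonus also applies to wrong answers; gating it on `is_correct` is a variant that keeps formatting alone from offsetting the incorrect-answer penalty.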
Installation
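- Clone the repository and navigate to the environment directory:
  ```bash
  cd environments/intern_bootcamp
  ```
- Install InternBootcamp (already included as a git submodule):
  ```bash
  cd internbootcamp_lib && uv pip install -e .
  ```
  If `internbootcamp_lib` is empty, the submodule has not been checked out yet; run `git submodule update --init` from the repository root first.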
Usage Examples
1. Single Task Training
Train on Game24 puzzles with specific difficulty:
```bash
python -m environments.intern_bootcamp serve \
    --env--task_name "Game24bootcamp" \
    --env--task_params '{"num_numbers": 4, "range_max": 100}' \
    --env--group_size 8 \
    --env--total_steps 10000
```
2. Exploring Available Tasks
List all available bootcamp tasks:
```python
from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps

tasks = get_available_bootcamps()
for task in tasks[:20]:  # Show the first 20
    print(task)
```
3. Custom Configuration File
Use a YAML configuration for training:
```yaml
# config/intern_bootcamp_game24.yaml
env:
  task_name: "Game24bootcamp"
  task_params:
    num_numbers: 4
    range_max: 50
    target_max: 50
  correct_reward: 1.0
  incorrect_reward: -0.5
  format_bonus: 0.2
  group_size: 8
  total_steps: 10000
  steps_per_eval: 100

openai:
  model_name: "gpt-4"
  temperature: 0.7
  max_tokens: 2048
```

Run with config:

```bash
python -m environments.intern_bootcamp serve --config config/intern_bootcamp_game24.yaml
```
Available Bootcamp Tasks
The environment supports over 1000 bootcamp tasks. Some examples include:
- Math & Logic: Game24bootcamp, Sudokubootcamp, Kakurobootcamp
- Algorithms: Mazebootcamp, Slitherlinkbootcamp, Bridgesbootcamp
- Games: InternGObootcamp, Chessbootcamp
- Pattern Recognition: Arcbootcamp, Nonogramsbootcamp
- Code Generation: CodeIObootcamp, BigCodeBenchbootcamp
- Language Tasks: Cipherbootcamp, WordSortingbootcamp
Use get_available_bootcamps() to see the full list.
Implementation Details
Environment Configuration
```python
from typing import Any, Dict

from pydantic import Field

from atroposlib.envs.base import BaseEnvConfig


class InternBootcampEnvConfig(BaseEnvConfig):
    # Task selection
    task_name: str = "Game24bootcamp"  # Bootcamp task name
    task_params: Dict[str, Any] = Field(default_factory=dict)  # Task-specific parameters

    # Reward configuration
    correct_reward: float = 1.0
    incorrect_reward: float = -0.5
    format_bonus: float = 0.2

    # Training parameters
    require_reasoning: bool = True
    min_reasoning_length: int = 50
    temperature: float = 0.7
    top_p: float = 0.9
```
Bootcamp Registry
The environment uses a dynamic registry system to discover and manage bootcamp tasks:
```python
from environments.intern_bootcamp.bootcamp_registry import (
    bootcamp_registry,
    create_bootcamp,
    get_available_bootcamps,
)

# Create a bootcamp instance
bootcamp = create_bootcamp("Game24bootcamp", num_numbers=4, range_max=50)

# Get information about a bootcamp
info = bootcamp_registry.get_bootcamp_info("Game24bootcamp")
print(info["parameters"])  # Shows accepted parameters
```
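Listing a bootcamp's accepted parameters can be done by introspecting its constructor. This sketch uses `inspect.signature` on a stand-in class; `describe_parameters` and the `REQUIRED` marker are hypothetical, and the stand-in's parameter names follow the YAML example while its defaults are invented. The real `get_bootcamp_info` may return a different structure.

```python
import inspect
from typing import Any, Dict


class Game24bootcamp:
    # Hypothetical stand-in: parameter names follow the YAML example,
    # but the defaults are invented for illustration.
    def __init__(self, num_numbers: int = 4, range_max: int = 100, target_max: int = 100):
        self.num_numbers = num_numbers
        self.range_max = range_max
        self.target_max = target_max


REQUIRED = "<required>"


def describe_parameters(cls: type) -> Dict[str, Any]:
    """Map each constructor parameter to its default, or REQUIRED if it has none."""
    info = {}
    for name, param in inspect.signature(cls).parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            continue  # *args/**kwargs have no fixed name to report
        info[name] = REQUIRED if param.default is inspect.Parameter.empty else param.default
    return info


print(describe_parameters(Game24bootcamp))
# {'num_numbers': 4, 'range_max': 100, 'target_max': 100}
```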
Evaluation and Metrics
The environment tracks the following metrics:
Performance Metrics
- Task accuracy: Success rate on the specific bootcamp task
- Format compliance: Rate of properly formatted responses
- Reasoning quality: Length and coherence of explanations
Training Metrics
- Reward statistics: Mean, std, min, max rewards
- Problem diversity: Variety of generated problems
- Learning progress: Improvement over time
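Several of these metrics are simple aggregates over a group of scored responses. A sketch under that assumption, with `summarize_group` and its metric keys as hypothetical names (the environment's actual logged keys may differ):

```python
import statistics
from typing import Dict, List


def summarize_group(scores: List[float], correct: List[bool], formatted: List[bool]) -> Dict[str, float]:
    """Aggregate per-response results for one group into scalar metrics."""
    n = len(scores)
    return {
        "task_accuracy": sum(correct) / n,  # fraction verified correct
        "format_compliance": sum(formatted) / n,  # fraction with the expected answer format
        "reward_mean": statistics.mean(scores),
        "reward_std": statistics.pstdev(scores),  # population std over the group
        "reward_min": min(scores),
        "reward_max": max(scores),
    }


print(summarize_group([1.0, -0.5, 1.0, -0.5], [True, False, True, False], [True, True, True, False]))
```

Population standard deviation is used because the group is the whole set being described, not a sample from a larger one.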
Troubleshooting
Common Issues
- Task Not Found
  - Error: `ValueError: Unknown bootcamp: XYZBootcamp`
  - Solution: Check the available tasks with `get_available_bootcamps()`.
- Import Errors
  - Error: `ImportError: No module named 'internbootcamp'`
  - Solution: Install InternBootcamp: `cd internbootcamp_lib && pip install -e .`
- Parameter Errors
  - Error: `TypeError: __init__() got an unexpected keyword argument`
  - Solution: Check the accepted parameters with `bootcamp_registry.get_bootcamp_info(task_name)`.
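Parameter errors can be caught before training starts by checking `task_params` against the constructor with `inspect.Signature.bind`, which applies the same argument-matching rules as a real call without running `__init__`. `check_task_params` and the stand-in `Mazebootcamp` parameters are hypothetical.

```python
import inspect
from typing import Any, Dict, Optional


class Mazebootcamp:
    # Hypothetical stand-in; the real Mazebootcamp parameters may differ.
    def __init__(self, size: int = 5, seed: int = 0):
        self.size = size
        self.seed = seed


def check_task_params(cls: type, task_params: Dict[str, Any]) -> Optional[str]:
    """Return an error message if task_params don't fit cls's constructor, else None."""
    sig = inspect.signature(cls)
    try:
        sig.bind(**task_params)  # argument matching only; __init__ is not executed
    except TypeError as exc:
        return f"{cls.__name__}: {exc}; accepted parameters: {list(sig.parameters)}"
    return None


print(check_task_params(Mazebootcamp, {"size": 7}))  # None
print(check_task_params(Mazebootcamp, {"width": 7}))
```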
Future Enhancements
- Multi-Task Training: Train on multiple bootcamps simultaneously
- Curriculum Learning: Progressive difficulty advancement
- Task Composition: Combine multiple bootcamps into complex reasoning chains
- Custom Bootcamps: Easy integration of new reasoning tasks
Contributing
To add new features or improvements:
- Fork the repository
- Create a feature branch
- Implement your changes following the existing patterns
- Add tests for new functionality
- Submit a pull request with a clear description
License
This environment follows the same license as the Atropos framework and InternBootcamp library.