Intern bootcamp env (#146)

* Created registry and started off the env
* Local testing works
* process working but error in gen
* removed old code
* adding debug, it's still not progressing to collect trajectories
* linting
* removed redundant settings
2026-04-19 12:57:58 +00:00 · 2025-05-31 11:22:59 +10:00 · 2025-05-31 11:22:59 +10:00 · 283877dd88
commit 283877dd88
parent ea304892ee
8 changed files with 1218 additions and 0 deletions
--- a/environments/intern_bootcamp/README.md
+++ b/environments/intern_bootcamp/README.md
@@ -0,0 +1,272 @@
+# InternBootcamp RL Training Environment
+
+## Overview
+
+The InternBootcamp RL Training Environment is a framework for training large reasoning models with reinforcement learning on verifiable reasoning tasks. Built on the [InternBootcamp](https://github.com/InternLM/InternBootcamp) library, it connects InternBootcamp's collection of reasoning tasks to the Atropos RL training infrastructure.
+
+## How InternBootcamp Works
+
+InternBootcamp is a library that provides:
+
+1. **Standardized Task Interface**: Each task (called a "bootcamp") implements three core methods:
+   - `case_generator()`: Generates problem instances with controllable difficulty
+   - `prompt_func()`: Converts problem instances into natural language prompts
+   - `verify_score()`: Verifies and scores model responses
+
+2. **Diverse Task Coverage**: Over 1,000 verifiable reasoning tasks, including:
+   - Logic puzzles (e.g., Game24, Sudoku, N-Queens)
+   - Mathematical problems (algebra, geometry, calculus)
+   - Algorithm challenges (sorting, searching, optimization)
+   - Game-based reasoning (chess, Go, strategic games)
+   - Pattern recognition and sequence problems
+
+3. **Automatic Problem Generation**: Each task can generate an unlimited number of problem instances, with:
+   - Controllable difficulty parameters
+   - Consistent verification methods
+   - Scalable complexity
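+
+To make the three-method interface concrete, here is a minimal, self-contained sketch of a toy task. `ToyAdditionBootcamp` is hypothetical and does not subclass InternBootcamp's real base class; its method signatures are assumptions chosen to mirror the description above, not the library's exact API.
+
+```python
+import random
+import re
+from typing import Any, Dict, Optional
+
+
+class ToyAdditionBootcamp:
+    """Hypothetical task mirroring case_generator / prompt_func / verify_score."""
+
+    def __init__(self, max_value: int = 100, seed: Optional[int] = None) -> None:
+        self.max_value = max_value  # difficulty knob: larger operands are harder
+        self.rng = random.Random(seed)
+
+    def case_generator(self) -> Dict[str, Any]:
+        # One problem instance; the dict holds everything needed to verify it later.
+        a = self.rng.randint(0, self.max_value)
+        b = self.rng.randint(0, self.max_value)
+        return {"a": a, "b": b, "answer": a + b}
+
+    @staticmethod
+    def prompt_func(case: Dict[str, Any]) -> str:
+        # Render the instance as a natural-language prompt.
+        question = f"What is {case['a']} + {case['b']}?"
+        return question + " Put your final answer in \\boxed{}."
+
+    @staticmethod
+    def verify_score(response: str, case: Dict[str, Any]) -> float:
+        # Take the last \boxed{...} integer in the response and compare it to the answer.
+        matches = re.findall(r"\\boxed\{(-?\d+)\}", response)
+        if not matches:
+            return 0.0
+        return 1.0 if int(matches[-1]) == case["answer"] else 0.0
+
+
+camp = ToyAdditionBootcamp(max_value=10, seed=0)
+case = camp.case_generator()
+print(ToyAdditionBootcamp.prompt_func(case))
+print(ToyAdditionBootcamp.verify_score(f"The answer is \\boxed{{{case['answer']}}}", case))  # 1.0
+```
+
+Difficulty is controlled through constructor parameters (`max_value` here), which is the role `task_params` plays when selecting a real bootcamp.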
+
+## Architecture
+
+```
+InternBootcamp RL Environment
+├── Task Selection Layer
+│   ├── Single Task Mode (train on one specific bootcamp)
+│   ├── Multi-Task Mode (train on multiple bootcamps - TBD)
+│   └── Curriculum Mode (progressive difficulty - TBD)
+│
+├── InternBootcamp Integration
+│   ├── Bootcamp Registry (dynamic task discovery)
+│   ├── Bootcamp Instance Management
+│   ├── Problem Generation Pipeline
+│   └── Response Verification System
+│
+├── RL Training Loop
+│   ├── Trajectory Collection
+│   ├── Reward Calculation
+│   └── Policy Updates
+│
+└── Atropos Base Environment
+    ├── Server Management
+    ├── Batch Processing
+    └── Wandb Logging
+```
+
+## Key Features
+
+### 1. Dynamic Task Discovery
+The environment automatically discovers all available bootcamp tasks (1000+) without manual imports:
+
+```python
+from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps
+
+# List all available tasks
+tasks = get_available_bootcamps()
+print(f"Found {len(tasks)} bootcamp tasks")
+# Output: Found 1069 bootcamp tasks
+```
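+
+The registry's internals are not shown in this README. As a rough sketch of how discovery without manual imports can work, the snippet below scans a module's namespace with `inspect.getmembers` and keeps every subclass of a common base class. `BootcampBase`, `discover_bootcamps`, and the synthetic module are hypothetical stand-ins; the real `bootcamp_registry` may use a different mechanism.
+
+```python
+import inspect
+import types
+from typing import Dict
+
+
+class BootcampBase:
+    """Hypothetical stand-in for a base class shared by all bootcamps."""
+
+
+def discover_bootcamps(module: types.ModuleType) -> Dict[str, type]:
+    # Collect every class exported by `module` that subclasses the base class,
+    # keyed by class name, so tasks can be looked up by name.
+    found: Dict[str, type] = {}
+    for name, obj in inspect.getmembers(module, inspect.isclass):
+        if issubclass(obj, BootcampBase) and obj is not BootcampBase:
+            found[name] = obj
+    return found
+
+
+class Game24bootcamp(BootcampBase):
+    pass
+
+
+class Sudokubootcamp(BootcampBase):
+    pass
+
+
+# A synthetic module standing in for the real package namespace.
+fake_pkg = types.ModuleType("fake_internbootcamp")
+fake_pkg.BootcampBase = BootcampBase
+fake_pkg.Game24bootcamp = Game24bootcamp
+fake_pkg.Sudokubootcamp = Sudokubootcamp
+fake_pkg.helper = len  # non-class attributes are ignored
+
+registry = discover_bootcamps(fake_pkg)
+print(sorted(registry))  # ['Game24bootcamp', 'Sudokubootcamp']
+```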
+
+### 2. Simple Task Selection
+Train on any available bootcamp task by name:
+
+```python
+# Train on Game24
+env = InternBootcampEnv(task_name="Game24bootcamp", task_params={"num_numbers": 4})
+
+# Train on Sudoku
+env = InternBootcampEnv(task_name="Sudokubootcamp")
+
+# Train on Maze solving
+env = InternBootcampEnv(task_name="Mazebootcamp")
+```
+
+### 3. Automatic Problem Generation
+On each training step, the environment:
+1. Instantiates the selected bootcamp with specified parameters
+2. Generates a new problem instance using `case_generator()`
+3. Converts it to a natural language prompt via `prompt_func()`
+4. Collects model responses
+5. Verifies correctness using `verify_score()`
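+
+The five steps can be sketched as one synchronous function. This is a simplified illustration, not the environment's actual code: the real environment runs inside Atropos and requests completions from an inference server, which `fake_sampler` stands in for here, and `ParityBootcamp` and `run_step` are hypothetical.
+
+```python
+from typing import Any, Callable, Dict, List, Protocol
+
+
+class Bootcamp(Protocol):
+    # Hypothetical protocol matching the three methods named above.
+    def case_generator(self) -> Dict[str, Any]: ...
+    def prompt_func(self, case: Dict[str, Any]) -> str: ...
+    def verify_score(self, response: str, case: Dict[str, Any]) -> float: ...
+
+
+def run_step(
+    make_bootcamp: Callable[..., Bootcamp],
+    task_params: Dict[str, Any],
+    sample: Callable[[str, int], List[str]],
+    group_size: int,
+) -> List[float]:
+    bootcamp = make_bootcamp(**task_params)  # 1. instantiate with task_params
+    case = bootcamp.case_generator()  # 2. generate a new problem instance
+    prompt = bootcamp.prompt_func(case)  # 3. convert it to a prompt
+    responses = sample(prompt, group_size)  # 4. collect a group of responses
+    return [bootcamp.verify_score(r, case) for r in responses]  # 5. verify each
+
+
+class ParityBootcamp:
+    def __init__(self, n: int = 7) -> None:
+        self.n = n
+
+    def case_generator(self) -> Dict[str, Any]:
+        return {"n": self.n, "answer": "even" if self.n % 2 == 0 else "odd"}
+
+    def prompt_func(self, case: Dict[str, Any]) -> str:
+        return f"Is {case['n']} even or odd?"
+
+    def verify_score(self, response: str, case: Dict[str, Any]) -> float:
+        return 1.0 if response.strip().lower() == case["answer"] else 0.0
+
+
+def fake_sampler(prompt: str, n: int) -> List[str]:
+    # Stand-in for the inference server: alternates "odd" and "even".
+    return ["odd" if i % 2 == 0 else "even" for i in range(n)]
+
+
+print(run_step(ParityBootcamp, {"n": 7}, fake_sampler, group_size=4))  # [1.0, 0.0, 1.0, 0.0]
+```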
+
+### 4. Flexible Reward System
+- **Base rewards**: Configurable rewards for correct and incorrect responses (`correct_reward`, `incorrect_reward`)
+- **Format bonuses**: Extra reward (`format_bonus`) for properly formatted answers (e.g., `\boxed{}` for math)
+- **Reasoning bonuses**: Quality of step-by-step explanations
+- **Task-specific scoring**: Each bootcamp can define its own scoring logic
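+
+One plausible way to combine the base reward and the format bonus is sketched below, using the default values from the configuration in this README. How the environment actually combines these terms (for example, whether a wrong but well-formatted answer still earns the bonus) is an assumption here, `shaped_reward` is a hypothetical name, and the reasoning bonus is left out.
+
+```python
+import re
+
+# Defaults copied from the config shown in this README; the combination rule is assumed.
+CORRECT_REWARD = 1.0
+INCORRECT_REWARD = -0.5
+FORMAT_BONUS = 0.2
+
+BOXED = re.compile(r"\\boxed\{[^{}]*\}")
+
+
+def shaped_reward(verify_score: float, response: str) -> float:
+    # Base reward from the bootcamp verifier (any positive score counts as correct).
+    reward = CORRECT_REWARD if verify_score > 0 else INCORRECT_REWARD
+    # Format bonus whenever the answer is wrapped in \boxed{...}.
+    if BOXED.search(response):
+        reward += FORMAT_BONUS
+    return reward
+
+
+print(shaped_reward(1.0, r"So the answer is \boxed{24}"))  # 1.2
+print(shaped_reward(0.0, "I think it's 23"))  # -0.5
+```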
+
+## Installation
+
+1. From the root of the cloned repository, navigate to the environment directory:
+```bash
+cd environments/intern_bootcamp
+```
+
+2. Install InternBootcamp, which is included as a git submodule (if `internbootcamp_lib` is empty, run `git submodule update --init` first):
+```bash
+cd internbootcamp_lib && uv pip install -e .
+```
+
+## Usage Examples
+
+### 1. Single Task Training
+Train on Game24 puzzles with specific difficulty:
+
+```bash
+python -m environments.intern_bootcamp serve \
+    --env--task_name "Game24bootcamp" \
+    --env--task_params '{"num_numbers": 4, "range_max": 100}' \
+    --env--group_size 8 \
+    --env--total_steps 10000
+```
+
+### 2. Exploring Available Tasks
+List all available bootcamp tasks:
+
+```python
+from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps
+
+tasks = get_available_bootcamps()
+for task in tasks[:20]:  # Show first 20
+    print(task)
+```
+
+### 3. Custom Configuration File
+Use a YAML configuration for training:
+
+```yaml
+# config/intern_bootcamp_game24.yaml
+env:
+  task_name: "Game24bootcamp"
+  task_params:
+    num_numbers: 4
+    range_max: 50
+    target_max: 50
+
+  correct_reward: 1.0
+  incorrect_reward: -0.5
+  format_bonus: 0.2
+
+  group_size: 8
+  total_steps: 10000
+  steps_per_eval: 100
+
+openai:
+  model_name: "gpt-4"
+  temperature: 0.7
+  max_tokens: 2048
+```
+
+Run with config:
+```bash
+python -m environments.intern_bootcamp serve --config config/intern_bootcamp_game24.yaml
+```
+
+## Available Bootcamp Tasks
+
+The environment supports over 1000 bootcamp tasks. Some examples include:
+
+- **Math & Logic**: Game24bootcamp, Sudokubootcamp, Kakurobootcamp
+- **Algorithms**: Mazebootcamp, Slitherlinkbootcamp, Bridgesbootcamp
+- **Games**: InternGObootcamp, Chessbootcamp
+- **Pattern Recognition**: Arcbootcamp, Nonogramsbootcamp
+- **Code Generation**: CodeIObootcamp, BigCodeBenchbootcamp
+- **Language Tasks**: Cipherbootcamp, WordSortingbootcamp
+
+Use `get_available_bootcamps()` to see the full list.
+
+## Implementation Details
+
+### Environment Configuration
+
+```python
+from typing import Any, Dict
+
+from atroposlib.envs.base import BaseEnvConfig
+
+
+class InternBootcampEnvConfig(BaseEnvConfig):
+    # Task selection
+    task_name: str = "Game24bootcamp"  # Bootcamp task name
+    task_params: Dict[str, Any] = {}   # Task-specific parameters
+
+    # Reward configuration
+    correct_reward: float = 1.0
+    incorrect_reward: float = -0.5
+    format_bonus: float = 0.2
+
+    # Response requirements and sampling parameters
+    require_reasoning: bool = True
+    min_reasoning_length: int = 50
+    temperature: float = 0.7
+    top_p: float = 0.9
+```
+
+### Bootcamp Registry
+
+The environment uses a dynamic registry system to discover and manage bootcamp tasks:
+
+```python
+from environments.intern_bootcamp.bootcamp_registry import (
+    create_bootcamp,
+    get_available_bootcamps,
+    bootcamp_registry
+)
+
+# Create a bootcamp instance
+bootcamp = create_bootcamp("Game24bootcamp", num_numbers=4, range_max=50)
+
+# Get information about a bootcamp
+info = bootcamp_registry.get_bootcamp_info("Game24bootcamp")
+print(info["parameters"])  # Shows accepted parameters
+```
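+
+The README does not show how `get_bootcamp_info` derives its parameter list. A common approach, sketched here under that assumption, is to read the bootcamp constructor's signature with `inspect.signature`; the same information can reject unknown keyword arguments before construction, avoiding the bare `TypeError` covered under Troubleshooting. `Game24Demo`, `constructor_parameters`, and `create_checked` are hypothetical.
+
+```python
+import inspect
+from typing import Any, Dict
+
+
+class Game24Demo:
+    """Hypothetical bootcamp class used only for illustration."""
+
+    def __init__(self, num_numbers: int = 4, range_max: int = 100, target_max: int = 100):
+        self.num_numbers = num_numbers
+        self.range_max = range_max
+        self.target_max = target_max
+
+
+def constructor_parameters(cls: type) -> Dict[str, Any]:
+    # Map each named constructor argument (excluding self, *args, **kwargs)
+    # to its default value, or None when it has no default.
+    params = inspect.signature(cls.__init__).parameters
+    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
+    return {
+        name: (None if p.default is inspect.Parameter.empty else p.default)
+        for name, p in params.items()
+        if name != "self" and p.kind not in skip
+    }
+
+
+def create_checked(cls: type, **kwargs: Any) -> Any:
+    # If the constructor takes **kwargs, accept anything; otherwise reject
+    # unknown names with a clearer message than a bare TypeError.
+    params = inspect.signature(cls.__init__).parameters.values()
+    if not any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
+        unknown = sorted(set(kwargs) - set(constructor_parameters(cls)))
+        if unknown:
+            raise ValueError(f"{cls.__name__} does not accept: {unknown}")
+    return cls(**kwargs)
+
+
+print(constructor_parameters(Game24Demo))
+# {'num_numbers': 4, 'range_max': 100, 'target_max': 100}
+camp = create_checked(Game24Demo, num_numbers=4, range_max=50)
+```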
+
+## Evaluation and Metrics
+
+The environment tracks the following metrics:
+
+### Performance Metrics
+- **Task accuracy**: Success rate on the specific bootcamp task
+- **Format compliance**: Rate of properly formatted responses
+- **Reasoning quality**: Length and coherence of explanations
+
+### Training Metrics
+- **Reward statistics**: Mean, std, min, max rewards
+- **Problem diversity**: Variety of generated problems
+- **Learning progress**: Improvement over time
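+
+As a sketch of the reward statistics and task accuracy listed above, the helpers below summarize one batch. Whether the environment reports population or sample standard deviation, and what score counts as correct, are assumptions here; `reward_stats` and `accuracy` are hypothetical names.
+
+```python
+import statistics
+from typing import Dict, List
+
+
+def reward_stats(rewards: List[float]) -> Dict[str, float]:
+    # Summary statistics over one batch of rewards (population std assumed).
+    return {
+        "mean": statistics.fmean(rewards),
+        "std": statistics.pstdev(rewards),
+        "min": min(rewards),
+        "max": max(rewards),
+    }
+
+
+def accuracy(scores: List[float]) -> float:
+    # Fraction of responses the verifier scored above zero (assumed to mean correct).
+    return sum(1 for s in scores if s > 0) / len(scores)
+
+
+print(reward_stats([1.0, -0.5, 1.0, -0.5]))
+print(accuracy([1.0, 0.0, 1.0, 1.0]))  # 0.75
+```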
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Task Not Found**
+   ```
+   ValueError: Unknown bootcamp: XYZBootcamp
+   ```
+   Solution: Check available tasks with `get_available_bootcamps()`
+
+2. **Import Errors**
+   ```
+   ModuleNotFoundError: No module named 'internbootcamp'
+   ```
+   Solution: Install InternBootcamp: `cd internbootcamp_lib && uv pip install -e .`
+
+3. **Parameter Errors**
+   ```
+   TypeError: __init__() got an unexpected keyword argument
+   ```
+   Solution: Check accepted parameters with `bootcamp_registry.get_bootcamp_info(task_name)`
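+
+For the "Task Not Found" case, bootcamp names in this README follow a `<Name>bootcamp` pattern with a lowercase suffix (e.g. `Game24bootcamp`), so a capitalization slip such as `XYZBootcamp` is an easy mistake. The hypothetical helper below, a sketch rather than part of the registry, resolves case-insensitive matches and otherwise suggests near misses with `difflib.get_close_matches`. In practice `available` would be the result of `get_available_bootcamps()`, assuming it returns task-name strings.
+
+```python
+import difflib
+from typing import Iterable
+
+
+def resolve_task_name(name: str, available: Iterable[str]) -> str:
+    # 1. exact match, 2. case-insensitive match, 3. fail with close suggestions.
+    names = list(available)
+    if name in names:
+        return name
+    by_lower = {n.lower(): n for n in names}
+    if name.lower() in by_lower:
+        return by_lower[name.lower()]
+    hints = difflib.get_close_matches(name, names, n=3, cutoff=0.6)
+    raise ValueError(f"Unknown bootcamp: {name}. Close matches: {hints}")
+
+
+tasks = ["Game24bootcamp", "Sudokubootcamp", "Mazebootcamp"]
+print(resolve_task_name("SudokuBootcamp", tasks))  # Sudokubootcamp
+```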
+
+## Future Enhancements
+
+1. **Multi-Task Training**: Train on multiple bootcamps simultaneously
+2. **Curriculum Learning**: Progressive difficulty advancement
+3. **Task Composition**: Combine multiple bootcamps into complex reasoning chains
+4. **Custom Bootcamps**: Easy integration of new reasoning tasks
+
+## Contributing
+
+To add new features or improvements:
+
+1. Fork the repository
+2. Create a feature branch
+3. Implement your changes following the existing patterns
+4. Add tests for new functionality
+5. Submit a pull request with a clear description
+
+## License
+
+This environment follows the same license as the Atropos framework and InternBootcamp library.