Intern bootcamp env (#146)

* Created registry and started off the env

* Local testing works

* process working but error in gen

* removed old code

* adding debug, it's still not progressing to collect trajectories

* linting

* removed redundant settings
Author: shannonsands · 2025-05-31 11:22:59 +10:00 (committed by GitHub)
Parent: ea304892ee
Commit: 283877dd88
8 changed files with 1218 additions and 0 deletions


@@ -0,0 +1,272 @@
# InternBootcamp RL Training Environment
## Overview
The InternBootcamp RL Training Environment is a flexible, extensible framework for training large reasoning models with reinforcement learning on verifiable reasoning tasks. Built on the [InternBootcamp](https://github.com/InternLM/InternBootcamp) library, it provides seamless integration between InternBootcamp's comprehensive collection of reasoning tasks and the Atropos RL training infrastructure.
## How InternBootcamp Works
InternBootcamp is a library that provides:
1. **Standardized Task Interface**: Each task (called a "bootcamp") implements three core methods:
- `case_generator()`: Generates problem instances with controllable difficulty
- `prompt_func()`: Converts problem instances into natural language prompts
- `verify_score()`: Verifies and scores model responses
2. **Diverse Task Coverage**: Over 1,000 verifiable reasoning tasks including:
- Logic puzzles (e.g., Game24, Sudoku, N-Queens)
- Mathematical problems (algebra, geometry, calculus)
- Algorithm challenges (sorting, searching, optimization)
- Game-based reasoning (chess, Go, strategic games)
- Pattern recognition and sequence problems
3. **Automatic Task Generation**: Tasks can generate unlimited problem instances with:
- Controllable difficulty parameters
- Consistent verification methods
- Scalable complexity
## Architecture
```
InternBootcamp RL Environment
├── Task Selection Layer
│ ├── Single Task Mode (train on one specific bootcamp)
│ ├── Multi-Task Mode (train on multiple bootcamps - TBD)
│ └── Curriculum Mode (progressive difficulty - TBD)
├── InternBootcamp Integration
│ ├── Bootcamp Registry (dynamic task discovery)
│ ├── Bootcamp Instance Management
│ ├── Problem Generation Pipeline
│ └── Response Verification System
├── RL Training Loop
│ ├── Trajectory Collection
│ ├── Reward Calculation
│ └── Policy Updates
└── Atropos Base Environment
├── Server Management
├── Batch Processing
└── Wandb Logging
```
## Key Features
### 1. Dynamic Task Discovery
The environment automatically discovers all available bootcamp tasks (1000+) without manual imports:
```python
from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps
# List all available tasks
tasks = get_available_bootcamps()
print(f"Found {len(tasks)} bootcamp tasks")
# Output: Found 1069 bootcamp tasks
```
### 2. Simple Task Selection
Train on any available bootcamp task by name:
```python
# Train on Game24
env = InternBootcampEnv(task_name="Game24bootcamp", task_params={"num_numbers": 4})
# Train on Sudoku
env = InternBootcampEnv(task_name="Sudokubootcamp")
# Train on Maze solving
env = InternBootcampEnv(task_name="Mazebootcamp")
```
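A `RandomTask` mode is also registered; it samples a different bootcamp for every generated problem and is the default task in `config_init()`. Using the same simplified constructor style as the examples above:
```python
# Multitask-style training: a new bootcamp is sampled for each problem
env = InternBootcampEnv(task_name="RandomTask")
```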
### 3. Automatic Problem Generation
On each training step, the environment (sketched below):
1. Uses the bootcamp instance created at setup with the configured parameters
2. Generates a new problem instance using `case_generator()`
3. Converts it to a natural language prompt via `prompt_func()`
4. Collects model responses from the policy
5. Verifies correctness and scores them using `verify_score()`
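A minimal sketch of this loop, calling the bootcamp interface directly; `ask_model` is a hypothetical stand-in for the Atropos inference server:
```python
from environments.intern_bootcamp.bootcamp_registry import create_bootcamp
# Created once at setup and reused for every step
bootcamp = create_bootcamp("Game24bootcamp", num_numbers=4, range_max=50)
def run_step(ask_model):
    identity = bootcamp.case_generator()     # new problem instance
    prompt = bootcamp.prompt_func(identity)  # natural-language prompt
    response = ask_model(prompt)             # model completion (hypothetical helper)
    return bootcamp.verify_score(response, identity, format_score=0.2)
```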
### 4. Flexible Reward System
- **Base rewards**: Configurable rewards for correct and incorrect responses (`correct_reward`, `incorrect_reward`)
- **Format bonuses**: Extra credit for properly formatted answers (e.g., `\boxed{}` for math)
- **Reasoning requirements**: Responses shorter than `min_reasoning_length` can be penalized when `require_reasoning` is set
- **Task-specific scoring**: Each bootcamp can define its own scoring logic (the sketch below shows how these pieces combine into the final reward)
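A simplified sketch of how these combine in `InternBootcampEnv.score`, where `base_score` is the bootcamp's `verify_score` output and `cfg` holds the reward settings:
```python
def shape_reward(base_score: float, cfg) -> float:
    if base_score >= 1.0:
        return cfg.correct_reward                  # correct answer (format bonus already included)
    elif base_score > 0:
        return cfg.incorrect_reward + base_score   # partial credit, e.g. correct format only
    return cfg.incorrect_reward                    # wrong answer and/or format
```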
## Installation
1. Clone the repository and navigate to the environment:
```bash
cd environments/intern_bootcamp
```
2. Install InternBootcamp (already included as a submodule):
```bash
cd internbootcamp_lib && uv pip install -e .
```
## Usage Examples
### 1. Single Task Training
Train on Game24 puzzles with specific difficulty:
```bash
python -m environments.intern_bootcamp serve \
--env--task_name "Game24bootcamp" \
--env--task_params '{"num_numbers": 4, "range_max": 100}' \
--env--group_size 8 \
--env--total_steps 10000
```
### 2. Exploring Available Tasks
List all available bootcamp tasks:
```python
from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps
tasks = get_available_bootcamps()
for task in tasks[:20]: # Show first 20
print(task)
```
### 3. Custom Configuration File
Use a YAML configuration for training:
```yaml
# config/intern_bootcamp_game24.yaml
env:
task_name: "Game24bootcamp"
task_params:
num_numbers: 4
range_max: 50
target_max: 50
correct_reward: 1.0
incorrect_reward: -0.5
format_bonus: 0.2
group_size: 8
total_steps: 10000
steps_per_eval: 100
openai:
model_name: "gpt-4"
temperature: 0.7
max_tokens: 2048
```
Run with config:
```bash
python -m environments.intern_bootcamp serve --config config/intern_bootcamp_game24.yaml
```
## Available Bootcamp Tasks
The environment supports over 1000 bootcamp tasks. Some examples include:
- **Math & Logic**: Game24bootcamp, Sudokubootcamp, Kakurobootcamp
- **Algorithms**: Mazebootcamp, Slitherlinkbootcamp, Bridgesbootcamp
- **Games**: InternGObootcamp, Chessbootcamp
- **Pattern Recognition**: Arcbootcamp, Nonogramsbootcamp
- **Code Generation**: CodeIObootcamp, BigCodeBenchbootcamp
- **Language Tasks**: Cipherbootcamp, WordSortingbootcamp
Use `get_available_bootcamps()` to see the full list.
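For example, to narrow the list down by keyword (a small heuristic sketch using substring matching on task names):
```python
from environments.intern_bootcamp.bootcamp_registry import get_available_bootcamps
available = get_available_bootcamps()
math_like = [
    name for name in available
    if any(key in name.lower() for key in ("math", "game", "number"))
]
print(f"{len(math_like)} math/game/number tasks, e.g. {math_like[:5]}")
```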
## Implementation Details
### Environment Configuration
```python
class InternBootcampEnvConfig(BaseEnvConfig):
# Task selection
task_name: str = "Game24bootcamp" # Bootcamp task name
task_params: Dict[str, Any] = {} # Task-specific parameters
# Reward configuration
correct_reward: float = 1.0
incorrect_reward: float = -0.5
format_bonus: float = 0.2
# Training parameters
require_reasoning: bool = True
min_reasoning_length: int = 50
temperature: float = 0.7
top_p: float = 0.9
```
### Bootcamp Registry
The environment uses a dynamic registry system to discover and manage bootcamp tasks:
```python
from environments.intern_bootcamp.bootcamp_registry import (
create_bootcamp,
get_available_bootcamps,
bootcamp_registry
)
# Create a bootcamp instance
bootcamp = create_bootcamp("Game24bootcamp", num_numbers=4, range_max=50)
# Get information about a bootcamp
info = bootcamp_registry.get_bootcamp_info("Game24bootcamp")
print(info["parameters"]) # Shows accepted parameters
```
## Evaluation and Metrics
The environment tracks comprehensive metrics:
### Performance Metrics
- **Task accuracy**: Success rate on the specific bootcamp task
- **Format compliance**: Rate of properly formatted responses
- **Reasoning quality**: Length and coherence of explanations
### Training Metrics
- **Reward statistics**: Mean, std, min, max rewards
- **Problem diversity**: Variety of generated problems
- **Learning progress**: Improvement over time
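The training-side accuracy and format metrics are running averages over per-step correctness buffers, roughly as in this sketch of the logic in `InternBootcampEnv.wandb_log`:
```python
def summarize_buffers(task_name, task_correct, format_correct):
    """Turn per-step correctness buffers into wandb-style metrics."""
    metrics = {}
    if task_correct:
        metrics[f"train/{task_name}_accuracy"] = sum(task_correct) / len(task_correct)
    if format_correct:
        metrics[f"train/{task_name}_format_rate"] = sum(format_correct) / len(format_correct)
    return metrics
```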
## Troubleshooting
### Common Issues
1. **Task Not Found**
```
ValueError: Unknown bootcamp: XYZBootcamp
```
Solution: Check available tasks with `get_available_bootcamps()`
2. **Import Errors**
```
ImportError: No module named 'internbootcamp'
```
Solution: Install InternBootcamp: `cd internbootcamp_lib && pip install -e .`
3. **Parameter Errors**
```
TypeError: __init__() got an unexpected keyword argument
```
Solution: Check accepted parameters with `bootcamp_registry.get_bootcamp_info(task_name)`
## Future Enhancements
1. **Multi-Task Training**: Train on multiple bootcamps simultaneously
2. **Curriculum Learning**: Progressive difficulty advancement
3. **Task Composition**: Combine multiple bootcamps into complex reasoning chains
4. **Custom Bootcamps**: Easy integration of new reasoning tasks
## Contributing
To add new features or improvements:
1. Fork the repository
2. Create a feature branch
3. Implement your changes following the existing patterns
4. Add tests for new functionality
5. Submit a pull request with a clear description
## License
This environment follows the same license as the Atropos framework and InternBootcamp library.


@@ -0,0 +1,7 @@
"""
InternBootcamp RL Environment for Atropos
"""
from .intern_bootcamp_env import InternBootcampEnv, InternBootcampEnvConfig
__all__ = ["InternBootcampEnv", "InternBootcampEnvConfig"]


@@ -0,0 +1,266 @@
"""
Bootcamp Registry for InternBootcamp Environment
This module provides a registry system for dynamically discovering and managing
InternBootcamp tasks without having to manually import each one.
"""
import importlib
import inspect
import logging
import random
from typing import Any, Dict, List, Type
logger = logging.getLogger(__name__)
class BootcampRegistry:
"""Registry for InternBootcamp tasks with dynamic discovery."""
def __init__(self):
self._registry: Dict[str, Type] = {}
self._discovered = False
def discover_bootcamps(self) -> None:
"""Dynamically discover all available bootcamp classes from InternBootcamp."""
if self._discovered:
return
try:
# Import the internbootcamp.bootcamp module
bootcamp_module = importlib.import_module("internbootcamp.bootcamp")
# Get all attributes from the module
for name in dir(bootcamp_module):
if name.endswith("bootcamp") and not name.startswith("_"):
try:
obj = getattr(bootcamp_module, name)
# Check if it's a class and has the required methods
if (
inspect.isclass(obj)
and hasattr(obj, "case_generator")
and hasattr(obj, "prompt_func")
and hasattr(obj, "verify_score")
):
self._registry[name] = obj
logger.debug(f"Registered bootcamp: {name}")
except Exception as e:
logger.warning(f"Failed to register {name}: {e}")
self._discovered = True
logger.info(f"Discovered {len(self._registry)} bootcamp tasks")
except ImportError as e:
logger.error(f"Failed to import internbootcamp.bootcamp: {e}")
raise
def get_bootcamp_class(self, name: str) -> Type:
"""Get a bootcamp class by name."""
if not self._discovered:
self.discover_bootcamps()
if name not in self._registry:
available = self.list_available_bootcamps()
raise ValueError(
f"Unknown bootcamp: {name}. "
f"Available bootcamps: {', '.join(available[:10])}..."
f" ({len(available)} total)"
)
return self._registry[name]
def create_bootcamp_instance(self, name: str, **params) -> Any:
"""Create an instance of a bootcamp with given parameters."""
bootcamp_class = self.get_bootcamp_class(name)
# Get the __init__ signature to see what parameters are accepted
try:
sig = inspect.signature(bootcamp_class.__init__)
valid_params = {}
# Filter out parameters that the bootcamp doesn't accept
for param_name, param_value in params.items():
if param_name in sig.parameters:
valid_params[param_name] = param_value
else:
logger.warning(
f"Parameter '{param_name}' not accepted by {name}, ignoring"
)
return bootcamp_class(**valid_params)
except Exception as e:
logger.error(f"Failed to create instance of {name}: {e}")
# Try with no parameters as fallback
try:
return bootcamp_class()
except Exception as e:
raise e
def list_available_bootcamps(self) -> List[str]:
"""List all available bootcamp names."""
if not self._discovered:
self.discover_bootcamps()
return sorted(list(self._registry.keys()))
def get_bootcamp_info(self, name: str) -> Dict[str, Any]:
"""Get information about a specific bootcamp."""
bootcamp_class = self.get_bootcamp_class(name)
info = {
"name": name,
"class": bootcamp_class,
"docstring": inspect.getdoc(bootcamp_class) or "No documentation available",
"parameters": {},
}
# Get __init__ parameters
try:
sig = inspect.signature(bootcamp_class.__init__)
for param_name, param in sig.parameters.items():
if param_name not in ["self"]:
param_info = {
"default": (
param.default
if param.default != inspect.Parameter.empty
else None
),
"annotation": (
str(param.annotation)
if param.annotation != inspect.Parameter.empty
else None
),
}
info["parameters"][param_name] = param_info
except Exception as e:
logger.warning(f"Could not inspect parameters for {name}: {e}")
return info
class RandomTask:
"""Special bootcamp that randomly selects from available bootcamps on each call."""
def __init__(self, **params):
self.registry = BootcampRegistry()
self.registry.discover_bootcamps()
self.available_bootcamps = self.registry.list_available_bootcamps()
# Remove base classes and template classes from the list
self.available_bootcamps = [
name
for name in self.available_bootcamps
if not any(x in name.lower() for x in ["base", "template", "{puzzlename}"])
]
self.params = params
self.current_bootcamp = None
self.current_bootcamp_name = None
logger.info(
f"RandomTask initialized with {len(self.available_bootcamps)} available bootcamps"
)
def case_generator(self) -> object:
"""Generate a case by randomly selecting a bootcamp."""
# Select a random bootcamp
self.current_bootcamp_name = random.choice(self.available_bootcamps)
self.current_bootcamp = self.registry.create_bootcamp_instance(
self.current_bootcamp_name, **self.params
)
# Generate case from the selected bootcamp
case = self.current_bootcamp.case_generator()
# Add bootcamp name to the case for tracking
if isinstance(case, dict):
case["_bootcamp_name"] = self.current_bootcamp_name
else:
# If case is not a dict, wrap it
case = {"data": case, "_bootcamp_name": self.current_bootcamp_name}
return case
def prompt_func(self, identity) -> str:
"""Generate prompt using the current bootcamp."""
# Extract the bootcamp name if stored
bootcamp_name = identity.get("_bootcamp_name", self.current_bootcamp_name)
# If we need to recreate the bootcamp (e.g., during scoring)
if not self.current_bootcamp or self.current_bootcamp_name != bootcamp_name:
self.current_bootcamp_name = bootcamp_name
self.current_bootcamp = self.registry.create_bootcamp_instance(
bootcamp_name, **self.params
)
# Remove the bootcamp name before passing to prompt_func
identity_copy = dict(identity)
identity_copy.pop("_bootcamp_name", None)
if "data" in identity_copy and len(identity_copy) == 1:
identity_copy = identity_copy["data"]
return self.current_bootcamp.prompt_func(identity_copy)
@classmethod
def extract_output(cls, output):
"""This should not be called directly for RandomTask."""
raise NotImplementedError(
"RandomTask does not implement extract_output directly"
)
@classmethod
def _verify_correction(cls, solution, identity):
"""This should not be called directly for RandomTask."""
raise NotImplementedError(
"RandomTask does not implement _verify_correction directly"
)
def verify_score(
self,
model_output,
identity,
format_score=0,
short_penalty=True,
short_threshold=100,
format_penalty=True,
) -> float:
"""Verify score using the appropriate bootcamp."""
# Extract the bootcamp name
bootcamp_name = identity.get("_bootcamp_name", self.current_bootcamp_name)
# If we need to recreate the bootcamp
if not self.current_bootcamp or self.current_bootcamp_name != bootcamp_name:
self.current_bootcamp_name = bootcamp_name
self.current_bootcamp = self.registry.create_bootcamp_instance(
bootcamp_name, **self.params
)
# Remove the bootcamp name before passing to verify_score
identity_copy = dict(identity)
identity_copy.pop("_bootcamp_name", None)
if "data" in identity_copy and len(identity_copy) == 1:
identity_copy = identity_copy["data"]
# Call the bootcamp's verify_score method
return self.current_bootcamp.verify_score(
model_output,
identity_copy,
format_score,
short_penalty,
short_threshold,
format_penalty,
)
# Global registry instance
bootcamp_registry = BootcampRegistry()
def get_available_bootcamps() -> List[str]:
"""Get a list of all available bootcamp names."""
return bootcamp_registry.list_available_bootcamps()
def create_bootcamp(name: str, **params) -> Any:
"""Create a bootcamp instance by name with parameters."""
# Special handling for RandomTask
if name == "RandomTask":
return RandomTask(**params)
return bootcamp_registry.create_bootcamp_instance(name, **params)


@@ -0,0 +1,406 @@
#!/usr/bin/env python3
"""
InternBootcamp RL Environment for Atropos
This environment integrates InternBootcamp's verifiable reasoning tasks with the Atropos
RL training framework. It supports training on single tasks, with plans for multi-task
and curriculum learning modes.
"""
import asyncio
import logging
from typing import Any, Dict, List, Optional, Tuple, Union
from atroposlib.envs.base import (
APIServerConfig,
BaseEnv,
BaseEnvConfig,
ScoredDataGroup,
)
from atroposlib.utils.tokenize_for_trainer import tokenize_for_trainer
from .bootcamp_registry import create_bootcamp, get_available_bootcamps
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# System prompt for reasoning tasks
SYSTEM_PROMPT = (
"You are a deep thinking AI with strong reasoning abilities. You may use "
"extremely long chains of thought to deeply consider the problem and "
"deliberate with yourself via systematic reasoning processes to help come "
"to a correct solution.\n\n"
"You should enclose your thoughts and internal monologue inside <think> "
"</think> tags, and then provide your solution or response to the problem. "
"Please think in English, even if the problem is presented in another "
"language.\n\n"
"When solving problems:\n"
"1. Think step by step through the problem inside <think> tags\n"
"2. Show your work clearly in your thinking\n"
"3. Verify your answer before finalizing\n"
"4. Follow the specific answer format requested in the problem\n\n"
"Pay close attention to how the problem asks you to format your answer - "
"some may require specific tags, notations, or formats."
)
class InternBootcampEnvConfig(BaseEnvConfig):
"""Configuration for the InternBootcamp environment."""
# Task selection
task_name: str = "RandomTask" # Random task selection mode
# Task-specific parameters
task_params: Dict[str, Any] = {}
# Reward configuration
correct_reward: float = 1.0
incorrect_reward: float = -0.5
format_bonus: float = 0.2
# Training parameters
require_reasoning: bool = True
min_reasoning_length: int = 50
temperature: float = 0.7
top_p: float = 0.9
class InternBootcampEnv(BaseEnv):
"""Environment for training on InternBootcamp reasoning tasks."""
name = "intern_bootcamp"
def __init__(
self,
config: InternBootcampEnvConfig,
server_configs: Union[List[APIServerConfig], APIServerConfig],
slurm=True,
testing=False,
):
super().__init__(config, server_configs, slurm, testing)
self.config = config
# Task tracking
self.bootcamp_instance = None
self.current_task_name = config.task_name
# Performance tracking
self.task_correct_buffer = []
self.format_correct_buffer = []
self.eval_metrics = []
self.system_prompt = SYSTEM_PROMPT
async def setup(self):
"""Initialize the environment and bootcamp task."""
logger.info(f"Setting up InternBootcampEnv with task: {self.config.task_name}")
# Log available bootcamps
available = get_available_bootcamps()
logger.info(f"Found {len(available)} available bootcamp tasks")
logger.debug(f"Available tasks (first 20): {available[:20]}")
# Initialize the bootcamp task
self._initialize_bootcamp()
# Generate some test problems to verify setup
try:
for i in range(3):
identity = self.bootcamp_instance.case_generator()
prompt = self.bootcamp_instance.prompt_func(identity)
logger.info(f"Test problem {i+1}: {prompt[:100]}...")
except Exception as e:
logger.error(f"Failed to generate test problems: {e}")
raise
def _initialize_bootcamp(self):
"""Initialize the bootcamp instance based on task name."""
try:
# Create bootcamp instance using the registry
self.bootcamp_instance = create_bootcamp(
self.config.task_name, **self.config.task_params
)
logger.info(
f"Initialized {self.config.task_name} with params: {self.config.task_params}"
)
except ValueError as e:
# If task not found, list available tasks
available = get_available_bootcamps()
logger.error(f"Task '{self.config.task_name}' not found!")
logger.error(f"Available tasks (showing first 20): {available[:20]}")
raise e
except Exception as e:
logger.error(f"Failed to initialize bootcamp: {e}")
raise
async def get_next_item(self) -> Tuple[Any, Dict]:
"""Get the next problem from the bootcamp."""
# Generate a new problem
identity = self.bootcamp_instance.case_generator()
prompt = self.bootcamp_instance.prompt_func(identity)
# Log which bootcamp is being used if RandomTask
if (
self.config.task_name == "RandomTask"
and isinstance(identity, dict)
and "_bootcamp_name" in identity
):
logger.info(f"RandomTask selected: {identity['_bootcamp_name']}")
# Create the message format expected by Atropos
messages = [
{"role": "system", "content": self.system_prompt},
{"role": "user", "content": prompt},
]
# Return item with metadata
return (
messages,
{
"identity": identity,
"task_name": self.current_task_name,
"raw_prompt": prompt,
},
)
    async def collect_trajectories(self, item) -> Tuple[ScoredDataGroup, List]:
"""Collect trajectories for the current item."""
messages, metadata = item
logger.info(f"Collecting trajectories for item: {messages}")
# Get completions from the model using chat_completion
completions = await self.server.chat_completion(
messages=messages,
n=self.config.group_size,
max_tokens=self.config.max_token_length,
temperature=self.config.temperature,
top_p=self.config.top_p,
)
to_score = []
for i, completion in enumerate(completions.choices):
model_response = completion.message.content
# Create full conversation for scoring
full_messages = messages + [
{"role": "assistant", "content": model_response}
]
to_score.append((full_messages, metadata, model_response))
# Score the trajectories immediately and return a ScoredDataGroup
scored_data = await self.score(to_score)
backlog = [] # No backlog items for now
return scored_data, backlog
async def score(self, rollout_group_data) -> ScoredDataGroup:
"""Score the collected trajectories using bootcamp verification."""
scored_data = ScoredDataGroup()
scored_data["tokens"] = []
scored_data["masks"] = []
scored_data["scores"] = []
scored_data["messages"] = []
for messages, metadata, model_response in rollout_group_data:
# Verify the response using the bootcamp
identity = metadata["identity"]
# Calculate base score from bootcamp verification
base_score = self.bootcamp_instance.verify_score(
model_response,
identity,
format_score=self.config.format_bonus,
short_penalty=self.config.require_reasoning,
short_threshold=self.config.min_reasoning_length,
)
# Apply reward scaling
if base_score >= 1.0:
# Correct answer with format
final_score = self.config.correct_reward
self.task_correct_buffer.append(1)
self.format_correct_buffer.append(1)
elif base_score > 0:
# Correct format but wrong answer
final_score = self.config.incorrect_reward + base_score
self.task_correct_buffer.append(0)
self.format_correct_buffer.append(1)
else:
# Wrong answer and/or format
final_score = self.config.incorrect_reward
self.task_correct_buffer.append(0)
self.format_correct_buffer.append(0)
# Log the scoring details
logger.debug(
f"Scored response: base_score={base_score}, "
f"final_score={final_score}, "
f"identity={identity}"
)
# Tokenize for trainer
tokens_dict = tokenize_for_trainer(
self.tokenizer,
messages,
None,
)
scored_data["tokens"].append(tokens_dict["tokens"])
scored_data["masks"].append(tokens_dict["masks"])
scored_data["scores"].append(final_score)
scored_data["messages"].append(messages)
return scored_data
async def evaluate(self, *args, **kwargs):
"""Evaluate the model on test problems."""
logger.info(f"Starting evaluation for {self.current_task_name}")
eval_tasks = []
num_eval_problems = 20 # Number of problems to evaluate on
# Generate evaluation problems
for i in range(num_eval_problems):
eval_tasks.append(self.evaluate_single_problem())
# Run evaluations in parallel
results = await asyncio.gather(*eval_tasks)
# Calculate metrics
correct_count = sum(1 for is_correct, _ in results if is_correct)
format_count = sum(1 for _, has_format in results if has_format)
total_count = len(results)
accuracy = correct_count / total_count if total_count > 0 else 0
format_rate = format_count / total_count if total_count > 0 else 0
logger.info(
f"Evaluation complete: accuracy={accuracy:.2%}, "
f"format_rate={format_rate:.2%} "
f"({correct_count}/{total_count} correct)"
)
# Store metrics for wandb logging
self.eval_metrics.append((f"eval/{self.current_task_name}_accuracy", accuracy))
self.eval_metrics.append(
(f"eval/{self.current_task_name}_format_rate", format_rate)
)
self.eval_metrics.append(("eval/overall_accuracy", accuracy))
return self.eval_metrics
async def evaluate_single_problem(self) -> Tuple[bool, bool]:
"""Evaluate a single problem."""
try:
# Generate a problem
identity = self.bootcamp_instance.case_generator()
prompt = self.bootcamp_instance.prompt_func(identity)
# Create messages
messages = [
{"role": "system", "content": self.system_prompt},
{"role": "user", "content": prompt},
]
# Get model response using chat_completion
completion = await self.server.chat_completion(
messages=messages,
n=1,
max_tokens=self.config.max_token_length,
temperature=0.0, # Deterministic for evaluation
top_p=1.0,
split="eval",
)
model_response = completion.choices[0].message.content
# Score the response
score = self.bootcamp_instance.verify_score(
model_response,
identity,
format_score=self.config.format_bonus,
short_penalty=False, # Don't penalize short responses in eval
)
is_correct = score >= 1.0
has_format = score > 0
return is_correct, has_format
except Exception as e:
logger.error(f"Error evaluating problem: {e}")
return False, False
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
"""Log metrics to wandb."""
if wandb_metrics is None:
wandb_metrics = {}
# Add training metrics
if self.task_correct_buffer:
wandb_metrics[f"train/{self.current_task_name}_accuracy"] = sum(
self.task_correct_buffer
) / len(self.task_correct_buffer)
if self.format_correct_buffer:
wandb_metrics[f"train/{self.current_task_name}_format_rate"] = sum(
self.format_correct_buffer
) / len(self.format_correct_buffer)
# Add evaluation metrics
for metric_name, value in self.eval_metrics:
wandb_metrics[metric_name] = value
# Clear buffers
self.task_correct_buffer = []
self.format_correct_buffer = []
self.eval_metrics = []
await super().wandb_log(wandb_metrics)
@classmethod
def config_init(cls) -> Tuple[InternBootcampEnvConfig, List[APIServerConfig]]:
"""Initialize environment and server configurations."""
env_config = InternBootcampEnvConfig(
tokenizer_name="NousResearch/DeepHermes-3-Llama-3-8B-Preview",
group_size=8,
use_wandb=True,
max_num_workers=64,
rollout_server_url="http://localhost:8000",
total_steps=10000,
batch_size=1024,
steps_per_eval=100,
max_token_length=16384,
inference_weight=1.0,
wandb_name="intern_bootcamp_random_tasks",
data_path_to_save_groups="data/intern_bootcamp_random_tasks.jsonl",
# Task configuration
task_name="RandomTask",
task_params={},
# Reward configuration
correct_reward=1.0,
incorrect_reward=-0.5,
format_bonus=0.2,
# Training parameters
require_reasoning=True,
min_reasoning_length=50,
temperature=0.7,
top_p=0.9,
)
server_configs = [
APIServerConfig(
model_name="NousResearch/DeepHermes-3-Llama-3-8B-Preview",
base_url="http://localhost:9004/v1",
api_key="x",
num_requests_for_eval=64,
)
]
return env_config, server_configs
if __name__ == "__main__":
InternBootcampEnv.cli()


@@ -0,0 +1,241 @@
#!/usr/bin/env python3
"""
Local testing script for InternBootcamp environment with RandomTask
"""
import asyncio
import logging
import os
from dotenv import load_dotenv
from atroposlib.envs.base import APIServerConfig, EvalHandlingEnum
from environments.intern_bootcamp.intern_bootcamp_env import (
InternBootcampEnv,
InternBootcampEnvConfig,
)
load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
async def main():
logger.info("Starting InternBootcamp environment local test runner with RandomTask")
# Test configuration - using RandomTask for multitask curriculum
env_config = InternBootcampEnvConfig(
tokenizer_name="NousResearch/DeepHermes-3-Llama-3-8B-Preview",
group_size=2, # Small group for testing
use_wandb=False,
wandb_name="intern_bootcamp_random_test",
max_num_workers=1,
rollout_server_url="http://localhost:8000",
total_steps=1,
batch_size=2,
steps_per_eval=0,
max_token_length=2048, # Increased for diverse tasks
inference_weight=1.0,
data_path_to_save_groups=None,
eval_handling=EvalHandlingEnum.NONE,
eval_limit_ratio=0.0,
# InternBootcamp specific settings - using RandomTask
task_name="RandomTask",
task_params={}, # RandomTask doesn't need specific params
correct_reward=1.0,
incorrect_reward=-0.5,
format_bonus=0.2,
require_reasoning=True,
min_reasoning_length=20,
temperature=0.7,
top_p=0.9,
)
server_configs = [
APIServerConfig(
model_name="gpt-4o-mini",
base_url="https://api.openai.com/v1",
api_key=os.getenv("OPENAI_API_KEY"),
num_requests_for_eval=0,
)
]
logger.info("Using RandomTask configuration for multitask curriculum")
logger.debug(f"Env Config: {env_config}")
logger.debug(f"Server Configs: {server_configs}")
try:
env = InternBootcampEnv(
config=env_config, server_configs=server_configs, slurm=False
)
except Exception as e:
logger.exception(f"Failed to initialize InternBootcampEnv: {e}")
return
logger.info("Running RandomTask tests")
try:
await env.setup()
# Test 1: Generate multiple random problems to show variety
logger.info("\n========== Test 1: Multiple Random Problems ==========")
for i in range(5):
logger.info(f"\n--- Random Problem {i+1} ---")
item = await env.get_next_item()
prompt_tuple, metadata = item
# Extract bootcamp name from identity if available
bootcamp_name = "Unknown"
if (
isinstance(metadata["identity"], dict)
and "_bootcamp_name" in metadata["identity"]
):
bootcamp_name = metadata["identity"]["_bootcamp_name"]
logger.info(f" Selected Bootcamp: {bootcamp_name}")
logger.info(f" Task: {metadata['task_name']}")
logger.info(f" Prompt preview: {metadata['raw_prompt'][:150]}...")
# Test 2: Collect and score trajectories from a random problem
logger.info("\n========== Test 2: Trajectory Collection & Scoring ==========")
item = await env.get_next_item()
prompt_tuple, metadata = item
# Extract bootcamp name
bootcamp_name = "Unknown"
if (
isinstance(metadata["identity"], dict)
and "_bootcamp_name" in metadata["identity"]
):
bootcamp_name = metadata["identity"]["_bootcamp_name"]
logger.info(f"Testing with bootcamp: {bootcamp_name}")
logger.info(f"Problem: {metadata['raw_prompt'][:200]}...")
# Collect trajectories
scored_data, backlog = await env.collect_trajectories(item)
logger.info(f"Collected and scored {len(scored_data['scores'])} responses")
for i, score in enumerate(scored_data["scores"]):
response_preview = (
scored_data["messages"][i][-1]["content"][:100]
if scored_data["messages"][i]
else "No response"
)
logger.info(
f" Response {i+1}: Score={score:.2f}, Preview: {response_preview}..."
)
# Test 3: Quick evaluation with random tasks
logger.info("\n========== Test 3: Random Task Evaluation ==========")
async def quick_evaluate(*args, **kwargs):
logger.info("Starting evaluation with random tasks")
eval_tasks = []
bootcamp_names = []
for i in range(3): # Only 3 problems for testing
logger.info(f"Starting evaluation problem {i+1}/3")
# Generate a problem to see which bootcamp is selected
test_item = await env.get_next_item()
_, test_metadata = test_item
if (
isinstance(test_metadata["identity"], dict)
and "_bootcamp_name" in test_metadata["identity"]
):
bootcamp_name = test_metadata["identity"]["_bootcamp_name"]
bootcamp_names.append(bootcamp_name)
logger.info(f" Evaluation problem {i+1} using: {bootcamp_name}")
eval_tasks.append(env.evaluate_single_problem())
results = await asyncio.gather(*eval_tasks)
# Calculate metrics
correct_count = sum(1 for is_correct, _ in results if is_correct)
format_count = sum(1 for _, has_format in results if has_format)
total_count = len(results)
accuracy = correct_count / total_count if total_count > 0 else 0
format_rate = format_count / total_count if total_count > 0 else 0
logger.info("Evaluation complete:")
logger.info(f" Bootcamps used: {bootcamp_names}")
logger.info(f" Accuracy: {accuracy:.2%}")
logger.info(f" Format rate: {format_rate:.2%}")
return [("eval/random_tasks_accuracy", accuracy)]
env.evaluate = quick_evaluate
await env.evaluate()
# Test 4: Test specific bootcamp fallback
logger.info("\n========== Test 4: Specific Bootcamp Test ==========")
# Test with a specific bootcamp to ensure single-task mode still works
        # Build on the RandomTask config, overriding the task fields.
        # model_dump() already contains task_name/task_params, so update the
        # dict instead of passing duplicate keyword arguments.
        specific_params = env_config.model_dump()
        specific_params.update(
            task_name="Game24bootcamp",
            task_params={
                "num_numbers": 4,
                "range_max": 20,
                "target_max": 30,
            },
        )
        specific_config = InternBootcampEnvConfig(**specific_params)
try:
specific_env = InternBootcampEnv(
config=specific_config,
server_configs=server_configs,
slurm=False,
testing=True,
)
await specific_env.setup()
item = await specific_env.get_next_item()
_, metadata = item
logger.info("Specific bootcamp test (Game24bootcamp):")
logger.info(f" Task: {metadata['task_name']}")
logger.info(f" Problem: {metadata['identity']}")
logger.info(f" Prompt preview: {metadata['raw_prompt'][:100]}...")
except Exception as e:
logger.error(f"Failed to test specific bootcamp: {e}")
# Test 5: Show bootcamp registry info
logger.info("\n========== Test 5: Bootcamp Registry Info ==========")
from environments.intern_bootcamp.bootcamp_registry import (
get_available_bootcamps,
)
available = get_available_bootcamps()
logger.info(f"Total available bootcamps: {len(available)}")
logger.info(f"Sample bootcamps: {available[:10]}")
# Show some variety in bootcamp names
math_bootcamps = [
name
for name in available
if any(x in name.lower() for x in ["math", "game", "number"])
]
logic_bootcamps = [
name
for name in available
if any(x in name.lower() for x in ["logic", "puzzle", "cipher"])
]
logger.info(f"Math-related bootcamps (sample): {math_bootcamps[:5]}")
logger.info(f"Logic-related bootcamps (sample): {logic_bootcamps[:5]}")
logger.info("\n========== All Tests Complete ==========")
logger.info("RandomTask multitask curriculum is working correctly!")
except Exception as e:
logger.exception(f"An error occurred during testing: {e}")
if __name__ == "__main__":
asyncio.run(main())

@@ -0,0 +1 @@
Subproject commit 7b218f8e38c148d1aa87f5d92ba4b7e137946fb8


@@ -0,0 +1,22 @@
#!/usr/bin/env python3
"""
Standalone entry point for InternBootcamp environment.
This script avoids relative import issues when running directly.
"""
import os
import sys
# Add the atropos root directory to Python path
atropos_root = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
)
sys.path.insert(0, atropos_root)
# Now import with absolute imports
from environments.intern_bootcamp.intern_bootcamp_env import ( # noqa: E402
InternBootcampEnv,
)
if __name__ == "__main__":
InternBootcampEnv.cli()