9.9 KiB
Quick Start
Option A: Unified End-to-End Launcher
python -m environments.dataset_environment.launch_local_dataset_run
This single command spins up:
- The Trajectory Handler API server (
uvicorn atroposlib.api.server:app) - The DatasetEnv in serve mode (connected to the API)
- The example GRPO trainer (via
example_trainer.grpo.train)
Option B: Manual Steps
-
Start the API server
uvicorn atroposlib.api.server:app --host 127.0.0.1 --port 8000 -
Launch the Dataset Environment
-
Using CLI flags: (These flags override any config file settings)
python -m environments.dataset_environment.dataset_env serve \ --group_size 4 \ --max_num_workers 2 \ --rollout_server_url http://127.0.0.1:8000 \ --tokenizer_name Qwen/Qwen2.5-1.5B-Instruct \ --use_wandb --wandb_name dataset_env_local_test \ --max_token_length 512 \ --ensure_scores_are_not_same \ --dataset_name HuggingFaceH4/testing_self_instruct_process_essays \ --split train[:100] \ --prompt_field prompt --answer_field answer \ --reward_functions length \ --max_tokens 128 --temperature 0.7 \ --model_name Qwen/Qwen2.5-1.5B-Instruct \ --base_url http://127.0.0.1:9001 \ --slurm --testing -
Using YAML config files:
Place a dataset config under
environments/dataset_environment/configs/<name>.yaml:# Example: environments/dataset_environment/configs/gsm8k.yaml dataset: dataset_name: "gsm8k" dataset_config: "main" split: "train" prompt_field: "question" answer_field: "answer" system_prompt: "You are a mathematical problem solver..." generation: temperature: 0.7 top_p: 0.95 reward_functions: - type: "accuracy" weight: 1.0Then run the local test server:
# Will look for environments/dataset_environment/configs/gsm8k.yaml python environments/dataset_environment/dataset_local_server.py --config gsm8k
-
-
Launch the Trainer
python -m example_trainer.grpo
Configuration Files Directory
Dataset environment specific configurations now live in environments/dataset_environment/configs/.
Shared configurations (like agents) might still reside in the project's root configs/ directory.
environments/dataset_environment/configs/for dataset-specific configs (used bydataset_local_server.py).- You can reference any
<name>.yamlwithin this directory via the--configflag in the local server script.
Reward Function Registry & Customization
Reward functions are managed by a centralized registry (see atroposlib/envs/reward_fns/reward_function.py). Built-in types include:
accuracy: exact match to ground truth (tolerance, split_on_think_tag)format: checks for specific tags (preferred_tags)reasoning_steps: quality of step-by-step reasoningrepetition_penalty: penalizes repetitioncosine_scaled: semantic similarity scaled from embeddingscrossword_format: crossword-specific penaltyr1: combined accuracy + format
To preview all available functions:
from atroposlib.envs.reward_fns import registry
print(registry.list())
Creating Custom Reward Functions
-
Create a new file under
atroposlib/envs/reward_fns/my_reward.py. -
Subclass
RewardFunctionand register it:from atroposlib.envs.reward_fns import registry, RewardFunction @registry.register class MyCustomReward(RewardFunction): def __init__(self, custom_param=1.0, weight=1.0, **kwargs): super().__init__(weight=weight, **kwargs) self.custom_param = custom_param def compute(self, completions, **kwargs): return [1.0 if "good answer" in self.get_content(c) else 0.0 for c in completions] -
Reference it in your YAML config:
reward_functions: - type: "my_custom" weight: 1.0 params: custom_param: 2.0
Dataset Environments
Dataset environments load data from HuggingFace datasets and evaluate LLM responses against ground truth. They're ideal for academic benchmarks and datasets with clear evaluation criteria.
Example configuration:
dataset:
dataset_name: "gsm8k"
dataset_config: "main"
split: "train"
prompt_field: "question"
answer_field: "answer"
system_prompt: "You are a mathematical problem solver..."
reward_functions:
- type: "accuracy"
weight: 1.0
Reward Functions
The system features a flexible reward function architecture for evaluating model outputs.
Basic Usage
In your environment config, specify reward functions:
reward_functions:
- type: "accuracy"
weight: 1.0
- type: "format"
weight: 0.5
Combining Reward Functions
Combine multiple reward functions with weights:
reward_functions:
- type: "combined"
params:
normalization: "sum"
rewards:
- type: "accuracy"
weight: 1.5
- type: "format"
weight: 0.5
Available Reward Functions
accuracy
Evaluates if completions match ground truth answers.
type: "accuracy"
weight: 1.0
params:
tolerance: 1e-6
split_on_think_tag: true
max_boxed_threshold: 6
format
Checks if completions include specific XML-style tags.
type: "format"
weight: 1.0
params:
preferred_tags: ["think", "reasoning"]
require_all_tags: false
case_sensitive: false
reasoning_steps
Evaluates step-by-step reasoning quality.
type: "reasoning_steps"
weight: 1.0
params:
min_words: 10
min_steps: 3
base_score: 0.1
repetition_penalty
Penalizes repetitive content.
type: "repetition_penalty"
weight: 0.5
params:
threshold: 0.05
min_words: 10
min_sentences: 2
cosine_scaled
Measures semantic similarity between completions and solutions.
type: "cosine_scaled"
weight: 0.8
params:
model_name: "sentence-transformers/all-MiniLM-L6-v2"
scale_factor: 1.0
min_reward: -1.0
max_reward: 1.0
crossword_format
Game-specific reward for crossword puzzles.
type: "crossword_format"
weight: 1.0
params:
reward_value: 1.0
penalize_invalid_chars: true
r1
Combined reward using both reasoning format and accuracy.
type: "r1"
weight: 1.0
params:
format_weight: 0.5
accuracy_weight: 1.0
Creating Custom Reward Functions
To create a custom reward function:
-
Create a new file in
atroposlib/envs/reward_fns/my_reward.py -
Define your reward function class:
from typing import Any, List
from atroposlib.envs.reward_fns import registry, RewardFunction
@registry.register
class MyCustomReward(RewardFunction):
def __init__(self, custom_param=1.0, weight=1.0, **kwargs):
super().__init__(weight=weight, **kwargs)
self.custom_param = custom_param
def compute(self, completions: List[Any], **kwargs) -> List[float]:
rewards = []
for completion in completions:
content = self.get_content(completion)
# Implement your reward logic
reward = 1.0 if "good answer" in content else 0.0
rewards.append(reward)
return rewards
- Use it in your config:
reward_functions:
- type: "my_custom"
weight: 1.0
params:
custom_param: 2.0
Dataset Environment Debugger
The dataset environment debugger allows you to run a dataset environment locally with a Hugging Face model, providing enhanced visibility into reward function performance and model responses.
# Run with default settings
python -m atroposlib.cli.dataset_env_debugger --env gsm8k_debug --agent nous_hermes_8b
# List available environments and agents
python -m atroposlib.cli.dataset_env_debugger --list-configs
# Interactive mode with debugging information
python -m atroposlib.cli.dataset_env_debugger --env gsm8k_debug --agent nous_hermes_8b --interactive --debug
# Run with custom generation parameters
python -m atroposlib.cli.dataset_env_debugger --env gsm8k_debug --agent nous_hermes_8b --temperature 0.5 --top-p 0.95
# Run with detailed logging
python -m atroposlib.cli.dataset_env_debugger --env gsm8k_debug --agent nous_hermes_8b --verbose
Environment Overview
This environment demonstrates how to use a standard dataset (e.g., from Hugging Face Datasets) as a source for generating prompts and evaluating LLM responses. It allows for testing and training models on established benchmarks or custom datasets where prompts and expected answers/ground truth are available.
Demonstrates:
- Loading and processing data from Hugging Face Datasets.
- Configuring system prompts, prompt/answer fields.
- Applying various reward functions (accuracy, format, semantic similarity, etc.) to evaluate generations.
- Integrating with the
atroposlibframework for data collection and scoring.
Training Goal:
- To train LLMs to follow instructions and generate responses that align with the format and content specified by the dataset and reward functions.
- To improve performance on specific tasks defined by datasets (e.g., math problem solving, code generation, question answering).
Local Testing
To test this environment locally, you can run the provided local server. This server simulates the interaction flow without needing the full distributed setup.
First, ensure you have the necessary dependencies installed.
Then, run the local server script from the root of the repository:
python environments/dataset_environment/dataset_local_server.py --config-path path/to/your/dataset_config.yaml
Replace path/to/your/dataset_config.yaml with the actual path to your environment configuration file (e.g., configs/envs/gsm8k.yaml). The server will load the dataset specified in the config, process items, and simulate generating responses.
FOR RELEASE - FIX