mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-19 12:57:58 +00:00
155 lines
4.8 KiB
Markdown
155 lines
4.8 KiB
Markdown
# Dataset Environment Local Testing Guide
|
|
|
|
This document explains how to run the Dataset Environment locally for testing purposes.
|
|
|
|
## Prerequisites
|
|
|
|
1. Make sure you have the repository cloned and dependencies installed
|
|
2. Ensure you have a compatible model available (local or API)
|
|
|
|
## Option 1: Single Script End-to-End Execution
|
|
|
|
The easiest way to test the Dataset Environment is to use the unified launcher script:
|
|
|
|
```bash
|
|
python -m environments.dataset_environment.launch_local_dataset_run
|
|
```
|
|
|
|
This script:
|
|
1. Starts the Trajectory Handler API server via uvicorn
|
|
2. Launches the Dataset Environment in serve mode (connected to the API)
|
|
3. Runs the example GRPO trainer directly
|
|
|
|
The script has environment defaults configured for:
|
|
- Using a small LLM (Qwen2.5-1.5B) running on localhost:9001
|
|
- A test subset of a public HF dataset
|
|
- Basic length-based rewards
|
|
|
|
## Option 2: Step-by-step Manual Testing
|
|
|
|
### 1. Start the API Server
|
|
|
|
```bash
|
|
uvicorn atroposlib.api.server:app --host 127.0.0.1 --port 8000
|
|
```
|
|
|
|
### 2. Launch the Environment
|
|
|
|
```bash
|
|
python -m environments.dataset_environment.dataset_env serve \
|
|
--group_size 4 \
|
|
--max_num_workers 2 \
|
|
--rollout_server_url http://127.0.0.1:8000 \
|
|
--tokenizer_name Qwen/Qwen2.5-1.5B-Instruct \
|
|
--use_wandb --wandb_name dataset_env_local_test \
|
|
--max_token_length 512 \
|
|
--ensure_scores_are_not_same \
|
|
--dataset_name HuggingFaceH4/testing_self_instruct_process_essays \
|
|
--split train[:100] \
|
|
--prompt_field prompt --answer_field answer \
|
|
--reward_functions length \
|
|
--max_tokens 128 --temperature 0.7 \
|
|
--model_name Qwen/Qwen2.5-1.5B-Instruct \
|
|
--base_url http://127.0.0.1:9001 \
|
|
--slurm --testing
|
|
```
|
|
|
|
### 3. Launch the Trainer
|
|
|
|
In a separate terminal:
|
|
|
|
```bash
|
|
python -m example_trainer.grpo.train \
|
|
--model_name Qwen/Qwen2.5-1.5B-Instruct \
|
|
--training_steps 20 \
|
|
--batch_size 2 \
|
|
--gradient_accumulation_steps 2 \
|
|
--seq_len 512
|
|
```
|
|
|
|
## Option N: Use the Dataset Local Server
|
|
|
|
For easier configuration via YAML files, you can use the local server script:
|
|
|
|
```bash
|
|
# This command will look for environments/dataset_environment/configs/gsm8k.yaml
|
|
python environments/dataset_environment/dataset_local_server.py --config gsm8k
|
|
|
|
# You can also provide a full path:
|
|
# python environments/dataset_environment/dataset_local_server.py --config /path/to/your/custom_config.yaml
|
|
```
|
|
|
|
This will load the specified config and run the environment accordingly.
|
|
|
|
## Debugging
|
|
|
|
To check if requests are properly sent to and received by the API server, you can inspect the logs from both the environment and the API server. Look for:
|
|
|
|
- API logs showing incoming requests
|
|
- Environment logs showing completions being generated and scored
|
|
|
|
For model-specific issues, check:
|
|
- Ensure your model server is running at the specified URL
|
|
- Check model server logs for any errors related to generation
|
|
|
|
## Configuration Structure
|
|
|
|
Configuration files placed in `environments/dataset_environment/configs/` typically contain:
|
|
|
|
```yaml
|
|
# Example: environments/dataset_environment/configs/my_config.yaml
|
|
|
|
# Base environment parameters (can be overridden by dataset specifics)
|
|
tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-8B-Preview"
|
|
group_size: 1
|
|
use_wandb: false
|
|
# ... other base parameters
|
|
|
|
# Dataset specific configuration
|
|
dataset:
|
|
# Dataset parameters
|
|
dataset_name: "databricks/databricks-dolly-15k"
|
|
prompt_field: "instruction"
|
|
# ... other dataset parameters
|
|
reward_functions:
|
|
- type: "accuracy"
|
|
weight: 1.0
|
|
- type: "repetition_penalty"
|
|
weight: 0.2
|
|
|
|
# Optional Server configuration (if not using CLI flags in dataset_env)
|
|
server_configs:
|
|
- model_name: "gpt-4.1-nano"
|
|
api_key: ${OPENAI_API_KEY}
|
|
timeout: 600
|
|
```
|
|
|
|
### Important Configuration Parameters
|
|
|
|
#### Base Parameters
|
|
|
|
- `tokenizer_name`: The tokenizer to use for encoding/decoding text
|
|
- `group_size`: Number of responses to collect per prompt
|
|
- `max_token_length`: Maximum token length for generation
|
|
- `steps_per_eval`: How often to run evaluations
|
|
|
|
#### Dataset Specific Parameters (`dataset:` section)
|
|
|
|
- `dataset_name`: HuggingFace dataset name (required)
|
|
- `dataset_config`: Dataset configuration name (optional)
|
|
- `prompt_field`: Field in dataset to use as prompt (required)
|
|
- `answer_field`: Field in dataset to use as answer (optional)
|
|
- `system_prompt`: System prompt to use (optional)
|
|
- `reward_functions`: List of reward functions to apply (optional)
|
|
|
|
#### Server Configuration (`server_configs:` section, optional in local server)
|
|
|
|
- `model_name`: LLM model to use
|
|
- `api_key`: API key for the model (can use environment variables with ${VAR_NAME} syntax)
|
|
- `timeout`: Request timeout in seconds
|
|
|
|
## Troubleshooting
|
|
|
|
If you encounter issues with reward functions, make sure they are properly registered in the registry.
|
|
|
|
For dataset-related issues, verify that the dataset exists on HuggingFace and that the specified fields exist in the dataset.
|