mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-28 17:29:30 +00:00
first commit
This commit is contained in:
commit
621d00dd80
89 changed files with 15315 additions and 0 deletions
environments/dataset_environment/LOCAL_TESTING.md
# Dataset Environment Local Testing Guide

This document explains how to run the Dataset Environment locally for testing purposes.

## Prerequisites

1. Make sure you have the repository cloned and dependencies installed
2. Ensure you have a compatible model available (local or API)

## Option 1: Single Script End-to-End Execution

The easiest way to test the Dataset Environment is to use the unified launcher script:

```bash
python -m environments.dataset_environment.launch_local_dataset_run
```

This script:
1. Starts the Trajectory Handler API server via uvicorn
2. Launches the Dataset Environment in serve mode (connected to the API)
3. Runs the example GRPO trainer directly

The script has environment defaults configured for:
- Using a small LLM (Qwen2.5-1.5B) running on localhost:9001
- A test subset of a public HF dataset
- Basic length-based rewards

## Option 2: Step-by-step Manual Testing

### 1. Start the API Server

```bash
uvicorn atroposlib.api.server:app --host 127.0.0.1 --port 8000
```

### 2. Launch the Environment

```bash
python -m environments.dataset_environment.dataset_env serve \
  --group_size 4 \
  --max_num_workers 2 \
  --rollout_server_url http://127.0.0.1:8000 \
  --tokenizer_name Qwen/Qwen2.5-1.5B-Instruct \
  --use_wandb --wandb_name dataset_env_local_test \
  --max_token_length 512 \
  --ensure_scores_are_not_same \
  --dataset_name HuggingFaceH4/testing_self_instruct_process_essays \
  --split 'train[:100]' \
  --prompt_field prompt --answer_field answer \
  --reward_functions length \
  --max_tokens 128 --temperature 0.7 \
  --model_name Qwen/Qwen2.5-1.5B-Instruct \
  --base_url http://127.0.0.1:9001 \
  --slurm --testing
```

### 3. Launch the Trainer

In a separate terminal:

```bash
python -m example_trainer.grpo.train \
  --model_name Qwen/Qwen2.5-1.5B-Instruct \
  --training_steps 20 \
  --batch_size 2 \
  --gradient_accumulation_steps 2 \
  --seq_len 512
```

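If you repeat the three manual steps often, they can be scripted. The following is a minimal sketch, not the code of `launch_local_dataset_run`: it simply wraps the commands from steps 1 to 3 with `subprocess`. The helper names `build_commands` and `launch_all` and the fixed startup delay are assumptions made for illustration.

```python
import subprocess
import sys
import time


def build_commands(python: str = sys.executable) -> dict:
    """Return the three commands from the manual steps as argument lists."""
    api = [
        "uvicorn", "atroposlib.api.server:app",
        "--host", "127.0.0.1", "--port", "8000",
    ]
    env = [
        python, "-m", "environments.dataset_environment.dataset_env", "serve",
        "--group_size", "4",
        "--max_num_workers", "2",
        "--rollout_server_url", "http://127.0.0.1:8000",
        "--tokenizer_name", "Qwen/Qwen2.5-1.5B-Instruct",
        "--use_wandb", "--wandb_name", "dataset_env_local_test",
        "--max_token_length", "512",
        "--ensure_scores_are_not_same",
        "--dataset_name", "HuggingFaceH4/testing_self_instruct_process_essays",
        "--split", "train[:100]",  # no shell involved, so no quoting needed
        "--prompt_field", "prompt", "--answer_field", "answer",
        "--reward_functions", "length",
        "--max_tokens", "128", "--temperature", "0.7",
        "--model_name", "Qwen/Qwen2.5-1.5B-Instruct",
        "--base_url", "http://127.0.0.1:9001",
        "--slurm", "--testing",
    ]
    trainer = [
        python, "-m", "example_trainer.grpo.train",
        "--model_name", "Qwen/Qwen2.5-1.5B-Instruct",
        "--training_steps", "20",
        "--batch_size", "2",
        "--gradient_accumulation_steps", "2",
        "--seq_len", "512",
    ]
    return {"api": api, "env": env, "trainer": trainer}


def launch_all(startup_delay: float = 5.0) -> int:
    """Start the API server and environment, run the trainer, then clean up.

    The fixed delay is a crude stand-in for a real readiness check (assumption).
    """
    cmds = build_commands()
    background = []
    try:
        for name in ("api", "env"):
            background.append(subprocess.Popen(cmds[name]))
            time.sleep(startup_delay)
        # The trainer runs in the foreground; its exit code is returned.
        return subprocess.run(cmds["trainer"]).returncode
    finally:
        # Stop background processes in reverse start order.
        for proc in reversed(background):
            proc.terminate()
            proc.wait()
```

Calling `launch_all()` from the repository root would start all three processes. Keeping the commands as argument lists avoids shell quoting issues such as the brackets in `train[:100]`.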
## Option 3: Use the Dataset Local Server

For easier configuration via YAML files, you can use the local server script:

```bash
# This command will look for environments/dataset_environment/configs/gsm8k.yaml
python environments/dataset_environment/dataset_local_server.py --config gsm8k

# You can also provide a full path:
# python environments/dataset_environment/dataset_local_server.py --config /path/to/your/custom_config.yaml
```

This will load the specified config and run the environment accordingly.

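The two forms of `--config` (a bare name or a full path) suggest a simple lookup rule. The sketch below shows one way such a rule could work; it is an illustration only, and the actual logic in `dataset_local_server.py` may differ. The function name `resolve_config_path` and the rule for telling a name from a path are assumptions.

```python
from pathlib import Path

# Directory named in the comment of the command above.
CONFIG_DIR = Path("environments/dataset_environment/configs")


def resolve_config_path(config: str, config_dir: Path = CONFIG_DIR) -> Path:
    """Map a --config value to a YAML file path (hypothetical rule)."""
    candidate = Path(config)
    # Anything with a YAML suffix or a directory component is treated as a path.
    if candidate.suffix in {".yaml", ".yml"} or len(candidate.parts) > 1:
        return candidate
    # Otherwise treat the value as a config name inside the configs directory.
    return config_dir / f"{config}.yaml"
```

Under this rule, `gsm8k` maps to `environments/dataset_environment/configs/gsm8k.yaml`, while `/path/to/your/custom_config.yaml` is returned unchanged.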
## Debugging

To check that requests are reaching the API server and being answered, inspect the logs from both the environment and the API server. Look for:

- API logs showing incoming requests
- Environment logs showing completions being generated and scored

For model-specific issues:
- Ensure your model server is running at the specified URL
- Check the model server logs for any errors related to generation

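A quick way to confirm that something is listening at the model server URL is a plain TCP connection attempt. This is a minimal sketch using only the standard library; the helper name `is_server_reachable` is hypothetical, and a successful connection only shows that the port is open, not that the model is loaded or healthy.

```python
import socket
from urllib.parse import urlparse


def is_server_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    host = parsed.hostname
    if host is None:
        return False
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For the setup in this guide you would call `is_server_reachable("http://127.0.0.1:9001")` for the model server and `is_server_reachable("http://127.0.0.1:8000")` for the API server.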
## Configuration Structure

Configuration files placed in `environments/dataset_environment/configs/` typically contain:

```yaml
# Example: environments/dataset_environment/configs/my_config.yaml

# Base environment parameters (can be overridden by dataset specifics)
tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-8B-Preview"
group_size: 1
use_wandb: false
# ... other base parameters

# Dataset specific configuration
dataset:
  # Dataset parameters
  dataset_name: "databricks/databricks-dolly-15k"
  prompt_field: "instruction"
  # ... other dataset parameters
  reward_functions:
    - type: "accuracy"
      weight: 1.0
    - type: "repetition_penalty"
      weight: 0.2

# Optional Server configuration (if not using CLI flags in dataset_env)
server_configs:
  - model_name: "gpt-4.1-nano"
    api_key: ${OPENAI_API_KEY}
    timeout: 600
```

### Important Configuration Parameters

#### Base Parameters

- `tokenizer_name`: The tokenizer to use for encoding/decoding text
- `group_size`: Number of responses to collect per prompt
- `max_token_length`: Maximum token length for generation
- `steps_per_eval`: How often to run evaluations

#### Dataset Specific Parameters (`dataset:` section)

- `dataset_name`: Hugging Face dataset name (required)
- `dataset_config`: Dataset configuration name (optional)
- `prompt_field`: Field in dataset to use as prompt (required)
- `answer_field`: Field in dataset to use as answer (optional)
- `system_prompt`: System prompt to use (optional)
- `reward_functions`: List of reward functions to apply (optional)

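Because two of the `dataset:` parameters are marked as required, it can help to check a config before launching anything. The sketch below works on an already-parsed config dictionary (for example the result of `yaml.safe_load`); the helper name `missing_dataset_keys` is hypothetical and is not part of the environment's code.

```python
# Keys marked "(required)" in the list above.
REQUIRED_DATASET_KEYS = ("dataset_name", "prompt_field")


def missing_dataset_keys(config: dict) -> list:
    """Return the required `dataset:` keys that are absent or empty."""
    dataset = config.get("dataset") or {}
    return [key for key in REQUIRED_DATASET_KEYS if not dataset.get(key)]


# Example with a config shaped like the YAML in "Configuration Structure".
example = {
    "group_size": 1,
    "dataset": {"dataset_name": "databricks/databricks-dolly-15k"},
}
problems = missing_dataset_keys(example)
if problems:
    print("Missing required dataset keys:", ", ".join(problems))
```

Here the example config omits `prompt_field`, so that key is reported.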
#### Server Configuration (`server_configs:` section, optional in local server)

- `model_name`: LLM model to use
- `api_key`: API key for the model (can reference environment variables with the `${VAR_NAME}` syntax)
- `timeout`: Request timeout in seconds

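To illustrate the `${VAR_NAME}` syntax, here is a minimal sketch of how such placeholders can be expanded from the process environment. It is not the loader used by the local server: the function name `expand_env_vars` is hypothetical, and raising an error for an unset variable is an assumption about desirable behavior.

```python
import os
import re

# Matches ${VAR_NAME} where VAR_NAME is a conventional environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_vars(value: str, env=None) -> str:
    """Replace every ${VAR_NAME} in `value` with the variable's value."""
    env = os.environ if env is None else env

    def _substitute(match):
        name = match.group(1)
        if name not in env:
            # Fail loudly instead of sending a literal "${...}" as an API key.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR_PATTERN.sub(_substitute, value)
```

Passing a mapping as `env` makes the function easy to try without touching real credentials, for example `expand_env_vars("${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-test"})`.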
## Troubleshooting

If you encounter issues with reward functions, make sure they are properly registered in the registry.

For dataset-related issues, verify that the dataset exists on Hugging Face and that the specified fields exist in the dataset.
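The field check can be done before starting the environment. The sketch below assumes the Hugging Face `datasets` package is installed and that the dataset can be downloaded; the helper names `missing_fields` and `check_dataset_fields` are hypothetical.

```python
def missing_fields(column_names, required_fields):
    """Return the requested fields that are not among the dataset's columns."""
    available = set(column_names)
    return [field for field in required_fields if field and field not in available]


def check_dataset_fields(dataset_name, split, required_fields, dataset_config=None):
    """Load a split and report configured fields it does not contain.

    Needs the `datasets` package and network access (assumption).
    """
    from datasets import load_dataset  # imported lazily; only needed here

    dataset = load_dataset(dataset_name, dataset_config, split=split)
    return missing_fields(dataset.column_names, required_fields)
```

For the command in Option 2 you could call `check_dataset_fields("HuggingFaceH4/testing_self_instruct_process_essays", "train[:100]", ["prompt", "answer"])`; an empty list means both fields exist. If loading itself fails, the dataset name, config, or split is the more likely problem.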