Dataset Environment Local Testing Guide
This document explains how to run the Dataset Environment locally for testing purposes.
Prerequisites
- Make sure you have the repository cloned and dependencies installed
- Ensure you have a compatible model available (local or API)
Option 1: Single Script End-to-End Execution
The easiest way to test the Dataset Environment is to use the unified launcher script:
python -m environments.dataset_environment.launch_local_dataset_run
This script:
- Starts the Trajectory Handler API server via uvicorn
- Launches the Dataset Environment in serve mode (connected to the API)
- Runs the example GRPO trainer directly
The script has environment defaults configured for:
- A small LLM (Qwen2.5-1.5B) running on localhost:9001
- A test subset of a public HF dataset
- Basic length-based rewards
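The three steps the launcher performs can be sketched roughly as follows. This is a minimal illustration, not the actual contents of launch_local_dataset_run: the helper names `build_commands` and `launch_all` are hypothetical, the fixed startup delay is an assumption, and the commands are abbreviated versions of the manual steps described in Option 2.

```python
import subprocess
import sys
import time


def build_commands(api_port: int = 8000) -> list[list[str]]:
    """Return the three commands in start order: API server, environment, trainer."""
    api_url = f"http://127.0.0.1:{api_port}"
    return [
        [sys.executable, "-m", "uvicorn", "atroposlib.api.server:app",
         "--host", "127.0.0.1", "--port", str(api_port)],
        # Remaining environment flags omitted for brevity.
        [sys.executable, "-m", "environments.dataset_environment.dataset_env", "serve",
         "--rollout_server_url", api_url],
        [sys.executable, "-m", "example_trainer.grpo.train",
         "--model_name", "Qwen/Qwen2.5-1.5B-Instruct"],
    ]


def launch_all(startup_delay: float = 3.0) -> list[subprocess.Popen]:
    """Start each process in order, pausing so the previous one can come up."""
    procs = []
    for cmd in build_commands():
        procs.append(subprocess.Popen(cmd))
        time.sleep(startup_delay)  # crude wait; an assumption, not a readiness check
    return procs
```

Calling `launch_all()` from the repository root would start all three processes; the caller is responsible for terminating the returned processes afterwards.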
Option 2: Step-by-step Manual Testing
1. Start the API Server
uvicorn atroposlib.api.server:app --host 127.0.0.1 --port 8000
2. Launch the Environment
python -m environments.dataset_environment.dataset_env serve \
--group_size 4 \
--max_num_workers 2 \
--rollout_server_url http://127.0.0.1:8000 \
--tokenizer_name Qwen/Qwen2.5-1.5B-Instruct \
--use_wandb --wandb_name dataset_env_local_test \
--max_token_length 512 \
--ensure_scores_are_not_same \
--dataset_name HuggingFaceH4/testing_self_instruct_process_essays \
--split "train[:100]" \
--prompt_field prompt --answer_field answer \
--reward_functions length \
--max_tokens 128 --temperature 0.7 \
--model_name Qwen/Qwen2.5-1.5B-Instruct \
--base_url http://127.0.0.1:9001 \
--slurm --testing
3. Launch the Trainer
In a separate terminal:
python -m example_trainer.grpo.train \
--model_name Qwen/Qwen2.5-1.5B-Instruct \
--training_steps 20 \
--batch_size 2 \
--gradient_accumulation_steps 2 \
--seq_len 512
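As a quick sanity check on these flags, the arithmetic below works out how many sequences feed each optimizer update. It assumes the trainer follows the usual gradient-accumulation convention (batch size times accumulation steps) and that `--training_steps` counts optimizer updates; neither assumption has been verified against example_trainer.

```python
def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Sequences contributing to one optimizer update under gradient accumulation."""
    return batch_size * gradient_accumulation_steps


per_update = effective_batch_size(batch_size=2, gradient_accumulation_steps=2)
# Assumes --training_steps counts optimizer updates, not micro-batches.
total_sequences = per_update * 20
print(per_update, total_sequences)  # prints: 4 80
```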
Option 3: Use the Dataset Local Server
For easier configuration via YAML files, you can use the local server script:
# This command will look for environments/dataset_environment/configs/gsm8k.yaml
python environments/dataset_environment/dataset_local_server.py --config gsm8k
# You can also provide a full path:
# python environments/dataset_environment/dataset_local_server.py --config /path/to/your/custom_config.yaml
This will load the specified config and run the environment accordingly.
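The name-or-path behaviour of `--config` described above could be implemented along these lines. This is a sketch under assumptions, not the actual logic in dataset_local_server.py: the function name `resolve_config_path` is hypothetical, and the rule used to tell a bare name from a path (a YAML suffix or more than one path component) is a guess.

```python
from pathlib import Path

CONFIG_DIR = Path("environments/dataset_environment/configs")


def resolve_config_path(config: str) -> Path:
    """Map a --config value to a YAML file path."""
    candidate = Path(config)
    # Anything that already looks like a path to a YAML file is used as given.
    if candidate.suffix in {".yaml", ".yml"} or len(candidate.parts) > 1:
        return candidate
    # Otherwise treat it as a bare name inside the default config directory.
    return CONFIG_DIR / f"{config}.yaml"
```

With this rule, `resolve_config_path("gsm8k")` points at environments/dataset_environment/configs/gsm8k.yaml, while a full path is returned unchanged.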
Debugging
To check if requests are properly sent to and received by the API server, you can inspect the logs from both the environment and the API server. Look for:
- API logs showing incoming requests
- Environment logs showing completions being generated and scored
For model-specific issues:
- Confirm that your model server is running at the specified URL
- Check the model server logs for errors related to generation
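A quick way to confirm that a server is listening at a given URL is a plain TCP connection test. The helper below is a hypothetical sketch (the name `server_is_reachable` is not part of the repository); it only checks that the port accepts connections, not that the server answers generation requests correctly.

```python
import socket
from urllib.parse import urlsplit


def server_is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the setup in this guide, the two URLs worth checking are `http://127.0.0.1:8000` (API server) and `http://127.0.0.1:9001` (model server).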
Configuration Structure
Configuration files placed in environments/dataset_environment/configs/ typically contain:
# Example: environments/dataset_environment/configs/my_config.yaml
# Base environment parameters (can be overridden by dataset specifics)
tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-8B-Preview"
group_size: 1
use_wandb: false
# ... other base parameters
# Dataset specific configuration
dataset:
  # Dataset parameters
  dataset_name: "databricks/databricks-dolly-15k"
  prompt_field: "instruction"
  # ... other dataset parameters
  reward_functions:
    - type: "accuracy"
      weight: 1.0
    - type: "repetition_penalty"
      weight: 0.2
# Optional Server configuration (if not using CLI flags in dataset_env)
server_configs:
  - model_name: "gpt-4.1-nano"
    api_key: ${OPENAI_API_KEY}
    timeout: 600
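To illustrate what the `weight` entries under `reward_functions` might mean, the sketch below combines per-function scores as a weighted sum. The linear combination rule, the function name `combine_rewards`, and the example scores are all assumptions made for illustration; the environment's real aggregation may differ.

```python
def combine_rewards(scores: dict[str, float], reward_functions: list[dict]) -> float:
    """Weighted sum of per-function scores; a missing weight counts as 1.0."""
    total = 0.0
    for spec in reward_functions:
        total += spec.get("weight", 1.0) * scores[spec["type"]]
    return total


reward_functions = [
    {"type": "accuracy", "weight": 1.0},
    {"type": "repetition_penalty", "weight": 0.2},
]
# Hypothetical per-function scores for a single completion.
scores = {"accuracy": 1.0, "repetition_penalty": -0.5}
combined = combine_rewards(scores, reward_functions)  # 1.0 * 1.0 + 0.2 * (-0.5)
```

Under this assumed rule, a correct answer with some repetition scores about 0.9, so the penalty nudges the reward without overriding accuracy.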
Important Configuration Parameters
Base Parameters
- tokenizer_name: The tokenizer to use for encoding/decoding text
- group_size: Number of responses to collect per prompt
- max_token_length: Maximum token length for generation
- steps_per_eval: How often to run evaluations
Dataset Specific Parameters (dataset: section)
- dataset_name: HuggingFace dataset name (required)
- dataset_config: Dataset configuration name (optional)
- prompt_field: Field in dataset to use as prompt (required)
- answer_field: Field in dataset to use as answer (optional)
- system_prompt: System prompt to use (optional)
- reward_functions: List of reward functions to apply (optional)
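The required/optional split above lends itself to a small pre-flight check before launching. The helper below is a hypothetical sketch, not part of the repository; it takes the already-parsed `dataset:` section as a dict and reports which required keys are missing or empty.

```python
REQUIRED_DATASET_KEYS = ("dataset_name", "prompt_field")


def missing_dataset_keys(dataset_cfg: dict) -> list[str]:
    """Return required keys that are absent or empty in the dataset: section."""
    return [key for key in REQUIRED_DATASET_KEYS if not dataset_cfg.get(key)]
```

An empty result means both required keys are present; optional keys such as answer_field or system_prompt are not checked.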
Server Configuration (server_configs: section, optional in local server)
- model_name: LLM model to use
- api_key: API key for the model (can use environment variables with ${VAR_NAME} syntax)
- timeout: Request timeout in seconds
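The `${VAR_NAME}` substitution for `api_key` can be pictured with the sketch below. It is an illustration of the syntax only: the function name `expand_env_vars` is hypothetical, and raising `KeyError` for an unset variable is a design choice of this sketch rather than documented behaviour of the config loader.

```python
import os
import re

_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_vars(value: str, env=None) -> str:
    """Replace each ${VAR_NAME} with its value; raise KeyError if it is unset."""
    env = os.environ if env is None else env
    return _VAR_PATTERN.sub(lambda match: env[match.group(1)], value)
```

Failing loudly on an unset variable makes a missing OPENAI_API_KEY visible at load time instead of surfacing later as an authentication error.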
Troubleshooting
If you encounter issues with reward functions, make sure they are properly registered in the registry.
For dataset-related issues, verify that the dataset exists on HuggingFace and that the specified fields exist in the dataset.
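The field check described above can be done before starting a run. The helper below is a hypothetical sketch that compares the configured fields against a dataset's column names; it is pure Python, and the comment shows one way to obtain the column names with the Hugging Face `datasets` library, which requires network access.

```python
from typing import Optional, Sequence


def missing_fields(column_names: Sequence[str], prompt_field: str,
                   answer_field: Optional[str] = None) -> list[str]:
    """Return the configured fields that are not columns of the dataset."""
    wanted = [prompt_field] + ([answer_field] if answer_field else [])
    return [name for name in wanted if name not in column_names]


# With the `datasets` library installed, column names can be obtained like this:
#   from datasets import load_dataset
#   ds = load_dataset("databricks/databricks-dolly-15k", split="train[:1]")
#   column_names = ds.column_names
```

An empty list means every configured field exists; any names returned should be corrected in prompt_field or answer_field.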