atropos/environments/dataset_environment/LOCAL_TESTING.md
2025-05-09 09:54:20 -05:00

Dataset Environment Local Testing Guide

This document explains how to run the Dataset Environment locally for testing purposes.

Prerequisites

  1. Make sure you have the repository cloned and dependencies installed
  2. Ensure you have a compatible model available (local or API)

Option 1: Single Script End-to-End Execution

The easiest way to test the Dataset Environment is to use the unified launcher script:

python -m environments.dataset_environment.launch_local_dataset_run

This script:

  1. Starts the Trajectory Handler API server via uvicorn
  2. Launches the Dataset Environment in serve mode (connected to the API)
  3. Runs the example GRPO trainer directly

By default, the script is configured with:

  • A small LLM (Qwen2.5-1.5B) served on localhost:9001
  • A test subset of a public HF dataset
  • Basic length-based rewards
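The three steps listed above can be sketched as a small process supervisor. This is only an illustration of the flow, not the actual contents of launch_local_dataset_run: the function name run_pipeline, the fixed startup delay, and the abbreviated flag lists are assumptions, and the commands are taken from the manual steps in Option 2.

```python
import subprocess
import sys
import time

# Commands mirror the manual steps in this guide (flags abbreviated).
# The real launcher script may build them differently.
API_CMD = ["uvicorn", "atroposlib.api.server:app",
           "--host", "127.0.0.1", "--port", "8000"]
ENV_CMD = [sys.executable, "-m", "environments.dataset_environment.dataset_env",
           "serve", "--rollout_server_url", "http://127.0.0.1:8000"]
TRAINER_CMD = [sys.executable, "-m", "example_trainer.grpo.train",
               "--training_steps", "20"]


def run_pipeline(api_cmd, env_cmd, trainer_cmd, startup_delay=5.0):
    """Start the API server and the environment in the background,
    run the trainer in the foreground, then stop the background processes.

    Returns the trainer's exit code.
    """
    background = []
    try:
        for cmd in (api_cmd, env_cmd):
            background.append(subprocess.Popen(cmd))
            # Crude fixed wait (an assumption); polling the port would be more robust.
            time.sleep(startup_delay)
        trainer = subprocess.Popen(trainer_cmd)
        return trainer.wait()
    finally:
        # Shut down in reverse start order: environment first, then API server.
        for proc in reversed(background):
            proc.terminate()
            proc.wait(timeout=10)
```

Calling run_pipeline(API_CMD, ENV_CMD, TRAINER_CMD) from the repository root would approximate the end-to-end run; the try/finally ensures the background processes are stopped even if the trainer fails.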

Option 2: Step-by-step Manual Testing

1. Start the API Server

uvicorn atroposlib.api.server:app --host 127.0.0.1 --port 8000

2. Launch the Environment

python -m environments.dataset_environment.dataset_env serve \
  --group_size 4 \
  --max_num_workers 2 \
  --rollout_server_url http://127.0.0.1:8000 \
  --tokenizer_name Qwen/Qwen2.5-1.5B-Instruct \
  --use_wandb --wandb_name dataset_env_local_test \
  --max_token_length 512 \
  --ensure_scores_are_not_same \
  --dataset_name HuggingFaceH4/testing_self_instruct_process_essays \
  --split "train[:100]" \
  --prompt_field prompt --answer_field answer \
  --reward_functions length \
  --max_tokens 128 --temperature 0.7 \
  --model_name Qwen/Qwen2.5-1.5B-Instruct \
  --base_url http://127.0.0.1:9001 \
  --slurm --testing

3. Launch the Trainer

In a separate terminal:

python -m example_trainer.grpo.train \
  --model_name Qwen/Qwen2.5-1.5B-Instruct \
  --training_steps 20 \
  --batch_size 2 \
  --gradient_accumulation_steps 2 \
  --seq_len 512

Option 3: Use the Dataset Local Server

For easier configuration via YAML files, you can use the local server script:

# This command will look for environments/dataset_environment/configs/gsm8k.yaml
python environments/dataset_environment/dataset_local_server.py --config gsm8k

# You can also provide a full path:
# python environments/dataset_environment/dataset_local_server.py --config /path/to/your/custom_config.yaml

This will load the specified config and run the environment accordingly.
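The lookup rule described in the comments above (a bare name maps to a file under configs/, while a full path is used as given) could be implemented roughly as follows. This is a sketch under assumptions: resolve_config_path is a hypothetical helper, and the actual logic in dataset_local_server.py may differ.

```python
from pathlib import Path

# Directory named in this guide; paths are relative to the repository root.
CONFIG_DIR = Path("environments/dataset_environment/configs")


def resolve_config_path(config: str, config_dir: Path = CONFIG_DIR) -> Path:
    """Map a --config value to a YAML file path.

    A value with a directory component or a YAML suffix is treated as an
    explicit path; anything else is treated as a name inside config_dir.
    """
    candidate = Path(config)
    is_explicit = (
        candidate.is_absolute()
        or len(candidate.parts) > 1
        or candidate.suffix in {".yaml", ".yml"}
    )
    if is_explicit:
        return candidate
    return config_dir / f"{config}.yaml"
```

With this rule, "gsm8k" resolves to environments/dataset_environment/configs/gsm8k.yaml, and /path/to/your/custom_config.yaml is returned unchanged.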

Debugging

To check if requests are properly sent to and received by the API server, you can inspect the logs from both the environment and the API server. Look for:

  • API logs showing incoming requests
  • Environment logs showing completions being generated and scored

For model-specific issues:

  • Confirm that your model server is running at the URL passed as --base_url (http://127.0.0.1:9001 in the examples above)
  • Check the model server logs for generation errors
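Before digging through logs, it can help to confirm that something is listening on the expected ports at all. The helper below is a minimal sketch using only the standard library; is_listening is a hypothetical name, and the ports are the ones used in the examples in this guide (8000 for the API server, 9001 for the model server).

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, is_listening("127.0.0.1", 8000) checks the API server and is_listening("127.0.0.1", 9001) checks the model server. A successful connection only shows that the port is open, not that the service behind it is healthy.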

Configuration Structure

Configuration files placed in environments/dataset_environment/configs/ typically contain:

# Example: environments/dataset_environment/configs/my_config.yaml

# Base environment parameters (can be overridden by dataset specifics)
tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-8B-Preview"
group_size: 1
use_wandb: false
# ... other base parameters

# Dataset specific configuration
dataset:
  # Dataset parameters
  dataset_name: "databricks/databricks-dolly-15k"
  prompt_field: "instruction"
  # ... other dataset parameters
  reward_functions:
    - type: "accuracy"
      weight: 1.0
    - type: "repetition_penalty"
      weight: 0.2

# Optional Server configuration (if not using CLI flags in dataset_env)
server_configs:
  - model_name: "gpt-4.1-nano"
    api_key: ${OPENAI_API_KEY}
    timeout: 600
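One plausible reading of the weight fields in the reward_functions list above is a weighted sum of the individual reward scores. The sketch below illustrates that reading only; whether the environment combines rewards exactly this way is an assumption, and combine_rewards is a hypothetical helper.

```python
def combine_rewards(scores: dict, reward_functions: list) -> float:
    """Weighted sum of per-function scores.

    scores maps a reward type to its raw score for one completion;
    reward_functions mirrors the YAML list of {type, weight} entries.
    A missing weight defaults to 1.0 (an assumption).
    """
    return sum(
        spec.get("weight", 1.0) * scores[spec["type"]]
        for spec in reward_functions
    )


# Mirrors the example config: accuracy at weight 1.0, repetition_penalty at 0.2.
specs = [
    {"type": "accuracy", "weight": 1.0},
    {"type": "repetition_penalty", "weight": 0.2},
]
total = combine_rewards({"accuracy": 1.0, "repetition_penalty": -0.5}, specs)
```

Under this reading, a low weight such as 0.2 lets the repetition penalty nudge the score without outweighing accuracy.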

Important Configuration Parameters

Base Parameters

  • tokenizer_name: The tokenizer to use for encoding/decoding text
  • group_size: Number of responses to collect per prompt
  • max_token_length: Maximum token length for generation
  • steps_per_eval: How often to run evaluations

Dataset Specific Parameters (dataset: section)

  • dataset_name: HuggingFace dataset name (required)
  • dataset_config: Dataset configuration name (optional)
  • prompt_field: Field in dataset to use as prompt (required)
  • answer_field: Field in dataset to use as answer (optional)
  • system_prompt: System prompt to use (optional)
  • reward_functions: List of reward functions to apply (optional)
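Since dataset_name and prompt_field are marked as required, a quick pre-flight check on the parsed dataset: section can catch an incomplete config early. This is a minimal sketch that assumes the section has already been loaded into a plain dict; missing_required is a hypothetical helper, not part of the environment.

```python
# Keys marked "(required)" in the list above.
REQUIRED_DATASET_KEYS = ("dataset_name", "prompt_field")


def missing_required(dataset_cfg: dict) -> list:
    """Return the required keys that are absent or empty in dataset_cfg."""
    return [key for key in REQUIRED_DATASET_KEYS if not dataset_cfg.get(key)]


# Values taken from the example configuration in this guide.
example = {
    "dataset_name": "databricks/databricks-dolly-15k",
    "prompt_field": "instruction",
}
problems = missing_required(example)
```

An empty result means both required keys are present; optional keys such as answer_field or system_prompt are not checked.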

Server Configuration (server_configs: section, optional in local server)

  • model_name: LLM model to use
  • api_key: API key for the model (can use environment variables with ${VAR_NAME} syntax)
  • timeout: Request timeout in seconds
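The ${VAR_NAME} syntax mentioned for api_key can be resolved with a small recursive substitution over the loaded config. The sketch below shows one way to do it and is not necessarily how the local server implements it; expand_env is a hypothetical helper, and raising on an unset variable is a design choice made here so that a missing key fails loudly.

```python
import os
import re

# Matches ${VAR_NAME} placeholders.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value):
    """Recursively replace ${VAR_NAME} in strings with os.environ values.

    Dicts and lists are walked; non-string leaves are returned unchanged.
    Raises KeyError if a referenced variable is not set.
    """
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    if isinstance(value, str):
        def substitute(match):
            name = match.group(1)
            if name not in os.environ:
                raise KeyError(f"environment variable {name} is not set")
            return os.environ[name]
        return _PLACEHOLDER.sub(substitute, value)
    return value
```

Applied to the parsed server_configs list, this would replace "${OPENAI_API_KEY}" with the value of the OPENAI_API_KEY environment variable and leave model_name and timeout untouched.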

Troubleshooting

If you encounter issues with reward functions, make sure every name listed under reward_functions is registered in the reward function registry.
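For readers unfamiliar with the registry pattern, the sketch below shows the general idea: functions are registered under a name, and a config that refers to an unknown name can be detected up front. Everything here (REWARD_REGISTRY, register, unregistered, and the toy length reward) is hypothetical and written for illustration; it is not the registry API used by this repository.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory registry: reward name -> scoring function.
REWARD_REGISTRY: Dict[str, Callable[[str], float]] = {}


def register(name: str):
    """Decorator that stores a reward function under the given name."""
    def decorator(fn: Callable[[str], float]) -> Callable[[str], float]:
        REWARD_REGISTRY[name] = fn
        return fn
    return decorator


@register("length")
def length_reward(completion: str) -> float:
    """Toy reward: number of whitespace-separated words."""
    return float(len(completion.split()))


def unregistered(names: List[str]) -> List[str]:
    """Return the configured reward names that have no registered function."""
    return [name for name in names if name not in REWARD_REGISTRY]
```

Here unregistered(["length", "accuracy"]) reports "accuracy" as missing, which is the kind of mismatch to look for when a configured reward function cannot be found.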

For dataset-related issues, verify that the dataset exists on the Hugging Face Hub and that the configured prompt_field and answer_field are present in it.
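The field check can be reduced to comparing the configured field names against the dataset's column names. The helper below is a minimal sketch; missing_fields is a hypothetical name, and the column list in the example is illustrative rather than read from a real dataset.

```python
from typing import List, Optional, Sequence


def missing_fields(
    column_names: Sequence[str],
    prompt_field: str,
    answer_field: Optional[str] = None,
) -> List[str]:
    """Return the configured fields that do not appear in column_names."""
    wanted = [prompt_field]
    if answer_field:
        wanted.append(answer_field)
    return [field for field in wanted if field not in column_names]


# Illustrative columns; in practice, read them from the loaded dataset.
columns = ["instruction", "context", "response", "category"]
absent = missing_fields(columns, prompt_field="instruction", answer_field="answer")
```

With the Hugging Face datasets library installed, the column names of a loaded split are available through its column_names attribute, so the result of load_dataset(...) for a single split can be passed in directly.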