remove reqs and update community readme

This commit is contained in:
Jai Suphavadeeprasit 2026-02-26 10:02:26 -05:00
parent fb3228f669
commit dbf6026165
5 changed files with 116 additions and 12 deletions


@@ -1,5 +1,108 @@
# GRPO Example Trainer
This guide explains how to run the `example_trainer` integration with Atropos using GRPO.
The trainer is a reference implementation of the end-to-end wiring (`environment -> run-api -> rollout server -> optimizer`) and supports multiple weight-synchronization modes with vLLM.
## Supported Modes
- `shared_vllm`: single-copy training via CUDA IPC (trainer updates shared vLLM tensors in place)
- `lora_only`: LoRA adapter training with HTTP hot-swap (slow due to eager mode)
- `lora_restart`: LoRA adapter training with periodic vLLM restart (faster than `lora_only`)
- `none`: legacy full-checkpoint flow with vLLM reloads
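The mode names above are the accepted values of the trainer's `--weight-bridge-mode` flag. As an illustrative sketch only (not the trainer's actual CLI code), such a flag could be validated with `argparse`:

```python
import argparse

# Mode names taken from the list above; the parser itself is a hypothetical
# sketch, not the trainer's real argument parsing.
WEIGHT_BRIDGE_MODES = ("shared_vllm", "lora_only", "lora_restart", "none")

parser = argparse.ArgumentParser()
parser.add_argument("--weight-bridge-mode", choices=WEIGHT_BRIDGE_MODES, default="none")

# Unknown modes are rejected at parse time instead of failing mid-training.
args = parser.parse_args(["--weight-bridge-mode", "shared_vllm"])
```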
## Prerequisites
1. Python 3.10+
2. CUDA-capable PyTorch environment for GPU training
3. Atropos API server available (`run-api`)
4. An environment process producing trajectories (for example GSM8K server)
## Installation
From repository root:
```bash
pip install -e ".[example_trainer]"
```
Optional (all extras):
```bash
pip install -e ".[all]"
```
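Optionally, a quick sanity check can confirm the heavy dependencies resolved after installation. The package list below mirrors the `example_trainer` extra and is an assumption, not an authoritative requirement set:

```python
import importlib.util

def missing_extras(packages=("torch", "transformers", "vllm", "accelerate", "peft")):
    """Return the subset of the given packages that cannot be imported."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

# An empty list means the extras installed cleanly.
print(missing_extras())
```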
## CLI Entry Points
After install, you can use either module invocation or script entrypoints:
- `python -m example_trainer.grpo` or `atropos-grpo`
- `python -m example_trainer.run` or `atropos-grpo-run`
## Minimal End-to-End Startup
### 1) Start Atropos API
```bash
run-api --port 8002
```
### 2) Start an environment
```bash
python environments/gsm8k_server.py serve \
--env.rollout_server_url "http://localhost:8002" \
--openai.server_type vllm \
--openai.base_url "http://localhost:9001/v1" \
--openai.api_key "dummy"
```
### 3) Start vLLM server (shared-weights example)
```bash
VLLM_ENABLE_SHARED_WEIGHTS=1 LOGDIR=/tmp/grpo_training \
python -m example_trainer.vllm_api_server \
--model Qwen/Qwen3-1.7B-Base \
--port 9001 \
--gpu-memory-utilization 0.45 \
--enforce-eager
```
### 4) Start trainer
```bash
atropos-grpo \
--model-name Qwen/Qwen3-1.7B-Base \
--weight-bridge-mode shared_vllm \
--vllm-port 9001 \
--vllm-config-path /tmp/grpo_training/vllm_bridge_config.json \
--atropos-url "http://localhost:8002" \
--batch-size 1 \
--gradient-accumulation-steps 64 \
--warmup-steps 5 \
--training-steps 30 \
--kl-coef 0.0 \
--clip-eps 0.2
```
## Objective Notes
- GRPO uses rollout/inference logprobs (`pi_old`) for importance-ratio computation.
- The optional KL-like term is sampled-token regularization against rollout policy logprobs, not a separate frozen-reference-model KL.
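To make the objective concrete, here is a minimal sketch of group-relative advantages and the per-token clipped surrogate (illustrative Python, not the trainer's implementation; `clip_eps` corresponds to the `--clip-eps` flag above, and `logp_old` is the rollout/inference logprob from `pi_old`):

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantage: z-score each reward within its rollout group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + 1e-8) for r in rewards]

def clipped_token_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for one sampled token."""
    ratio = math.exp(logp_new - logp_old)  # importance ratio pi_new / pi_old
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)
```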
## Outputs
- Trainer logs to stdout (and optional W&B if enabled)
- Checkpoints under `--save-path`
- Mode-specific logs/checkpoints when using matrix/orchestration scripts
## Troubleshooting
- If vLLM health checks time out, inspect `vllm.log`, `trainer.log`, and `env.log`.
- If targeted shared-layer runs lose gradients, ensure non-reentrant checkpointing is enabled in shared mode.
- If environment workers time out at 600s, reduce env concurrency (`--env.max_num_workers_per_node`) and batch pressure.
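If health checks are flaky, a small probe can distinguish "server not up yet" from a misconfigured URL. A minimal sketch using only the standard library; the `/health` route is the one exposed by vLLM's OpenAI-compatible server, but treat the exact path as an assumption for your version:

```python
import urllib.request
import urllib.error

def vllm_healthy(base_url="http://localhost:9001", timeout=2.0):
    """Return True if the server's /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```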
# GRPO Example Trainer
This directory contains an example script (`grpo.py`) demonstrating how to integrate a custom training loop with the Atropos API for reinforcement learning using the GRPO (Group Relative Policy Optimization) algorithm.
**Note: Example trainer does not support multimodal training out of the box. As other trainers add support for Atropos, we will list them in the main readme, some of which may support multimodal RL - please check the main repo readme for any updates.**
@@ -68,7 +171,7 @@ Once the prerequisites are met and configuration is set:
```bash
# Install dependencies
pip install -e ".[example_trainer]"

# Run the trainer directly (basic test)
python example_trainer/grpo.py
```


@@ -107,7 +107,9 @@ The `lora_only` mode requires `--enforce-eager` which **disables CUDA graphs**,
## Quick Start: LoRA Training (Recommended)
### Step 1: Install Dependencies
- Install from `pyproject.toml` extras:
- `pip install -e ".[example_trainer]"`
- or everything: `pip install -e ".[all]"`
### Step 2: Start All Components


@@ -1,8 +0,0 @@
vllm
torch
transformers
datasets
accelerate
peft
requests
wandb


@@ -299,7 +299,7 @@ The `example_trainer/` directory provides `grpo.py`, a script demonstrating inte
1. Python 3.8+ (Python 3.10+ recommended for Atropos overall).
2. Running Atropos API server (default: `http://localhost:8000`). Accessible via `run-api`.
3. Required Python packages: `torch`, `transformers`, `vllm`, `pydantic`, `numpy`, `requests`, `tenacity`, `wandb` (optional). Install via `pip install -e ".[example_trainer]"` (or `pip install -e ".[all]"`).
4. A running Atropos environment (e.g., `python environments/gsm8k_server.py serve --slurm False`).
### 9.2. Setup


@@ -40,11 +40,18 @@ atropos-grpo-run = "example_trainer.run:main"
[project.optional-dependencies]
all = [
"atroposlib[dev,examples,example_trainer]"
]
rewardfns = [
"torch"
]
example_trainer = [
"atroposlib[rewardfns]",
"vllm",
"accelerate",
"peft",
"requests",
]
dev = [
"pytest",
"pytest-asyncio",