remove reqs and update community readme

This commit is contained in:
Jai Suphavadeeprasit 2026-02-26 10:02:26 -05:00
parent fb3228f669
commit dbf6026165
5 changed files with 116 additions and 12 deletions


@@ -1,5 +1,108 @@
# GRPO Example Trainer
This guide explains how to run the `example_trainer` integration with Atropos using GRPO.
The trainer is a reference implementation of the end-to-end wiring (`environment -> run-api -> rollout server -> optimizer`) and supports multiple weight-synchronization modes with vLLM.
## Supported Modes
- `shared_vllm`: single-copy training via CUDA IPC (trainer updates shared vLLM tensors in place)
- `lora_only`: LoRA adapter training with HTTP hot-swap (slow due to eager mode)
- `lora_restart`: LoRA adapter training with periodic vLLM restart (faster than `lora_only`)
- `none`: legacy full-checkpoint flow with vLLM reloads
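The mode names above are the accepted values of the trainer's `--weight-bridge-mode` flag. As an illustrative sketch only (not the trainer's actual CLI code), such a flag could be validated with `argparse`:

```python
import argparse

# Mode names taken from the list above; the parser itself is a hypothetical
# sketch, not the trainer's real argument parsing.
WEIGHT_BRIDGE_MODES = ("shared_vllm", "lora_only", "lora_restart", "none")

parser = argparse.ArgumentParser()
parser.add_argument("--weight-bridge-mode", choices=WEIGHT_BRIDGE_MODES, default="none")

# Unknown modes are rejected at parse time instead of failing mid-training.
args = parser.parse_args(["--weight-bridge-mode", "shared_vllm"])
```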
## Prerequisites
1. Python 3.10+
2. CUDA-capable PyTorch environment for GPU training
3. Atropos API server available (`run-api`)
4. An environment process producing trajectories (for example GSM8K server)
## Installation
From repository root:
```bash
pip install -e ".[example_trainer]"
```
Optional (all extras):
```bash
pip install -e ".[all]"
```
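Optionally, a quick sanity check can confirm the heavy dependencies resolved after installation. The package list below mirrors the `example_trainer` extra and is an assumption, not an authoritative requirement set:

```python
import importlib.util

def missing_extras(packages=("torch", "transformers", "vllm", "accelerate", "peft")):
    """Return the subset of the given packages that cannot be imported."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

# An empty list means the extras installed cleanly.
print(missing_extras())
```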
## CLI Entry Points
After install, you can use either module invocation or script entrypoints:
- `python -m example_trainer.grpo` or `atropos-grpo`
- `python -m example_trainer.run` or `atropos-grpo-run`
## Minimal End-to-End Startup
### 1) Start Atropos API
```bash
run-api --port 8002
```
### 2) Start an environment
```bash
python environments/gsm8k_server.py serve \
--env.rollout_server_url "http://localhost:8002" \
--openai.server_type vllm \
--openai.base_url "http://localhost:9001/v1" \
--openai.api_key "dummy"
```
### 3) Start vLLM server (shared-weights example)
```bash
VLLM_ENABLE_SHARED_WEIGHTS=1 LOGDIR=/tmp/grpo_training \
python -m example_trainer.vllm_api_server \
--model Qwen/Qwen3-1.7B-Base \
--port 9001 \
--gpu-memory-utilization 0.45 \
--enforce-eager
```
### 4) Start trainer
```bash
atropos-grpo \
--model-name Qwen/Qwen3-1.7B-Base \
--weight-bridge-mode shared_vllm \
--vllm-port 9001 \
--vllm-config-path /tmp/grpo_training/vllm_bridge_config.json \
--atropos-url "http://localhost:8002" \
--batch-size 1 \
--gradient-accumulation-steps 64 \
--warmup-steps 5 \
--training-steps 30 \
--kl-coef 0.0 \
--clip-eps 0.2
```
## Objective Notes
- GRPO uses rollout/inference logprobs (`pi_old`) for importance-ratio computation.
- The optional KL-like term is sampled-token regularization against rollout policy logprobs, not a separate frozen-reference-model KL.
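To make the objective concrete, here is a minimal sketch of group-relative advantages and the per-token clipped surrogate (illustrative Python, not the trainer's implementation; `clip_eps` corresponds to the `--clip-eps` flag above, and `logp_old` is the rollout/inference logprob from `pi_old`):

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantage: z-score each reward within its rollout group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + 1e-8) for r in rewards]

def clipped_token_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for one sampled token."""
    ratio = math.exp(logp_new - logp_old)  # importance ratio pi_new / pi_old
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)
```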
## Outputs
- Trainer logs to stdout (and optional W&B if enabled)
- Checkpoints under `--save-path`
- Mode-specific logs/checkpoints when using matrix/orchestration scripts
## Troubleshooting
- If vLLM health checks time out, inspect `vllm.log`, `trainer.log`, and `env.log`.
- If targeted shared-layer runs lose gradients, ensure non-reentrant checkpointing is enabled in shared mode.
- If environment workers time out at 600s, reduce env concurrency (`--env.max_num_workers_per_node`) and batch pressure.
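If health checks are flaky, a small probe can distinguish "server not up yet" from a misconfigured URL. A minimal sketch using only the standard library; the `/health` route is the one exposed by vLLM's OpenAI-compatible server, but treat the exact path as an assumption for your version:

```python
import urllib.request
import urllib.error

def vllm_healthy(base_url="http://localhost:9001", timeout=2.0):
    """Return True if the server's /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```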
# GRPO Example Trainer
This directory contains an example script (`grpo.py`) demonstrating how to integrate a custom training loop with the Atropos API for reinforcement learning using the GRPO (Group Relative Policy Optimization) algorithm.
**Note: Example trainer does not support multimodal training out of the box. As other trainers add support for Atropos, we will list them in the main readme, some of which may support multimodal RL - please check the main repo readme for any updates.**
@@ -68,7 +171,7 @@ Once the prerequisites are met and configuration is set:
```bash
# Install dependencies
pip install -e ".[example_trainer]"

# Run the trainer directly (basic test)
python example_trainer/grpo.py
```


@@ -107,7 +107,9 @@ The `lora_only` mode requires `--enforce-eager` which **disables CUDA graphs**,
## Quick Start: LoRA Training (Recommended)
### Step 1: Install Dependencies
- Install from `pyproject.toml` extras:
- `pip install -e ".[example_trainer]"`
- or everything: `pip install -e ".[all]"`
### Step 2: Start All Components


@@ -1,8 +0,0 @@
vllm
torch
transformers
datasets
accelerate
peft
requests
wandb


@@ -299,7 +299,7 @@ The `example_trainer/` directory provides `grpo.py`, a script demonstrating inte
1. Python 3.8+ (Python 3.10+ recommended for Atropos overall).
2. Running Atropos API server (default: `http://localhost:8000`). Accessible via `run-api`.
3. Required Python packages: `torch`, `transformers`, `vllm`, `pydantic`, `numpy`, `requests`, `tenacity`, `wandb` (optional). Install via `pip install -e ".[example_trainer]"` (or `pip install -e ".[all]"`).
4. A running Atropos environment (e.g., `python environments/gsm8k_server.py serve --slurm False`).
### 9.2. Setup


@@ -40,11 +40,18 @@ atropos-grpo-run = "example_trainer.run:main"
[project.optional-dependencies]
all = [
"atroposlib[dev,examples,example_trainer]"
]
rewardfns = [
"torch"
]
example_trainer = [
"atroposlib[rewardfns]",
"vllm",
"accelerate",
"peft",
"requests",
]
dev = [
"pytest",
"pytest-asyncio",