mirror of https://github.com/NousResearch/atropos.git
synced 2026-04-24 17:04:55 +00:00
local version
parent 7b143a7d68
commit 426c0fac4c
3 changed files with 132 additions and 32 deletions
@@ -156,7 +156,17 @@ python example_trainer/grpo.py \
 --wandb-project gsm8k-grpo-shared
 ```

-### What Happens
+### What Happens (Local Mode - num_inference_nodes=0)

+1. vLLM server starts on port 9001
+2. Trainer initializes bridge in LOCAL MODE (HTTP-based, no NCCL)
+3. Trainer loads its own model copy and trains normally
+4. After each `optimizer.step()`:
+   - `bridge.notify_update()` sends HTTP POST to vLLM
+   - Periodic checkpoint saves sync weights to disk
+5. Much simpler than distributed mode!
+
+### What Happens (Distributed Mode - num_inference_nodes>0)
+
 1. vLLM server starts, writes parameter mapping to `$LOGDIR/vllm_bridge_config.json`
 2. Trainer reads mapping, joins NCCL process group with vLLM
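The local-mode steps added in the hunk above (an HTTP POST after each `optimizer.step()`, plus periodic checkpoint saves) can be sketched roughly as follows. This is a minimal illustration and not the repository's bridge: the `LocalBridge` class, the `/bridge/notify_update` path, the JSON payload, and the `train` helper are all hypothetical names chosen for the example. Only port 9001 and the `notify_update()` name come from the README text.

```python
import json
import urllib.request


class LocalBridge:
    """Hypothetical stand-in for the trainer-side bridge in local mode."""

    def __init__(self, base_url="http://localhost:9001", timeout=5.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.update_count = 0
        # The server is local, so skip any system-wide HTTP proxy.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def notify_update(self):
        """POST a small JSON payload saying that the weights changed."""
        self.update_count += 1
        payload = json.dumps({"update_count": self.update_count}).encode("utf-8")
        request = urllib.request.Request(
            self.base_url + "/bridge/notify_update",  # hypothetical endpoint path
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with self._opener.open(request, timeout=self.timeout) as response:
            return response.status


def train(bridge, num_steps, optimizer_step, save_checkpoint, checkpoint_every=2):
    """Run the loop: step, notify over HTTP, and checkpoint every few steps."""
    for step in range(1, num_steps + 1):
        optimizer_step()            # trainer updates its own model copy
        bridge.notify_update()      # HTTP POST, no NCCL involved
        if step % checkpoint_every == 0:
            save_checkpoint(step)   # periodic save carries the weights via disk
```

In this sketch the HTTP call only signals that an update happened; the weights themselves travel through the checkpoint on disk, which matches the two sub-bullets of step 4.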
@@ -164,7 +174,7 @@ python example_trainer/grpo.py \
 4. Training loop:
    - Forward pass uses shared weights
    - `optimizer.step()` modifies shared tensors in-place
-   - `bridge.notify_update()` signals vLLM (optional coordination)
+   - `bridge.notify_update()` broadcasts via Gloo
    - vLLM immediately uses new weights for next inference
 5. No restarts needed!

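The distributed-mode loop above depends on `optimizer.step()` changing the shared tensors in place, so the inference side sees new values without reloading. The following is only a standard-library analogy for that property, not the NCCL/Gloo mechanism itself: an `array` stands in for a weight tensor, a `memoryview` stands in for the inference side's handle on the same storage, and both step functions are hypothetical.

```python
from array import array

# Stand-in for a weight tensor whose storage both sides can see.
weights = array("d", [1.0, 2.0, 3.0])
# Hypothetical "inference side" handle on the same underlying buffer.
inference_view = memoryview(weights)


def sgd_step_in_place(params, grads, lr):
    """Update each parameter in the existing buffer (no new allocation)."""
    for i, grad in enumerate(grads):
        params[i] -= lr * grad


def sgd_step_rebinding(params, grads, lr):
    """Return a brand-new buffer; existing views keep pointing at the old one."""
    return array("d", [p - lr * g for p, g in zip(params, grads)])


# In-place update: the view observes the new values immediately.
sgd_step_in_place(weights, [1.0, 1.0, 1.0], lr=0.5)
seen_after_in_place = inference_view.tolist()

# Rebinding update: the view still shows the old buffer's contents.
new_weights = sgd_step_rebinding(weights, [1.0, 1.0, 1.0], lr=0.5)
seen_after_rebinding = inference_view.tolist()
```

Here `seen_after_in_place` reflects the update, while `seen_after_rebinding` does not pick up `new_weights`. An update that allocates new storage would leave the inference side on stale values, which is why the in-place behavior matters for the "no restarts" claim.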