narrow down scope further

2026-04-22 16:48:57 +00:00 · 2026-02-27 13:15:23 -05:00 · 2026-02-27 13:15:23 -05:00 · 836c346406
commit 836c346406
parent f343b24a6a
3 changed files with 22 additions and 20 deletions
--- a/example_trainer/README.md
+++ b/example_trainer/README.md
@@ -8,6 +8,16 @@ This example uses `vLLM` for efficient inference during the (simulated) data gen

 **Note:** This script is intended as a *reference example* for API integration and basic training setup. It is not optimized for large-scale, efficient training.

+## On-Policy Distillation Scope
+
+The current on-policy distillation (OPD) integration in Atropos is transport-only:
+
+- `ScoredDataGroup` / API payloads support `distill_token_ids` and `distill_logprobs`.
+- Atropos API stores and returns those fields through `/scored_data` and `/batch`.
+- Teacher orchestration (teacher endpoint calls, prompt rendering, top-k fetching) is intentionally out of scope for this integration.
+
+If you train with distillation, populate `distill_token_ids` and `distill_logprobs` from your environment or an external data pipeline before posting to the API.
+
 ### Custom vLLM Server

 The `vllm_api_server.py` file in this directory provides a customized vLLM API server implementation based on vLLM's native API. This server exposes enhanced endpoints for token and logprob tracking. The `VLLMServer` class in `atroposlib/envs/server_handling/vllm_server.py` can connect to this server for direct access to vLLM's `/generate` endpoint with full token-level logprobs.
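The OPD scope note in this diff leaves it to the caller to fill `distill_token_ids` and `distill_logprobs`. Below is a minimal sketch, in Python to match `atroposlib`, of how an environment might attach teacher top-k results to a payload dict before posting it. It assumes a per-sequence, per-position top-k layout and a `tokens` key holding one token-id list per sequence; neither shape is confirmed by this diff, and the helper `attach_distill_fields` is hypothetical, not part of Atropos.

```python
from typing import Any, Dict, List, Sequence, Tuple

# ASSUMPTION: teacher output is one list of (token_id, logprob) pairs per
# token position, i.e. the teacher's top-k candidates at that position.
TopK = Sequence[Tuple[int, float]]


def attach_distill_fields(
    payload: Dict[str, Any],
    teacher_topk: Sequence[Sequence[TopK]],
) -> Dict[str, Any]:
    """Return a copy of `payload` with distill_token_ids / distill_logprobs added.

    Hypothetical helper. ASSUMES payload["tokens"] is a list of token-id lists
    (one per sequence) and teacher_topk has one top-k entry per token position.
    """
    tokens = payload["tokens"]
    if len(teacher_topk) != len(tokens):
        raise ValueError("need one teacher top-k list per sequence")

    distill_token_ids: List[List[List[int]]] = []
    distill_logprobs: List[List[List[float]]] = []
    for seq_tokens, seq_topk in zip(tokens, teacher_topk):
        if len(seq_topk) != len(seq_tokens):
            raise ValueError("need one top-k entry per token position")
        # Split each position's (id, logprob) pairs into two parallel arrays
        # so ids and logprobs stay aligned index by index.
        distill_token_ids.append([[tid for tid, _ in pos] for pos in seq_topk])
        distill_logprobs.append([[lp for _, lp in pos] for pos in seq_topk])

    # Copy so the caller's original payload is left untouched.
    out = dict(payload)
    out["distill_token_ids"] = distill_token_ids
    out["distill_logprobs"] = distill_logprobs
    return out


if __name__ == "__main__":
    payload = {"tokens": [[101, 7, 9]]}
    topk = [[[(7, -0.1), (8, -2.3)], [(9, -0.5), (4, -1.2)], [(2, -0.05), (9, -3.0)]]]
    enriched = attach_distill_fields(payload, topk)
    print(enriched["distill_token_ids"])  # [[[7, 8], [9, 4], [2, 9]]]
```

Building both arrays in the same loop keeps token ids and logprobs aligned per position. Padding every position to a fixed k, if the trainer expects rectangular arrays, is left out of this sketch.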