mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-24 17:04:55 +00:00
remove cross tokenization and fix location of configs
This commit is contained in:
parent
148a4fd5eb
commit
a1b545c734
6 changed files with 147 additions and 302 deletions
@@ -304,6 +304,29 @@ environment uses the `/generate` path and includes token-level
4. Trainer extracts and aligns logprobs with training labels
5. GRPO loss uses these rollout logprobs in importance-ratio terms
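Steps 4–5 can be sketched as follows. This is an illustrative sketch, not the Atropos implementation: the function name, argument names, and the clip value are all assumptions made for the example.

```python
import math

def grpo_token_loss(policy_logprob, rollout_logprob, advantage, clip_eps=0.2):
    """Per-token clipped surrogate loss (negated objective), hypothetical sketch.

    `rollout_logprob` is the logprob recorded at rollout time (step 4 aligns
    it with the training label at the same position); `policy_logprob` is the
    current policy's logprob for that same token.
    """
    # Importance ratio between the current policy and the rollout policy.
    ratio = math.exp(policy_logprob - rollout_logprob)
    # Clip the ratio to [1 - eps, 1 + eps].
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # PPO/GRPO-style objectives take the pessimistic (min) surrogate.
    return -min(ratio * advantage, clipped * advantage)

loss = grpo_token_loss(policy_logprob=-1.0, rollout_logprob=-1.2, advantage=0.5)
```

If the rollout logprobs came from a different tokenizer than the policy's, the ratio above would compare logprobs of different tokens, which motivates the tokenizer requirement below.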
### 1b. Teacher distillation requires the same tokenizer
When distillation data is attached to Atropos batches, the trainer treats
`distill_token_ids` as indices into the student's logit tensor. That only works
if the teacher and student share the same tokenizer vocabulary.
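As a minimal sketch of what that indexing implies (hypothetical names, not the Atropos implementation), a top-k distillation loss would look up teacher-supplied token IDs directly in the student's logit row:

```python
import math

def topk_distill_loss(student_logits, distill_token_ids, teacher_probs):
    """Cross-entropy of teacher top-k probs against student log-softmax.

    Hypothetical sketch: `distill_token_ids` index the *student* vocabulary,
    which is only valid when teacher and student share one tokenizer.
    """
    # Numerically stable log-normalizer over the student's vocab row.
    z = max(student_logits)
    log_norm = z + math.log(sum(math.exp(l - z) for l in student_logits))
    loss = 0.0
    for tok_id, p in zip(distill_token_ids, teacher_probs):
        # If the vocabularies differed, tok_id would select the logit of an
        # unrelated token and the loss would be silently wrong.
        loss -= p * (student_logits[tok_id] - log_norm)
    return loss

loss = topk_distill_loss([0.0, 1.0, 2.0, 3.0], [3, 2], [0.7, 0.3])
```

The failure mode is worth emphasizing: nothing in this computation raises an error if the vocabularies differ — it just trains the student toward the wrong tokens.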
What to configure on the environment side:
```bash
--env.teacher_enabled true \
--env.teacher_server.base_url "http://localhost:9003/v1" \
--env.teacher_server.model_name "$TEACHER_MODEL" \
--env.teacher_server.server_type vllm \
--env.teacher_top_k 8
```
Why cross-tokenizer conversion is not acceptable here:
- Teacher token ID `1234` and student token ID `1234` can correspond to different text.
- Re-tokenizing teacher text changes token boundaries, so teacher position `i` may no longer correspond to student position `i`.
- Remapping teacher top-k tokens back into student vocab can collapse multiple teacher candidates into one student token or expand one teacher token into multiple student tokens.
- The current distillation loss expects exact per-position supervision in student token space, so an approximate remapping would silently produce misleading targets.
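The boundary-shift problem is easy to see with toy vocabularies (these are invented for illustration; real tokenizers behave analogously but with far larger vocabs):

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer over a toy vocab (illustrative only)."""
    ids, i = [], 0
    while i < len(text):
        # Try the longest vocab piece first, as subword tokenizers roughly do.
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece, i):
                ids.append(vocab[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"untokenizable at offset {i}")
    return ids

# Two hypothetical vocabularies that segment the same word differently.
teacher_vocab = {"un": 0, "believ": 1, "able": 2}
student_vocab = {"unbeliev": 0, "able": 1}

teacher_ids = tokenize("unbelievable", teacher_vocab)  # 3 tokens
student_ids = tokenize("unbelievable", student_vocab)  # 2 tokens
```

Position 1 is `"believ"` for the teacher but `"able"` for the student, so teacher logprobs at position `i` cannot supervise student position `i`; and ID `1` names different text in each vocab.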
### 2. Clipping
```bash