more terminal changes

Jai Suphavadeeprasit 2026-03-02 14:40:55 -05:00
parent 2f01720899
commit 8d29f49a58
2 changed files with 2 additions and 2 deletions

@@ -5,7 +5,7 @@ Handles data retrieval from Atropos API, padding, batching,
and advantage normalization.
Also extracts inference logprobs for proper GRPO loss computation:
-- Inference logprobs serve as π_old (reference policy) for importance sampling
+- Inference logprobs are used in importance-ratio computation
- They are batched and padded to align token-by-token with training labels
"""
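
The docstring above says that inference logprobs are padded to align token-by-token with the training labels and then used in importance-ratio computation. A minimal sketch of that idea follows, using only the standard library. It is not the code from this commit: the helper names `pad_logprobs` and `importance_ratios`, the `IGNORE_INDEX = -100` masking convention, and right-padding with `0.0` are all assumptions made for illustration.

```python
import math

# Assumed label value marking positions excluded from the loss
# (prompt tokens or padding). Hypothetical; the real module may differ.
IGNORE_INDEX = -100


def pad_logprobs(logprobs, length, pad_value=0.0):
    """Right-pad one sequence of inference logprobs to `length` (hypothetical helper)."""
    if len(logprobs) > length:
        raise ValueError("sequence is longer than the target length")
    return list(logprobs) + [pad_value] * (length - len(logprobs))


def importance_ratios(train_logprobs, inference_logprobs, labels):
    """Per-token ratio exp(logp_train - logp_inference).

    Positions whose label equals IGNORE_INDEX get a ratio of 0.0 so they
    contribute nothing when the ratios are multiplied into a masked loss.
    """
    if not (len(train_logprobs) == len(inference_logprobs) == len(labels)):
        raise ValueError("logprobs and labels must align token-by-token")
    return [
        math.exp(new - old) if label != IGNORE_INDEX else 0.0
        for new, old, label in zip(train_logprobs, inference_logprobs, labels)
    ]


# One sequence: a masked prompt token, two completion tokens, one pad slot.
labels = [IGNORE_INDEX, 11, 12, IGNORE_INDEX]
inference = pad_logprobs([0.0, -0.5, -1.0], len(labels))
train = [0.0, -0.5, -0.8, 0.0]
ratios = importance_ratios(train, inference, labels)
```

In this sketch the second completion token has a ratio above 1 because the training-time logprob (-0.8) is higher than the inference-time one (-1.0). A real pipeline would typically do the same thing on batched tensors rather than Python lists.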