mirror of https://github.com/NousResearch/atropos.git
synced 2026-04-19 12:57:58 +00:00
more terminal changes
parent 2f01720899
commit 8d29f49a58
2 changed files with 2 additions and 2 deletions
@@ -5,7 +5,7 @@ Handles data retrieval from Atropos API, padding, batching,
 and advantage normalization.

 Also extracts inference logprobs for proper GRPO loss computation:
-- Inference logprobs serve as π_old (reference policy) for importance sampling
+- Inference logprobs are used in importance-ratio computation
 - They are batched and padded to align token-by-token with training labels
 """
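The docstring in this hunk describes padding inference logprobs so they align token-by-token with the training labels, then using them in an importance ratio. A minimal sketch of that idea follows. It is not the repository's code: the function names `pad_logprobs` and `importance_ratios`, the right-padding convention, and the pad value of 0.0 are all assumptions made for illustration, and plain Python lists stand in for whatever tensor library the trainer actually uses.

```python
import math


def pad_logprobs(sequences, pad_value=0.0):
    """Right-pad per-token logprob lists to a common length.

    Hypothetical helper: returns (padded, mask), where mask is 1 for
    real tokens and 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


def importance_ratios(train_logprobs, inference_logprobs, mask):
    """Per-token ratio exp(logp_train - logp_inference).

    Padded positions (mask == 0) are set to 0.0 so they cannot
    contribute to a masked loss.
    """
    return [
        [math.exp(new - old) if m else 0.0
         for new, old, m in zip(new_row, old_row, mask_row)]
        for new_row, old_row, mask_row
        in zip(train_logprobs, inference_logprobs, mask)
    ]


# Two rollouts of different lengths, as they might come from inference.
inference = [[-0.5, -1.0, -0.25], [-2.0]]
padded_old, mask = pad_logprobs(inference)

# Logprobs of the same tokens under the policy being trained (made-up values).
train = [[-0.5, -0.5, -0.25], [-1.0, 0.0, 0.0]]
ratios = importance_ratios(train, padded_old, mask)
```

A ratio of 1.0 means the training policy assigns a token the same probability the inference policy did. The mask keeps padded positions out of the result, which is why the alignment mentioned in the docstring matters.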