Mirror of https://github.com/NousResearch/atropos.git, synced 2026-04-19 12:57:58 +00:00
README changes
parent 91afc9e46e
commit 4a7da8049f
2 changed files with 12 additions and 13 deletions
@@ -86,7 +86,7 @@ atropos-grpo \

 ## Objective Notes

-- GRPO uses rollout/inference logprobs (`pi_old`) for importance-ratio computation.
+- GRPO uses rollout `inference_logprobs` for importance-ratio computation.
 - The trainer currently uses clipped importance-ratio updates without a separate frozen-reference-model KL term.

 ## Outputs
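For readers unfamiliar with the objective the README notes describe, the following is a minimal sketch of a clipped importance-ratio loss computed from rollout logprobs, with no reference-model KL term. It is not taken from the Atropos trainer: the function name `clipped_ratio_loss`, the `clip_eps` parameter and its default of 0.2, and the use of plain Python lists are all assumptions made for illustration.

```python
import math


def clipped_ratio_loss(new_logprobs, inference_logprobs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss over tokens (a value to minimize).

    Hypothetical helper, not the trainer's API. `inference_logprobs` are the
    per-token logprobs recorded at rollout time; `new_logprobs` come from the
    policy being trained. No KL penalty against a frozen reference is added.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logprobs, inference_logprobs, advantages):
        # Importance ratio pi_new / pi_old, computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) surrogate, negated so that lower is better.
        total += -min(ratio * adv, clipped * adv)
    return total / len(new_logprobs)
```

When the trained policy still matches the rollout policy, every ratio is 1 and the loss reduces to the negative mean advantage. Clipping only limits the objective in the direction that would otherwise reward moving the ratio further from 1.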