README changes

Jai Suphavadeeprasit 2026-03-02 09:44:06 -05:00
parent 91afc9e46e
commit 4a7da8049f
2 changed files with 12 additions and 13 deletions

@@ -86,7 +86,7 @@ atropos-grpo \
## Objective Notes
-- GRPO uses rollout/inference logprobs (`pi_old`) for importance-ratio computation.
+- GRPO uses rollout `inference_logprobs` for importance-ratio computation.
- The trainer currently uses clipped importance-ratio updates without a separate frozen-reference-model KL term.
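
The two notes above can be illustrated with a minimal sketch of a clipped importance-ratio loss. This is not the trainer's actual code: the function name `clipped_grpo_loss`, the `clip_eps` parameter and its default of 0.2, and the flat per-token list layout are all assumptions made for illustration. Only the use of rollout `inference_logprobs` as the old-policy logprobs and the absence of a reference-model KL term come from the notes.

```python
import math

def clipped_grpo_loss(new_logprobs, inference_logprobs, advantages, clip_eps=0.2):
    """Mean clipped-surrogate loss over tokens (hypothetical helper).

    new_logprobs:       per-token logprobs under the policy being trained
    inference_logprobs: per-token logprobs recorded at rollout time
    advantages:         per-token (group-normalized) advantages
    clip_eps:           clipping half-width; 0.2 is an assumed default
    """
    losses = []
    for lp_new, lp_old, adv in zip(new_logprobs, inference_logprobs, advantages):
        # Importance ratio pi_new / pi_old, computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) surrogate, negated to form a loss.
        # No KL penalty against a frozen reference model is added.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

When the training policy still matches the rollout policy, every ratio is 1 and the loss reduces to the negated mean advantage; clipping only takes effect once the two sets of logprobs diverge.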
## Outputs