1 commit

Author: RUFFY-369
Commit: 0674e31a53

feat: add online reward normalization for multi-env RL training stability
Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (past rewards are not stored)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics

21/21 tests passing.
Date: 2026-03-28 03:31:28 +05:30
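The commit message describes the normalizer only at a high level. What follows is a minimal, hypothetical Python sketch of how Welford updates, the two normalization modes, clipping, a warmup period and checkpointing could fit together. It is not the code from this commit. Several details are assumptions: the class name `RewardNormalizerSketch`, the mode strings, the parameter names and defaults, treating warmup as a count of samples, disabling clipping with `clip <= 0`, and min-max scaling to [0, 1].

```python
import math
from dataclasses import dataclass, field


@dataclass
class RewardNormalizerSketch:
    """Hypothetical online reward normalizer; not the atroposlib RewardNormalizer."""

    mode: str = "zscore"       # assumed mode names: "zscore" or "minmax"
    clip: float = 5.0          # clip normalized rewards to [-clip, clip]; <= 0 disables
    warmup_samples: int = 10   # assumed: return raw rewards until this many are seen
    eps: float = 1e-8          # avoids division by zero for constant rewards

    # Running statistics: a handful of scalars, no stored reward history.
    count: int = field(default=0, init=False)
    mean: float = field(default=0.0, init=False)
    m2: float = field(default=0.0, init=False)  # sum of squared deviations from the mean
    min_val: float = field(default=math.inf, init=False)
    max_val: float = field(default=-math.inf, init=False)

    def update(self, rewards):
        """Fold a batch of rewards into the running stats (Welford's algorithm)."""
        for x in rewards:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            # The second factor uses the *updated* mean, as Welford's update requires.
            self.m2 += delta * (x - self.mean)
            self.min_val = min(self.min_val, x)
            self.max_val = max(self.max_val, x)

    @property
    def std(self):
        """Population standard deviation of all rewards seen so far."""
        return math.sqrt(self.m2 / self.count) if self.count else 0.0

    def normalize(self, rewards):
        """Normalize a batch with the current stats; pass through during warmup."""
        if self.count < self.warmup_samples:
            return list(rewards)
        if self.mode == "zscore":
            out = [(r - self.mean) / (self.std + self.eps) for r in rewards]
        elif self.mode == "minmax":
            span = self.max_val - self.min_val
            # Rewards inside the observed range map to roughly [0, 1].
            out = [(r - self.min_val) / (span + self.eps) for r in rewards]
        else:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.clip > 0:
            out = [max(-self.clip, min(self.clip, r)) for r in out]
        return out

    def state_dict(self):
        """Plain-dict snapshot suitable for saving alongside a checkpoint."""
        return {"count": self.count, "mean": self.mean, "m2": self.m2,
                "min_val": self.min_val, "max_val": self.max_val}

    def load_state_dict(self, state):
        """Restore running stats saved by state_dict()."""
        self.count = int(state["count"])
        self.mean = float(state["mean"])
        self.m2 = float(state["m2"])
        self.min_val = float(state["min_val"])
        self.max_val = float(state["max_val"])


# Usage: update with observed rewards, then normalize a batch for training.
norm = RewardNormalizerSketch(mode="zscore", clip=5.0, warmup_samples=2)
norm.update([1.0, 2.0, 3.0, 4.0])
print(norm.mean, norm.std)           # 2.5 and about 1.118
print(norm.normalize([2.5, 100.0]))  # [0.0, 5.0]; the second value is clipped
```

Welford's update needs only a count, a mean and a running sum of squared deviations. It therefore uses constant memory however many rewards arrive, which is why a checkpoint can be a small dict. It is also numerically more stable than accumulating a raw sum and a sum of squares and subtracting them at the end.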