NousResearch/atropos

mirror of https://github.com/NousResearch/atropos.git synced 2026-04-24 17:04:55 +00:00

Commit graph

Author: RUFFY-369

SHA1: 0674e31a53

Message:

feat: add online reward normalization for multi-env RL training stability

Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics

21/21 tests passing.

Date: 2026-03-28 03:31:28 +05:30
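
The commit message above names Welford's online algorithm, z-score normalization, reward clipping, a warmup period, and checkpoint save/load. The following is a minimal sketch of how those pieces could fit together, not the code from the commit. The class name `RunningRewardNormalizer`, its fields (`clip`, `warmup`, `eps`), and its method names are all hypothetical, and the min-max mode, BaseEnv config fields, and WandB logging are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningRewardNormalizer:
    """Hypothetical sketch; NOT the RewardNormalizer class from the commit."""

    clip: float = 5.0   # assumed symmetric clip applied to the z-score
    warmup: int = 10    # assumed number of samples seen before normalizing
    eps: float = 1e-8   # guards against division by zero
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0     # running sum of squared deviations from the mean

    def update(self, reward: float) -> None:
        # Welford's update: one pass, no rewards are stored.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of the rewards seen so far.
        return self.m2 / self.count if self.count else 0.0

    def normalize(self, reward: float) -> float:
        # During warmup the statistics are unreliable, so pass rewards through.
        if self.count < self.warmup:
            return reward
        z = (reward - self.mean) / (math.sqrt(self.variance) + self.eps)
        return max(-self.clip, min(self.clip, z))

    def state_dict(self) -> dict:
        # Only three numbers are needed to resume from a checkpoint.
        return {"count": self.count, "mean": self.mean, "m2": self.m2}

    def load_state_dict(self, state: dict) -> None:
        self.count = int(state["count"])
        self.mean = float(state["mean"])
        self.m2 = float(state["m2"])


# Usage: feed rewards as they arrive, then normalize new ones.
norm = RunningRewardNormalizer(warmup=3)
for r in [1.0, 2.0, 3.0, 4.0]:
    norm.update(r)
print(norm.mean, norm.variance, norm.normalize(3.0))
```

Because the running state is just `count`, `mean`, and `m2`, saving and restoring it alongside a training checkpoint is cheap, which is one plausible reason to prefer an online algorithm over storing reward history.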

1 commit