RUFFY-369
|
01da524b6b
|
feat: add curriculum learning scheduler for sample-efficient RL training
Add CurriculumScheduler to atroposlib/envs/ with:
- EMA-based per-item difficulty tracking from reward signals
- Quantile-based difficulty binning (configurable N bins)
- Three sampling strategies: uniform, easy_first, competence_based
- Competence-based strategy cites Platanios et al. 2019
- Opt-in integration in BaseEnv via 3 config fields
- WandB metrics for difficulty distribution tracking
- Checkpoint save/load support
22/22 tests passing.
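
The bullets above can be sketched roughly as follows. This is an illustrative sketch only, not the CurriculumScheduler code from this commit: the function names (update_difficulty, assign_bins, open_bins, sample_item), the alpha and c0 defaults, the choice of difficulty = 1 - reward (which assumes rewards in [0, 1]), and the linear unlock used for easy_first are all assumptions. The competence_based branch uses the square-root competence schedule from Platanios et al. 2019.

```python
import math
import random
from typing import Dict, Optional


def update_difficulty(difficulty: Dict[str, float], item_id: str,
                      reward: float, alpha: float = 0.3) -> None:
    """EMA update. Assumes rewards lie in [0, 1], so difficulty = 1 - reward."""
    observed = 1.0 - reward
    if item_id in difficulty:
        difficulty[item_id] = alpha * observed + (1.0 - alpha) * difficulty[item_id]
    else:
        difficulty[item_id] = observed  # the first observation seeds the average


def assign_bins(difficulty: Dict[str, float], n_bins: int) -> Dict[str, int]:
    """Rank-based quantile binning: bin 0 is easiest, n_bins - 1 is hardest."""
    ranked = sorted(difficulty, key=lambda item: difficulty[item])
    return {item: rank * n_bins // len(ranked) for rank, item in enumerate(ranked)}


def open_bins(strategy: str, progress: float, n_bins: int, c0: float = 0.1) -> int:
    """Number of easiest bins eligible for sampling at progress in [0, 1]."""
    progress = min(1.0, max(0.0, progress))
    if strategy == "uniform":
        return n_bins
    if strategy == "easy_first":
        fraction = progress  # assumed here: bins unlock linearly with progress
    elif strategy == "competence_based":
        # Square-root competence schedule (Platanios et al. 2019);
        # c0 is the initial competence.
        fraction = min(1.0, math.sqrt(progress * (1.0 - c0 ** 2) + c0 ** 2))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(1, math.ceil(fraction * n_bins))  # always keep one bin open


def sample_item(difficulty: Dict[str, float], strategy: str, progress: float,
                n_bins: int = 3, rng: Optional[random.Random] = None) -> str:
    """Pick one tracked item uniformly from the currently open bins."""
    rng = rng or random.Random()
    bins = assign_bins(difficulty, n_bins)
    limit = open_bins(strategy, progress, n_bins)
    eligible = sorted(item for item, b in bins.items() if b < limit)
    return rng.choice(eligible)  # raises IndexError if nothing is tracked yet


if __name__ == "__main__":
    tracked: Dict[str, float] = {}
    for item, reward in [("a", 1.0), ("b", 0.5), ("c", 0.0)]:
        update_difficulty(tracked, item, reward)
    print(assign_bins(tracked, n_bins=3))
    print(sample_item(tracked, "competence_based", progress=0.0))
```

In this sketch every strategy except uniform starts with only the easiest bin open and widens the pool as progress approaches 1, so the strategies differ only in how quickly the harder bins unlock.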
|
2026-03-28 03:39:01 +05:30 |
|