Commit graph

317 commits

Author SHA1 Message Date
Prakarsh Kaushik
39fb1d6870 Merge branch 'NousResearch:main' into feat/reward-normalization 2026-03-31 00:57:05 +05:30
RUFFY-369
8a3a582beb style: fix lints and pin dependencies for reward normalization 2026-03-30 17:14:19 +05:30
RUFFY-369
0674e31a53 feat: add online reward normalization for multi-env RL training stability
Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics

21/21 tests passing.
2026-03-28 03:31:28 +05:30
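
The commit message above names Welford's online algorithm, z-score normalization, reward clipping, and a warmup period. A minimal sketch of how those pieces might fit together follows. The names RunningStats and normalize, and the default values for warmup, clip, and eps, are hypothetical and chosen for illustration; they are not taken from atroposlib and may not match the actual RewardNormalizer API.

```python
import math


class RunningStats:
    """Welford's online mean/variance. Stores only count, mean, and M2."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # One-pass update: no past rewards are kept in memory.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 0.0 until two samples are seen.
        return math.sqrt(self.m2 / self.count) if self.count > 1 else 0.0


def normalize(
    reward: float,
    stats: RunningStats,
    warmup: int = 10,    # hypothetical default
    clip: float = 5.0,   # hypothetical default
    eps: float = 1e-8,   # guards against division by zero
) -> float:
    """Update the running statistics, then return a clipped z-score.

    During warmup (fewer than `warmup` samples) the raw reward is returned
    unchanged, since the statistics are not yet reliable.
    """
    stats.update(reward)
    if stats.count < warmup:
        return reward
    z = (reward - stats.mean) / (stats.std + eps)
    return max(-clip, min(clip, z))
```

In this sketch the current reward is folded into the statistics before it is normalized; an implementation could just as reasonably normalize first and update afterwards.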
Jai Suphavadeeprasit
75a032bf3e revert openai server 2026-03-23 11:26:05 -07:00
Jai Suphavadeeprasit
295bb9c446 revert openai server 2026-03-23 11:25:28 -07:00
Jai Suphavadeeprasit
8745f0533e revert teacher logprobs 2026-03-23 11:23:47 -07:00
Jai Suphavadeeprasit
45f569f3af clean 2026-03-18 09:20:08 -04:00
Jai Suphavadeeprasit
41947e98d6 clean 2026-03-17 12:25:38 -04:00
Jai Suphavadeeprasit
79baac1ea7 clean 2026-03-17 12:23:35 -04:00
Jai Suphavadeeprasit
7aba0d3fc8 fresh eyes check 2026-03-14 11:20:15 -04:00
Jai Suphavadeeprasit
805a0c0eac revert to similar structure 2026-03-13 20:52:48 -04:00
pre-commit-ci[bot]
f053c77a62 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-14 00:43:23 +00:00
Jai Suphavadeeprasit
9bd299b3ef better logging for devex 2026-03-13 20:41:51 -04:00
pre-commit-ci[bot]
3a85ede8ba [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 22:51:58 +00:00
Jai Suphavadeeprasit
a171358f2e structural changes 2026-03-13 18:49:30 -04:00
pre-commit-ci[bot]
12ba3cc3bd [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 21:25:27 +00:00
Jai Suphavadeeprasit
1b8ff075c4 adding tests 2026-03-13 17:23:59 -04:00
Jai Suphavadeeprasit
697c594c72 changes 2026-03-13 16:58:37 -04:00
pre-commit-ci[bot]
82964b6e48 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 20:13:35 +00:00
Jai Suphavadeeprasit
a8cdb53a4d address problems 2026-03-13 16:12:05 -04:00
Jai Suphavadeeprasit
322e7e6623 remove comments 2026-03-13 13:30:04 -04:00
pre-commit-ci[bot]
994e9c287d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 17:21:00 +00:00
Jai Suphavadeeprasit
a1b545c734 remove cross tokenization and fix location of configs 2026-03-13 13:19:28 -04:00
Jai Suphavadeeprasit
862cd3667d clean logging 2026-03-13 12:38:52 -04:00
Jai Suphavadeeprasit
600c54f5f8 clean log 2026-03-13 12:12:33 -04:00
pre-commit-ci[bot]
d1b0dee8f7 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 15:14:09 +00:00
Jai Suphavadeeprasit
690e670e64 investigating weird training issue 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
a43b0b7e72 training kernel 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
c26432b963 training kernel 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
2f371e03fc tokenizer bug 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
78c0a6d082 tokenizer bug 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
09ad401995 sneaky bug logging 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
64794e7c72 sneaky bug 2026-03-13 11:06:00 -04:00
Jai Suphavadeeprasit
bb2736db4e next 2026-03-13 11:05:40 -04:00
Jai Suphavadeeprasit
f44eb810bf teacher env init 2026-03-13 11:04:57 -04:00
dmahan93
f198c1738e Merge conflict commit 2026-03-09 23:13:43 -05:00
Jai Suphavadeeprasit
eb50099361 test_get_logprobs_input_ids_only_passthrough 2026-03-05 17:04:45 -05:00
0xbyt4
4d8e9b8086 fix: use sys.executable instead of hardcoded "python" in tests
Tests that launch the API server via subprocess used a hardcoded
"python" command which fails on systems where only "python3" is
available (e.g. macOS). Using sys.executable ensures the same
interpreter running pytest is used for subprocesses.

Fixes 36 test errors on macOS environments.
2026-03-05 17:04:45 -05:00
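
The fix described above can be illustrated with a short sketch. The helper name run_python is hypothetical and does not come from the repository's tests; the point is only that sys.executable resolves to the interpreter currently running, so the subprocess does not depend on a "python" command being on PATH.

```python
import subprocess
import sys


def run_python(code: str) -> str:
    """Run a snippet in a child process using the current interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # instead of ["python", "-c", code]
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```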
pre-commit-ci[bot]
b166c3a9d9 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-05 20:48:17 +00:00
Jai Suphavadeeprasit
b91922082e managed_Server pass through and centralize sem logic 2026-03-05 15:46:33 -05:00
dmahan93
f4875c5dc6 make preserve thinking optional 2026-03-04 15:44:12 -06:00
Jai Suphavadeeprasit
c85a3e5ee7 readme language 2026-03-03 23:44:29 -05:00
Jai Suphavadeeprasit
1a3d9ee664 testing 2026-03-03 23:38:04 -05:00
pre-commit-ci[bot]
efc90bfb1b [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-04 04:18:12 +00:00
Jai Suphavadeeprasit
1eeb31065f fixing comments 2026-03-03 23:16:05 -05:00
Jai Suphavadeeprasit
51088ac24d add tests 2026-03-03 23:08:40 -05:00
pre-commit-ci[bot]
8f304d44fd [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-04 03:08:19 +00:00
Jai Suphavadeeprasit
5aaf7a346c prompt logprobs simplicity 2026-03-03 22:06:49 -05:00
Jai Suphavadeeprasit
f1c20591b6 prompt logprobs 2026-03-03 21:58:05 -05:00
Jai Suphavadeeprasit
439b9b129b prompt logprobs 2026-03-03 21:58:05 -05:00