Prakarsh Kaushik
39fb1d6870
Merge branch 'NousResearch:main' into feat/reward-normalization
2026-03-31 00:57:05 +05:30
RUFFY-369
8a3a582beb
style: fix lints and pin dependencies for reward normalization
2026-03-30 17:14:19 +05:30
RUFFY-369
0674e31a53
feat: add online reward normalization for multi-env RL training stability
Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics
21/21 tests passing.
2026-03-28 03:31:28 +05:30
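The commit above names Welford's online algorithm, z-score normalization, clipping, a warmup period, and checkpoint support. The following is a minimal sketch of how those pieces could fit together, not the code added in this commit: the class name `RunningRewardStats`, its parameters (`warmup`, `clip`, `eps`), and its method names are hypothetical and are not taken from atroposlib.

```python
import math


class RunningRewardStats:
    """Hypothetical illustration; not the RewardNormalizer from the commit."""

    def __init__(self, warmup=10, clip=5.0, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.warmup = warmup  # assumed: samples to observe before normalizing
        self.clip = clip  # assumed: symmetric bound applied to the z-score
        self.eps = eps  # guards against division by zero

    def update(self, reward):
        # Welford's update: one pass, no stored history of rewards.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def variance(self):
        # Population variance of everything seen so far.
        return self.m2 / self.count if self.count > 0 else 0.0

    def normalize(self, reward):
        # During warmup the statistics are unreliable, so pass rewards through.
        if self.count < self.warmup:
            return reward
        z = (reward - self.mean) / (math.sqrt(self.variance) + self.eps)
        return max(-self.clip, min(self.clip, z))

    def state_dict(self):
        # Only three numbers are needed to resume from a checkpoint.
        return {"count": self.count, "mean": self.mean, "m2": self.m2}

    def load_state_dict(self, state):
        self.count = state["count"]
        self.mean = state["mean"]
        self.m2 = state["m2"]


stats = RunningRewardStats(warmup=2)
for r in [1.0, 2.0, 3.0, 4.0]:
    stats.update(r)
```

Because only the count, the mean, and the squared-deviation sum are kept, memory use stays constant regardless of how many rewards have been observed, which matches the "no data storage" note in the commit message.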
Jai Suphavadeeprasit
75a032bf3e
revert openai server
2026-03-23 11:26:05 -07:00
Jai Suphavadeeprasit
295bb9c446
revert openai server
2026-03-23 11:25:28 -07:00
Jai Suphavadeeprasit
8745f0533e
revert teacher logprobs
2026-03-23 11:23:47 -07:00
Jai Suphavadeeprasit
45f569f3af
clean
2026-03-18 09:20:08 -04:00
Jai Suphavadeeprasit
41947e98d6
clean
2026-03-17 12:25:38 -04:00
Jai Suphavadeeprasit
79baac1ea7
clean
2026-03-17 12:23:35 -04:00
Jai Suphavadeeprasit
7aba0d3fc8
fresh eyes check
2026-03-14 11:20:15 -04:00
Jai Suphavadeeprasit
805a0c0eac
revert to similar structure
2026-03-13 20:52:48 -04:00
pre-commit-ci[bot]
f053c77a62
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-14 00:43:23 +00:00
Jai Suphavadeeprasit
9bd299b3ef
better logging for devex
2026-03-13 20:41:51 -04:00
pre-commit-ci[bot]
3a85ede8ba
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 22:51:58 +00:00
Jai Suphavadeeprasit
a171358f2e
structural changes
2026-03-13 18:49:30 -04:00
pre-commit-ci[bot]
12ba3cc3bd
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 21:25:27 +00:00
Jai Suphavadeeprasit
1b8ff075c4
adding tests
2026-03-13 17:23:59 -04:00
Jai Suphavadeeprasit
697c594c72
changes
2026-03-13 16:58:37 -04:00
pre-commit-ci[bot]
82964b6e48
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 20:13:35 +00:00
Jai Suphavadeeprasit
a8cdb53a4d
address problems
2026-03-13 16:12:05 -04:00
Jai Suphavadeeprasit
322e7e6623
remove comments
2026-03-13 13:30:04 -04:00
pre-commit-ci[bot]
994e9c287d
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 17:21:00 +00:00
Jai Suphavadeeprasit
a1b545c734
remove cross tokenization and fix location of configs
2026-03-13 13:19:28 -04:00
Jai Suphavadeeprasit
862cd3667d
clean logging
2026-03-13 12:38:52 -04:00
Jai Suphavadeeprasit
600c54f5f8
clean log
2026-03-13 12:12:33 -04:00
pre-commit-ci[bot]
d1b0dee8f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 15:14:09 +00:00
Jai Suphavadeeprasit
690e670e64
investigating weird training issue
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
a43b0b7e72
training kernel
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
c26432b963
training kernel
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
2f371e03fc
tokenizer bug
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
78c0a6d082
tokenizer bug
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
09ad401995
sneaky bug logging
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
64794e7c72
sneaky bug
2026-03-13 11:06:00 -04:00
Jai Suphavadeeprasit
bb2736db4e
next
2026-03-13 11:05:40 -04:00
Jai Suphavadeeprasit
f44eb810bf
teacher env init
2026-03-13 11:04:57 -04:00
dmahan93
f198c1738e
Merge conflict commit
2026-03-09 23:13:43 -05:00
Jai Suphavadeeprasit
eb50099361
test_get_logprobs_input_ids_only_passthrough
2026-03-05 17:04:45 -05:00
0xbyt4
4d8e9b8086
fix: use sys.executable instead of hardcoded "python" in tests
Tests that launch the API server via subprocess used a hardcoded
"python" command which fails on systems where only "python3" is
available (e.g. macOS). Using sys.executable ensures the same
interpreter running pytest is used for subprocesses.
Fixes 36 test errors on macOS environments.
2026-03-05 17:04:45 -05:00
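The fix described in the commit above can be illustrated with a short sketch. It assumes nothing about the actual test files; the helper name `run_python` is hypothetical. `sys.executable` and `subprocess.run` are standard-library names.

```python
import subprocess
import sys


def run_python(args):
    """Run a child Python process with the interpreter running this script."""
    # sys.executable is the absolute path of the current interpreter, so this
    # works even where no bare "python" command exists on PATH.
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
        check=True,
    )


result = run_python(["-c", "print(1 + 1)"])
```

Launching through `sys.executable` also keeps the subprocess inside the same virtual environment as the test runner, so the child sees the same installed packages.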
pre-commit-ci[bot]
b166c3a9d9
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-05 20:48:17 +00:00
Jai Suphavadeeprasit
b91922082e
managed_Server pass through and centralize sem logic
2026-03-05 15:46:33 -05:00
dmahan93
f4875c5dc6
make preserve thinking optional
2026-03-04 15:44:12 -06:00
Jai Suphavadeeprasit
c85a3e5ee7
readme language
2026-03-03 23:44:29 -05:00
Jai Suphavadeeprasit
1a3d9ee664
testing
2026-03-03 23:38:04 -05:00
pre-commit-ci[bot]
efc90bfb1b
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-04 04:18:12 +00:00
Jai Suphavadeeprasit
1eeb31065f
fixing comments
2026-03-03 23:16:05 -05:00
Jai Suphavadeeprasit
51088ac24d
add tests
2026-03-03 23:08:40 -05:00
pre-commit-ci[bot]
8f304d44fd
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-04 03:08:19 +00:00
Jai Suphavadeeprasit
5aaf7a346c
prompt logprobs simplicity
2026-03-03 22:06:49 -05:00
Jai Suphavadeeprasit
f1c20591b6
prompt logprobs
2026-03-03 21:58:05 -05:00
Jai Suphavadeeprasit
439b9b129b
prompt logprobs
2026-03-03 21:58:05 -05:00