Commit graph

230 commits

Author SHA1 Message Date
Prakarsh Kaushik
39fb1d6870
Merge branch 'NousResearch:main' into feat/reward-normalization 2026-03-31 00:57:05 +05:30
RUFFY-369
0674e31a53 feat: add online reward normalization for multi-env RL training stability
Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics

21/21 tests passing.
2026-03-28 03:31:28 +05:30
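The commit message above names Welford's online algorithm, z-score and min-max modes, reward clipping, a warmup period, and checkpoint save/load. The sketch below shows how those pieces could fit together. It is not the code from atroposlib: every name here (`RunningRewardStats`, `normalize`, `state_dict`, `load_state_dict`, and the `mode`, `clip`, `warmup`, `eps` parameters) is hypothetical, and the choices of population variance and of returning raw rewards during warmup are assumptions.

```python
import math
from dataclasses import dataclass, asdict


@dataclass
class RunningRewardStats:
    """Hypothetical running statistics; only O(1) state, no rewards are stored."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0                 # running sum of squared deviations from the mean
    min_seen: float = math.inf
    max_seen: float = -math.inf

    def update(self, reward: float) -> None:
        # Welford's update: adjust the mean first, then accumulate m2
        # using the deviation from both the old and the new mean.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)
        self.min_seen = min(self.min_seen, reward)
        self.max_seen = max(self.max_seen, reward)

    @property
    def variance(self) -> float:
        # Population variance (an assumption; a sample variance would divide by count - 1).
        return self.m2 / self.count if self.count > 0 else 0.0

    def state_dict(self) -> dict:
        # Plain dict so the state can be written into a checkpoint.
        return asdict(self)

    @classmethod
    def load_state_dict(cls, state: dict) -> "RunningRewardStats":
        return cls(**state)


def normalize(stats: RunningRewardStats, reward: float, mode: str = "zscore",
              clip: float = 5.0, warmup: int = 10, eps: float = 1e-8) -> float:
    """Normalize one reward against the running statistics."""
    if stats.count < warmup:
        return reward  # assumption: pass rewards through until enough samples exist
    if mode == "zscore":
        out = (reward - stats.mean) / (math.sqrt(stats.variance) + eps)
    elif mode == "minmax":
        out = (reward - stats.min_seen) / (stats.max_seen - stats.min_seen + eps)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return max(-clip, min(clip, out))  # clip to [-clip, clip]


stats = RunningRewardStats()
for r in [1.0, 2.0, 3.0, 4.0, 5.0]:
    stats.update(r)
print(stats.mean, stats.variance)
print(normalize(stats, 5.0, mode="zscore", warmup=5))
```

Keeping only `count`, `mean`, and `m2` is what lets the statistics be updated per reward without retaining history, and it makes the checkpoint state a small dictionary.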
Jai Suphavadeeprasit
75a032bf3e revert openai server 2026-03-23 11:26:05 -07:00
Jai Suphavadeeprasit
295bb9c446 revert openai server 2026-03-23 11:25:28 -07:00
Jai Suphavadeeprasit
45f569f3af clean 2026-03-18 09:20:08 -04:00
Jai Suphavadeeprasit
41947e98d6 clean 2026-03-17 12:25:38 -04:00
Jai Suphavadeeprasit
79baac1ea7 clean 2026-03-17 12:23:35 -04:00
Jai Suphavadeeprasit
7aba0d3fc8 fresh eyes check 2026-03-14 11:20:15 -04:00
Jai Suphavadeeprasit
805a0c0eac revert to similar structure 2026-03-13 20:52:48 -04:00
Jai Suphavadeeprasit
9bd299b3ef better logging for devex 2026-03-13 20:41:51 -04:00
pre-commit-ci[bot]
3a85ede8ba [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 22:51:58 +00:00
Jai Suphavadeeprasit
a171358f2e structural changes 2026-03-13 18:49:30 -04:00
Jai Suphavadeeprasit
1b8ff075c4 adding tests 2026-03-13 17:23:59 -04:00
Jai Suphavadeeprasit
697c594c72 changes 2026-03-13 16:58:37 -04:00
pre-commit-ci[bot]
82964b6e48 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 20:13:35 +00:00
Jai Suphavadeeprasit
a8cdb53a4d address problems 2026-03-13 16:12:05 -04:00
Jai Suphavadeeprasit
322e7e6623 remove comments 2026-03-13 13:30:04 -04:00
pre-commit-ci[bot]
994e9c287d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 17:21:00 +00:00
Jai Suphavadeeprasit
a1b545c734 remove cross tokenization and fix location of configs 2026-03-13 13:19:28 -04:00
Jai Suphavadeeprasit
862cd3667d clean logging 2026-03-13 12:38:52 -04:00
Jai Suphavadeeprasit
600c54f5f8 clean log 2026-03-13 12:12:33 -04:00
pre-commit-ci[bot]
d1b0dee8f7 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 15:14:09 +00:00
Jai Suphavadeeprasit
c26432b963 training kernel 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
2f371e03fc tokenizer bug 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
78c0a6d082 tokenizer bug 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
09ad401995 sneaky bug logging 2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
64794e7c72 sneaky bug 2026-03-13 11:06:00 -04:00
Jai Suphavadeeprasit
bb2736db4e next 2026-03-13 11:05:40 -04:00
Jai Suphavadeeprasit
f44eb810bf teacher env init 2026-03-13 11:04:57 -04:00
dmahan93
f198c1738e Merge conflict commit 2026-03-09 23:13:43 -05:00
Jai Suphavadeeprasit
b91922082e managed_Server pass through and centralize sem logic 2026-03-05 15:46:33 -05:00
dmahan93
f4875c5dc6 make preserve thinking optional 2026-03-04 15:44:12 -06:00
Jai Suphavadeeprasit
c85a3e5ee7 readme language 2026-03-03 23:44:29 -05:00
pre-commit-ci[bot]
efc90bfb1b [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-04 04:18:12 +00:00
Jai Suphavadeeprasit
1eeb31065f fixing comments 2026-03-03 23:16:05 -05:00
pre-commit-ci[bot]
8f304d44fd [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-04 03:08:19 +00:00
Jai Suphavadeeprasit
5aaf7a346c prompt logprobs simplicity 2026-03-03 22:06:49 -05:00
Jai Suphavadeeprasit
f1c20591b6 prompt logprobs 2026-03-03 21:58:05 -05:00
Jai Suphavadeeprasit
439b9b129b prompt logprobs 2026-03-03 21:58:05 -05:00
dmahan93
12d61d197f add env using the tool api stuff 2026-03-03 19:51:30 -06:00
dmahan93
c8eb63f33d readme updates for tool calling 2026-03-03 12:22:10 -06:00
pre-commit-ci[bot]
e98100e5f6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-03 17:21:00 +00:00
Jai Suphavadeeprasit
323a8a2601 readme updates 2026-03-03 12:19:55 -05:00
Jai Suphavadeeprasit
b9291aa29f init commit 2026-03-03 11:32:09 -05:00
dmahan93
8f21bb57ed add better warning message 2026-03-02 23:21:25 -06:00
dmahan93
add42a2afb add tool call parsing based on vllm impl and an openai server endpoint 2026-03-02 23:17:13 -06:00
pre-commit-ci[bot]
216c1f5899 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-27 21:17:58 +00:00
Jai Suphavadeeprasit
35587cbdc0 logger changes 2026-02-27 16:17:03 -05:00
pre-commit-ci[bot]
64d3ee1bd6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-27 18:16:06 +00:00
Jai Suphavadeeprasit
f343b24a6a narrow down scope 2026-02-27 11:14:42 -05:00