Prakarsh Kaushik
39fb1d6870
Merge branch 'NousResearch:main' into feat/reward-normalization
2026-03-31 00:57:05 +05:30
RUFFY-369
0674e31a53
feat: add online reward normalization for multi-env RL training stability
Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics
21/21 tests passing.
2026-03-28 03:31:28 +05:30
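The commit above describes running reward statistics kept with Welford's online algorithm, z-score normalization, clipping, a warmup period, and checkpoint support. A minimal sketch of that idea follows. The class name `RunningRewardNormalizer`, its parameters (`clip`, `warmup`, `eps`), and its method names are assumptions made for illustration; they are not taken from the actual `RewardNormalizer` in atroposlib/envs/, whose interface is not shown in this log.

```python
import math


class RunningRewardNormalizer:
    """Hypothetical sketch only; not the atroposlib RewardNormalizer API."""

    def __init__(self, clip=5.0, warmup=10, eps=1e-8):
        self.clip = clip      # assumed: symmetric bound on the z-score
        self.warmup = warmup  # assumed: samples to see before normalizing
        self.eps = eps        # guards against division by zero
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations

    def update(self, reward):
        # Welford's update: O(1) memory, no stored reward history.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def variance(self):
        # Population variance of everything seen so far.
        return self.m2 / self.count if self.count > 0 else 0.0

    def normalize(self, reward):
        # Fold the reward into the statistics, then z-score and clip it.
        self.update(reward)
        if self.count < self.warmup:
            return reward  # pass raw rewards through during warmup
        z = (reward - self.mean) / (math.sqrt(self.variance) + self.eps)
        return max(-self.clip, min(self.clip, z))

    def state_dict(self):
        # Three numbers are enough to resume from a checkpoint.
        return {"count": self.count, "mean": self.mean, "m2": self.m2}

    def load_state_dict(self, state):
        self.count = state["count"]
        self.mean = state["mean"]
        self.m2 = state["m2"]
```

Keeping only `count`, `mean`, and `m2` is what makes the statistics cheap to checkpoint and independent of how many rewards have been seen. Min-max mode and WandB logging, also mentioned in the commit, are left out of this sketch.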
Jai Suphavadeeprasit
75a032bf3e
revert openai server
2026-03-23 11:26:05 -07:00
Jai Suphavadeeprasit
295bb9c446
revert openai server
2026-03-23 11:25:28 -07:00
Jai Suphavadeeprasit
45f569f3af
clean
2026-03-18 09:20:08 -04:00
Jai Suphavadeeprasit
41947e98d6
clean
2026-03-17 12:25:38 -04:00
Jai Suphavadeeprasit
79baac1ea7
clean
2026-03-17 12:23:35 -04:00
Jai Suphavadeeprasit
7aba0d3fc8
fresh eyes check
2026-03-14 11:20:15 -04:00
Jai Suphavadeeprasit
805a0c0eac
revert to similar structure
2026-03-13 20:52:48 -04:00
Jai Suphavadeeprasit
9bd299b3ef
better logging for devex
2026-03-13 20:41:51 -04:00
pre-commit-ci[bot]
3a85ede8ba
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 22:51:58 +00:00
Jai Suphavadeeprasit
a171358f2e
structural changes
2026-03-13 18:49:30 -04:00
Jai Suphavadeeprasit
1b8ff075c4
adding tests
2026-03-13 17:23:59 -04:00
Jai Suphavadeeprasit
697c594c72
changes
2026-03-13 16:58:37 -04:00
pre-commit-ci[bot]
82964b6e48
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 20:13:35 +00:00
Jai Suphavadeeprasit
a8cdb53a4d
address problems
2026-03-13 16:12:05 -04:00
Jai Suphavadeeprasit
322e7e6623
remove comments
2026-03-13 13:30:04 -04:00
pre-commit-ci[bot]
994e9c287d
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 17:21:00 +00:00
Jai Suphavadeeprasit
a1b545c734
remove cross tokenization and fix location of configs
2026-03-13 13:19:28 -04:00
Jai Suphavadeeprasit
862cd3667d
clean logging
2026-03-13 12:38:52 -04:00
Jai Suphavadeeprasit
600c54f5f8
clean log
2026-03-13 12:12:33 -04:00
pre-commit-ci[bot]
d1b0dee8f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 15:14:09 +00:00
Jai Suphavadeeprasit
c26432b963
training kernel
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
2f371e03fc
tokenizer bug
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
78c0a6d082
tokenizer bug
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
09ad401995
sneaky bug logging
2026-03-13 11:06:02 -04:00
Jai Suphavadeeprasit
64794e7c72
sneaky bug
2026-03-13 11:06:00 -04:00
Jai Suphavadeeprasit
bb2736db4e
next
2026-03-13 11:05:40 -04:00
Jai Suphavadeeprasit
f44eb810bf
teacher env init
2026-03-13 11:04:57 -04:00
dmahan93
f198c1738e
Merge conflict commit
2026-03-09 23:13:43 -05:00
Jai Suphavadeeprasit
b91922082e
managed_Server pass through and centralize sem logic
2026-03-05 15:46:33 -05:00
dmahan93
f4875c5dc6
make preserve thinking optional
2026-03-04 15:44:12 -06:00
Jai Suphavadeeprasit
c85a3e5ee7
readme language
2026-03-03 23:44:29 -05:00
pre-commit-ci[bot]
efc90bfb1b
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-04 04:18:12 +00:00
Jai Suphavadeeprasit
1eeb31065f
fixing comments
2026-03-03 23:16:05 -05:00
pre-commit-ci[bot]
8f304d44fd
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-04 03:08:19 +00:00
Jai Suphavadeeprasit
5aaf7a346c
prompt logprobs simplicity
2026-03-03 22:06:49 -05:00
Jai Suphavadeeprasit
f1c20591b6
prompt logprobs
2026-03-03 21:58:05 -05:00
Jai Suphavadeeprasit
439b9b129b
prompt logprobs
2026-03-03 21:58:05 -05:00
dmahan93
12d61d197f
add env using the tool api stuff
2026-03-03 19:51:30 -06:00
dmahan93
c8eb63f33d
readme updates for tool calling
2026-03-03 12:22:10 -06:00
pre-commit-ci[bot]
e98100e5f6
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-03 17:21:00 +00:00
Jai Suphavadeeprasit
323a8a2601
readme updates
2026-03-03 12:19:55 -05:00
Jai Suphavadeeprasit
b9291aa29f
init commit
2026-03-03 11:32:09 -05:00
dmahan93
8f21bb57ed
add better warning message
2026-03-02 23:21:25 -06:00
dmahan93
add42a2afb
add tool call parsing based on vllm impl and an openai server endpoint
2026-03-02 23:17:13 -06:00
pre-commit-ci[bot]
216c1f5899
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-27 21:17:58 +00:00
Jai Suphavadeeprasit
35587cbdc0
logger changes
2026-02-27 16:17:03 -05:00
pre-commit-ci[bot]
64d3ee1bd6
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-27 18:16:06 +00:00
Jai Suphavadeeprasit
f343b24a6a
narrow down scope
2026-02-27 11:14:42 -05:00