Prakarsh Kaushik
39fb1d6870
Merge branch 'NousResearch:main' into feat/reward-normalization
2026-03-31 00:57:05 +05:30
RUFFY-369
8a3a582beb
style: fix lints and pin dependencies for reward normalization
2026-03-30 17:14:19 +05:30
RUFFY-369
0674e31a53
feat: add online reward normalization for multi-env RL training stability
Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics
21/21 tests passing.
2026-03-28 03:31:28 +05:30
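The commit above describes running mean/variance via Welford's online algorithm, z-score normalization, clipping, a warmup period, and checkpoint support. A minimal sketch of how those pieces could fit together follows. The class name `RunningRewardStats`, its parameters, and its method names are hypothetical and are not taken from the actual `RewardNormalizer` in atroposlib; min-max mode and WandB logging are omitted.

```python
import math


class RunningRewardStats:
    """Hypothetical sketch: z-score reward normalization with Welford's algorithm."""

    def __init__(self, warmup: int = 10, clip: float = 5.0, eps: float = 1e-8):
        self.count = 0      # number of rewards seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations from the mean
        self.warmup = warmup
        self.clip = clip
        self.eps = eps

    def update(self, reward: float) -> None:
        # Welford's update: O(1) memory, no reward history is stored.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least two samples are seen.
        return self.m2 / self.count if self.count > 1 else 0.0

    def normalize(self, reward: float) -> float:
        # During warmup the statistics are unreliable, so pass rewards through.
        if self.count < self.warmup:
            return reward
        z = (reward - self.mean) / (math.sqrt(self.variance) + self.eps)
        return max(-self.clip, min(self.clip, z))

    def state_dict(self) -> dict:
        # Only three numbers are needed to resume from a checkpoint.
        return {"count": self.count, "mean": self.mean, "m2": self.m2}

    def load_state_dict(self, state: dict) -> None:
        self.count = state["count"]
        self.mean = state["mean"]
        self.m2 = state["m2"]
```

Because only `count`, `mean`, and `m2` are kept, the checkpoint is tiny and the statistics can be shared across environments whose reward scales differ.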
Jai Suphavadeeprasit
8745f0533e
revert teacher logprobs
2026-03-23 11:23:47 -07:00
Jai Suphavadeeprasit
7aba0d3fc8
fresh eyes check
2026-03-14 11:20:15 -04:00
pre-commit-ci[bot]
f053c77a62
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-14 00:43:23 +00:00
Jai Suphavadeeprasit
9bd299b3ef
better logging for devex
2026-03-13 20:41:51 -04:00
pre-commit-ci[bot]
3a85ede8ba
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 22:51:58 +00:00
Jai Suphavadeeprasit
a171358f2e
structural changes
2026-03-13 18:49:30 -04:00
pre-commit-ci[bot]
12ba3cc3bd
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 21:25:27 +00:00
Jai Suphavadeeprasit
1b8ff075c4
adding tests
2026-03-13 17:23:59 -04:00
Jai Suphavadeeprasit
a8cdb53a4d
address problems
2026-03-13 16:12:05 -04:00
pre-commit-ci[bot]
994e9c287d
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 17:21:00 +00:00
Jai Suphavadeeprasit
a1b545c734
remove cross tokenization and fix location of configs
2026-03-13 13:19:28 -04:00
Jai Suphavadeeprasit
862cd3667d
clean logging
2026-03-13 12:38:52 -04:00
pre-commit-ci[bot]
d1b0dee8f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 15:14:09 +00:00
Jai Suphavadeeprasit
f44eb810bf
teacher env init
2026-03-13 11:04:57 -04:00
dmahan93
f198c1738e
Merge conflict commit
2026-03-09 23:13:43 -05:00
Jai Suphavadeeprasit
eb50099361
test_get_logprobs_input_ids_only_passthrough
2026-03-05 17:04:45 -05:00
0xbyt4
4d8e9b8086
fix: use sys.executable instead of hardcoded "python" in tests
Tests that launch the API server via subprocess used a hardcoded
"python" command which fails on systems where only "python3" is
available (e.g. macOS). Using sys.executable ensures the same
interpreter running pytest is used for subprocesses.
Fixes 36 test errors on macOS environments.
2026-03-05 17:04:45 -05:00
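The fix described above can be illustrated with a short sketch: launching a child process through `sys.executable` reuses the interpreter that is running the tests, so it does not depend on a `python` command being on the PATH. The helper name `run_in_same_interpreter` is hypothetical and does not come from the repository's test suite.

```python
import subprocess
import sys


def run_in_same_interpreter(code: str) -> str:
    """Run a Python snippet in a subprocess using the current interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # not a hardcoded "python"
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

The same pattern applies when starting a server script: pass `sys.executable` as the first element of the argument list instead of the string "python".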
pre-commit-ci[bot]
b166c3a9d9
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-05 20:48:17 +00:00
Jai Suphavadeeprasit
b91922082e
managed_Server pass through and centralize sem logic
2026-03-05 15:46:33 -05:00
Jai Suphavadeeprasit
1a3d9ee664
testing
2026-03-03 23:38:04 -05:00
Jai Suphavadeeprasit
1eeb31065f
fixing comments
2026-03-03 23:16:05 -05:00
Jai Suphavadeeprasit
51088ac24d
add tests
2026-03-03 23:08:40 -05:00
Jai Suphavadeeprasit
5aaf7a346c
prompt logprobs simplicity
2026-03-03 22:06:49 -05:00
Jai Suphavadeeprasit
439b9b129b
prompt logprobs
2026-03-03 21:58:05 -05:00
dmahan93
12d61d197f
add env using the tool api stuff
2026-03-03 19:51:30 -06:00
Jai Suphavadeeprasit
b9291aa29f
init commit
2026-03-03 11:32:09 -05:00
dmahan93
add42a2afb
add tool call parsing based on vllm impl and an openai server endpoint
2026-03-02 23:17:13 -06:00
ansulx
d97f366ae0
Add regression test for TRL vLLM completion wrapper
Ensure the TRL vLLM completion wrapper returns a Completion with text so issue #183 stays covered.
2026-02-06 01:57:16 +05:30
VolodymyrBg
77a3505955
Update test_openai_api_workarounds.py
2026-01-29 10:13:50 +02:00
Teknium
84a8bbb9cb
Merge pull request #317 from Savage890/fix/issue-308-jsonl2html
fix: handle nested message format in jsonl2html.py
2026-01-16 06:47:44 -08:00
teknium
31a8cdc7a7
update test to reflect the change in reasoning effort mapping
2026-01-15 07:48:52 +00:00
teknium
0316cac8d1
Rename is_active method to is_reasoning_kwargs_active in ReasoningConfig for clarity. Update references in the class and corresponding tests to reflect this change.
2026-01-15 06:26:31 +00:00
pre-commit-ci[bot]
39e9a233db
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-14 21:36:05 +00:00
Savage890
39f05d18fa
fix: handle nested message format in jsonl2html.py (#308)
2026-01-15 03:01:15 +05:30
pre-commit-ci[bot]
6cfcbdf4d5
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-05 23:20:47 +00:00
teknium
e1ece3e64e
Add reasoning configuration support across server implementations
- Updated server classes (OpenAIServer, SGLangServer, TrlVllmServer, VLLMServer) to accept a ReasoningConfig parameter during initialization.
- Enhanced ReasoningConfig to allow flexible max_tokens without strict validation, accommodating varying provider limits.
- Implemented reasoning configuration injection in APIServer methods for chat and completion handling.
- Updated tests to reflect changes in max_tokens validation logic.
This commit integrates reasoning capabilities into the server handling architecture, improving compatibility with diverse reasoning models.
2026-01-05 23:20:01 +00:00
teknium
89c9697665
fix test
2025-12-30 23:08:54 +00:00
teknium
127a925471
Merge branch 'add_reasoning_handling_draft' of https://github.com/NousResearch/atropos into add_reasoning_handling_draft
2025-12-30 11:59:46 +00:00
teknium
747fbc9285
fix linting
2025-12-30 11:56:21 +00:00
pre-commit-ci[bot]
97047eee7b
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-30 00:26:33 +00:00
teknium
62fa51240c
Add support for reasoning models and their variety of providers/endpoints
2025-12-30 00:23:00 +00:00
dmahan93
b1e164eef5
Merge pull request #264 from NousResearch/add-logprob-server-manager-fn
add sglang specific token level logprob handling and server manager/b…
2025-10-29 13:53:39 -07:00
Dakota
3c8fc32288
fix test case
2025-10-29 14:38:16 -05:00
pre-commit-ci[bot]
0d80da5146
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-10-24 20:10:29 +00:00
dmahan93
7bf4cfbf80
add managed server to make grabbing logprobs easier w/ tokenized items
2025-10-24 13:09:46 -07:00
pre-commit-ci[bot]
0840c26e94
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-10-15 04:19:25 +00:00
ropresearch
e5b8fb8654
clean up
2025-10-10 11:50:39 -04:00