Commit graph

593 commits

Author SHA1 Message Date
Allan Niemerg
5bb5bd2c3d Add BLEUBERI environment for reference-based RL 2025-09-08 11:21:27 -05:00
Alvarez
bad4fb84df Update plot.py 2025-08-30 19:22:57 +02:00
pre-commit-ci[bot]
127b5736a5 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-28 18:08:26 +00:00
Jai Suphavadeeprasit
7462f45447 sampling params 2025-08-28 14:07:29 -04:00
Jai Suphavadeeprasit
3944e7ef9b linting 2025-08-28 12:54:08 -04:00
Jai Suphavadeeprasit
1bfe294414 Other major changes 2025-08-28 12:24:08 -04:00
Jai Suphavadeeprasit
ec09a1caee Other major changes 2025-08-28 12:04:42 -04:00
Jai Suphavadeeprasit
b56d03b25c changes linting 2025-08-28 03:53:12 -04:00
Jai Suphavadeeprasit
f6f3c04313 organized 2025-08-28 03:35:41 -04:00
Jai Suphavadeeprasit
0bcc406b02 race conditions 2025-08-28 03:35:41 -04:00
Jai Suphavadeeprasit
53710e95ec min@ 2025-08-28 03:35:41 -04:00
pre-commit-ci[bot]
dec92b2a6e [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-19 16:30:37 +00:00
Jai Suphavadeeprasit
6266748027 Other linting 2025-08-19 12:20:33 -04:00
Jai Suphavadeeprasit
4d404c0be6 os 2025-08-19 12:05:04 -04:00
Jai Suphavadeeprasit
aac9f5a926 linting 2025-08-19 12:03:13 -04:00
pre-commit-ci[bot]
c1d97b85a3 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-19 12:03:13 -04:00
Jai Suphavadeeprasit
8b55815e2f Linting fixes 2025-08-19 12:03:13 -04:00
pre-commit-ci[bot]
750489493f [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-19 12:03:13 -04:00
Jai Suphavadeeprasit
f76f9d1596 cleanup 2025-08-19 12:03:13 -04:00
pre-commit-ci[bot]
62b72589c6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-19 12:03:13 -04:00
Jai Suphavadeeprasit
e55a7a0100 add_danger 2025-08-19 12:03:13 -04:00
teknium
bed7ddcb95 add more default categories 2025-08-19 12:03:13 -04:00
teknium
39f0103313 fix dataset 2025-08-19 12:03:13 -04:00
teknium
ff7a2569dc update default max_toks 2025-08-19 12:03:13 -04:00
teknium
69135320b4 initial refusalbenchv2 2025-08-19 12:03:13 -04:00
pre-commit-ci[bot]
321478dd5f [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-12 06:45:36 +00:00
interstellarninja
2f6025e65b fixing precommit formatting errors 2025-08-12 02:42:47 -04:00
shannonsands
46f0602227 Diplomacy trainer env (#227)
* minimal implementation, simplified challenge registry

* need game save logic

* fixed challenge gen, works with local test

* updated challenge gen with wider ranges, working with local script

* runs working correctly, wandb stats look ok

* linting

* Add diplomacy environment with AI_Diplomacy submodule

- Add diplomacy_env_minimal.py for diplomacy game environment
- Add atropos_client_minimal.py for client interface
- Add diplomacy_local_server.py for local game server
- Add AI_Diplomacy submodule from GoodStartLabs/AI_Diplomacy
- Fix import ordering and remove unused imports

* test file working, moving to cluster to test training

* updated gitignore

* removed logs

* minor fixes, training running now

* readded proxy reg and queue system

* linting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* queue gameid bug, refactored

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaned up configs & allowed for openrouter models to be easily used

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* linting

* Remove duplicate dependencies from diplomacy requirements.txt

Only keep AI_Diplomacy-specific dependencies that aren't already in the main project

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-12 09:02:16 +10:00
shannonsands
47cb15745c Textworld minimal (#225)
* minimal implementation, simplified challenge registry

* need game save logic

* fixed challenge gen, works with local test

* updated challenge gen with wider ranges, working with local script

* runs working correctly, wandb stats look ok

* linting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused imports

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-01 10:16:35 +10:00
Teknium
be66e120d9 Merge pull request #219 from NousResearch/add-arenahard-v1-environment
Add arena-hard v1 environment
2025-07-30 09:35:14 -07:00
pre-commit-ci[bot]
65aea8bb21 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-30 15:10:36 +00:00
teknium
75f1cf6d2a move eval envs to eval_environments and update readmes 2025-07-30 15:09:34 +00:00
Aboozle1
3ce68aed38 Merge branch 'main' into add-my-environment 2025-07-28 11:50:50 -05:00
Abhaykhanna3
9d7bcc523f Fix(PR): Address reviewer feedback
- Remove redundant requirements.txt
- Fix leading newline in prompt templates
2025-07-28 11:48:02 -05:00
Abhaykhanna3
b5234d4214 Add Word Hunt environment for training models on 4x4 letter grids
- Trie-based solver, official scoring, normalized rewards
- Configurable token limit and detailed README with dictionary download link
- Removes large Dictionary.txt from tracking and adds ignore rules
- All tests pass and pre-commit hooks are clean
2025-07-28 00:37:36 -05:00
pre-commit-ci[bot]
4c88a4bbb9 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-28 01:37:03 +00:00
teknium
aaebb8d6bb linter linter 2025-07-28 01:35:49 +00:00
pre-commit-ci[bot]
cbec584202 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-28 01:30:27 +00:00
teknium
1d523472cc Merge branch 'add-arenahard-v1-environment' of https://github.com/NousResearch/atropos into add-arenahard-v1-environment 2025-07-28 01:29:52 +00:00
teknium
20565d8abc update judge confs so it can use any judge model 2025-07-28 01:29:50 +00:00
pre-commit-ci[bot]
041a70d891 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-27 03:39:42 +00:00
teknium
e6de7bb432 lint 2025-07-27 03:39:05 +00:00
pre-commit-ci[bot]
52b505296c [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-27 02:52:39 +00:00
teknium
a0979eb08e add readme section 2025-07-27 02:46:51 +00:00
teknium
31b0c6f66d Add arena-hard v1 environment 2025-07-26 21:17:00 +00:00
pre-commit-ci[bot]
65682d160a [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-26 21:13:05 +00:00
teknium
aa66b09c13 make linter happy 2025-07-26 21:12:30 +00:00
pre-commit-ci[bot]
a2e14cf50c [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-26 21:08:01 +00:00
teknium
1039c3d360 improve dataloading, ctx len 2025-07-26 21:06:45 +00:00
dmahan93
6604a2255b Merge pull request #195 from interstellarninja/feat/interleaved_tool_use
Interleaved Tool-Use Within Reasoning Blocks
2025-07-24 08:58:00 -05:00