Commit graph

36 commits

Author SHA1 Message Date
pre-commit-ci[bot]
a930d3db12 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-21 19:25:14 +00:00
VolodymyrBg
7e5ddbce06 fix: add try/finally to guarantee gym environment cleanup 2026-02-21 21:23:46 +02:00
pre-commit-ci[bot]
34cabbb30f [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-09-15 16:41:26 +00:00
shannonsands
46f0602227 Diplomacy trainer env (#227)
* minimal implementation, simplified challenge registry

* need game save logic

* fixed challenge gen, works with local test

* updated challenge gen with wider ranges, working with local script

* runs working correctly, wandb stats look ok

* linting

* Add diplomacy environment with AI_Diplomacy submodule

- Add diplomacy_env_minimal.py for diplomacy game environment
- Add atropos_client_minimal.py for client interface
- Add diplomacy_local_server.py for local game server
- Add AI_Diplomacy submodule from GoodStartLabs/AI_Diplomacy
- Fix import ordering and remove unused imports

* test file working, moving to cluster to test training

* updated gitignore

* removed logs

* minor fixes, training running now

* readded proxy reg and queue system

* linting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* queue gameid bug, refactored

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaned up configs & allowed for openrouter models to be easily used

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* linting

* Remove duplicate dependencies from diplomacy requirements.txt

Only keep AI_Diplomacy-specific dependencies that aren't already in the main project

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-12 09:02:16 +10:00
shannonsands
47cb15745c Textworld minimal (#225)
* minimal implementation, simplified challenge registry

* need game save logic

* fixed challenge gen, works with local test

* updated challenge gen with wider ranges, working with local script

* runs working correctly, wandb stats look ok

* linting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused imports

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-01 10:16:35 +10:00
fuder.eth
1862b193ee Update README.md (#118) 2025-05-28 10:24:12 +10:00
Shannon Sands
bfb822f1e0 updated APIServerConfig and added requirements.txt and install instructions to README 2025-05-15 12:22:00 -07:00
Shannon Sands
00dd120067 Merge branch 'main' into blackjack2-env 2025-05-14 17:27:44 -07:00
Shannon Sands
8fad665f6a moved folder location 2025-05-14 17:22:30 -07:00
Shannon Sands
c2bf3f5acd moved folder location 2025-05-14 17:22:18 -07:00
Shannon Sands
3fba8e3527 linting 2025-05-14 14:22:25 -07:00
Shannon Sands
d8ab1a6758 linting 2025-05-14 14:20:54 -07:00
Shannon Sands
1a7c0294fa refactoring for more clarity 2025-05-14 14:18:43 -07:00
Shannon Sands
bb6c205efe Linting 2025-05-14 14:05:52 -07:00
Shannon Sands
67cfd961c5 linting 2025-05-14 14:01:31 -07:00
Shannon Sands
826de9e283 Updated README 2025-05-14 13:57:20 -07:00
Shannon Sands
f5172b45a8 Added README 2025-05-14 13:35:15 -07:00
Shannon Sands
85f462df5e Updated test scripts 2025-05-14 12:05:59 -07:00
Shannon Sands
d6f9d58606 new env runs locally 2025-05-14 11:57:45 -07:00
Shannon Sands
54ae40840d no-thinking env added 2025-05-14 11:28:39 -07:00
Shannon Sands
21cc528b85 move best-of-n selection to util 2025-05-14 10:35:12 -07:00
Shannon Sands
4c00e2b209 move message history out to utils 2025-05-14 10:13:56 -07:00
Shannon Sands
8cd9e4d776 made private collect_trajectory re changes 2025-05-13 07:58:48 +10:00
Shannon Sands
e480c30b8b removed new fn 2025-05-13 07:49:28 +10:00
Shannon Sands
220b92be47 Linting and cleanup 2025-05-10 21:15:00 +10:00
Shannon Sands
6617d402b3 Doing exact V* calc 2025-05-10 20:24:31 +10:00
Shannon Sands
a049dde6b1 Adding thinking reward 2025-05-10 19:50:30 +10:00
Shannon Sands
840ff20921 Fixed typo, revising reward function 2025-05-10 19:45:06 +10:00
Shannon Sands
7fe1a40368 readd multistep masking 2025-05-10 09:24:55 +10:00
Shannon Sands
9efd8c1529 linting 2025-05-10 08:44:35 +10:00
Shannon Sands
06c4a9e65c linting 2025-05-10 08:43:03 +10:00
Shannon Sands
0248cc1227 Removed old code, added comments 2025-05-10 08:39:52 +10:00
Shannon Sands
ba604d44f9 update local server 2025-05-10 08:18:41 +10:00
Shannon Sands
c506bb147e simplified config and reward 2025-05-10 08:04:39 +10:00
Shannon Sands
7e95c0b67d moving test server 2025-05-10 07:47:44 +10:00
Shannon Sands
a7dfd377da moving env to clean branch 2025-05-10 07:44:29 +10:00