atropos/environments/game_environments/diplomacy_environment/README.md
shannonsands 46f0602227
Diplomacy trainer env (#227)
2025-08-12 09:02:16 +10:00


# Minimal Diplomacy Environment
A simplified Diplomacy RL training environment for Atropos that integrates with AI_Diplomacy.
## Overview
This minimal implementation provides:
- Basic game integration via the AI_Diplomacy submodule
- Parallel rollouts with a configurable `group_size`
- LLM request interception through the AtroposClientMinimal proxy
- Simple supply-center-based scoring
- No complex features (no GRPO, memory systems, or advanced scoring)
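
To make "supply-center-based scoring" concrete, here is a minimal sketch of what such a score could look like. The function name, the input shape, and the normalisation by 18 centers (the standard Diplomacy solo-victory threshold) are assumptions for illustration, not the environment's actual scoring code.

```python
# Hypothetical helper: the name and normalisation are illustrative assumptions,
# not taken from diplomacy_env_minimal.py.
SOLO_VICTORY_CENTERS = 18  # standard Diplomacy: 18 of 34 supply centers wins


def supply_center_score(centers_by_power: dict[str, list[str]], training_power: str) -> float:
    """Return the training power's supply-center count scaled to [0, 1]."""
    owned = len(centers_by_power.get(training_power, []))
    return min(owned / SOLO_VICTORY_CENTERS, 1.0)


# Example end-of-game ownership map (made-up data).
centers = {
    "FRANCE": ["PAR", "MAR", "BRE", "SPA", "POR"],
    "GERMANY": ["BER", "MUN", "KIE"],
}
score = supply_center_score(centers, "FRANCE")
```

A power that owns no centers (or has been eliminated) scores 0.0 under this scheme, so trajectories can be compared directly by score.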
## Architecture
```
Atropos Policy Server
AtroposClientMinimal (proxy)
AI_Diplomacy Game Engine
Game Execution
```
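
The proxy layer in the diagram above can be pictured as a thin wrapper that forwards each prompt to the policy server and records the exchange. The sketch below is a hypothetical stand-in: `RecordingProxy`, `generate`, and `fake_policy_server` are invented names, and the real `AtroposClientMinimal` interface may differ.

```python
from typing import Callable


class RecordingProxy:
    """Hypothetical stand-in for the proxy layer; not the real AtroposClientMinimal."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend  # callable that produces a completion for a prompt
        self.interactions: list[dict[str, str]] = []

    def generate(self, prompt: str) -> str:
        # Forward the request, then keep a record for later trajectory scoring.
        completion = self.backend(prompt)
        self.interactions.append({"prompt": prompt, "completion": completion})
        return completion


def fake_policy_server(prompt: str) -> str:
    # Placeholder for a request to the policy server; returns a fixed order.
    return "A PAR - BUR"


proxy = RecordingProxy(fake_policy_server)
orders = proxy.generate("You are FRANCE. Submit orders for Spring 1901.")
```

Because the game engine only sees the proxy, every prompt and completion ends up in `proxy.interactions` without changes to the game code.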
## Quick Start
1. Install dependencies:
```bash
# Install the environment's own dependencies
pip install -r requirements.txt
# Install the AI_Diplomacy submodule in editable mode
cd AI_Diplomacy
pip install -e .
```
2. Start your Atropos policy server on port 8000
3. Run the environment:
```bash
python diplomacy_env_minimal.py serve
```
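
Before running the environment, it can help to confirm that something is listening on port 8000. The check below is a small sketch using only the standard library; the host `127.0.0.1` is an assumption (the steps above only specify the port), and `port_is_open` is a helper name invented here.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumes the policy server runs locally; adjust the host if it does not.
    print("policy server reachable:", port_is_open("127.0.0.1", 8000))
```

This only verifies that the port accepts connections, not that the server behind it is the expected policy server.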
## Configuration
Key settings in `DiplomacyEnvMinimalConfig`:
- `max_game_turns`: Number of game turns (default: 10)
- `training_power`: Which power the RL agent controls (default: "FRANCE")
- `group_size`: Number of parallel games run per training step (default: 4)
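
As a quick reference, the documented settings and defaults can be mirrored in a small dataclass. `ExampleDiplomacyConfig` is an illustrative stand-in, not the real `DiplomacyEnvMinimalConfig`, which may have a different base class and additional fields.

```python
from dataclasses import dataclass


@dataclass
class ExampleDiplomacyConfig:
    """Illustrative mirror of the documented settings; not the real config class."""

    max_game_turns: int = 10         # number of game turns
    training_power: str = "FRANCE"   # power controlled by the RL agent
    group_size: int = 4              # number of parallel games


default_cfg = ExampleDiplomacyConfig()
# Override only the fields that differ from the defaults.
custom_cfg = ExampleDiplomacyConfig(max_game_turns=20, training_power="ENGLAND")
```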
## How It Works
1. **Parallel Rollouts**: Each training step runs `group_size` games with the same initial seed
2. **LLM Interception**: AtroposClientMinimal intercepts all LLM calls from AI_Diplomacy
3. **Trajectory Collection**: Game interactions are collected and scored
4. **Best Selection**: The highest-scoring trajectory is returned for training
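
The steps above can be sketched as a small loop: run `group_size` games from one seed, score each, and keep the best. Everything in this sketch is hypothetical: `play_game` fakes a game with a random score, the games run sequentially rather than in parallel, and the per-game random generator only emulates the divergence that sampled LLM outputs would cause in a real run.

```python
import random


def play_game(seed: int, game_index: int) -> dict:
    """Hypothetical stand-in for one game; returns a trajectory with a score."""
    # All games share `seed`; the game index only drives the fake divergence.
    rng = random.Random(seed * 1000 + game_index)
    centers = rng.randint(0, 18)  # pretend final supply-center count
    return {"game_index": game_index, "seed": seed, "score": centers / 18}


def run_group(seed: int, group_size: int) -> tuple[list[dict], dict]:
    """Play `group_size` games from the same seed and pick the top scorer."""
    trajectories = [play_game(seed, i) for i in range(group_size)]
    best = max(trajectories, key=lambda t: t["score"])
    return trajectories, best


trajectories, best = run_group(seed=42, group_size=4)
```

Sharing the seed means the games start from the same state, so score differences within a group reflect the agent's choices rather than the starting position.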