mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-22 16:48:57 +00:00
Diplomacy trainer env (#227)
* minimal implementation, simplified challenge registry
* need game save logic
* fixed challenge gen, works with local test
* updated challenge gen with wider ranges, working with local script
* runs working correctly, wandb stats look ok
* linting
* Add diplomacy environment with AI_Diplomacy submodule
  - Add diplomacy_env_minimal.py for diplomacy game environment
  - Add atropos_client_minimal.py for client interface
  - Add diplomacy_local_server.py for local game server
  - Add AI_Diplomacy submodule from GoodStartLabs/AI_Diplomacy
  - Fix import ordering and remove unused imports
* test file working, moving to cluster to test training
* updated gitignore
* removed logs
* minor fixes, training running now
* readded proxy reg and queue system
* linting
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* queue gameid bug, refactored
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* cleaned up configs & allowed for openrouter models to be easily used
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* linting
* Remove duplicate dependencies from diplomacy requirements.txt: only keep AI_Diplomacy-specific dependencies that aren't already in the main project

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This commit is contained in:
parent
4fe67e698d
commit
46f0602227
13 changed files with 1317 additions and 4 deletions
# Minimal Diplomacy Environment

A simplified Diplomacy RL training environment for Atropos that integrates with AI_Diplomacy.

## Overview

This minimal implementation provides:

- Basic game integration via the AI_Diplomacy submodule
- Parallel rollouts with configurable `group_size`
- LLM request interception through an AtroposClient proxy
- Simple supply-center-based scoring
- No complex features (no GRPO, memory systems, or advanced scoring)

## Architecture

```
Atropos Policy Server
          ↓
AtroposClientMinimal (proxy)
          ↓
AI_Diplomacy Game Engine
          ↓
Game Execution
```
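The proxy layer in the diagram above can be pictured as a thin client that forwards AI_Diplomacy's LLM requests to the policy server and records every exchange for later scoring. A minimal sketch, assuming a callable backend; the names `PolicyProxy`, `complete`, and `history` are illustrative, not the actual `atropos_client_minimal.py` API:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PolicyProxy:
    """Illustrative stand-in for AtroposClientMinimal: forwards prompts
    to a policy backend and logs every (prompt, completion) pair so the
    environment can assemble trajectories afterwards."""

    backend: Callable[[str], str]  # e.g. an HTTP call to the policy server
    history: List[Tuple[str, str]] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        completion = self.backend(prompt)
        self.history.append((prompt, completion))  # intercept and record
        return completion


# Usage with a dummy backend standing in for the real policy server:
proxy = PolicyProxy(backend=lambda p: f"HOLD orders for: {p}")
reply = proxy.complete("FRANCE, Spring 1901")
assert len(proxy.history) == 1
```

Because the game engine only sees the `complete` call, the same proxy shape works whether the backend is a local model, the Atropos policy server, or an OpenRouter model.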

## Quick Start

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   cd AI_Diplomacy
   pip install -e .
   ```

2. Start your Atropos policy server on port 8000.

3. Run the environment:

   ```bash
   python diplomacy_env_minimal.py serve
   ```

## Configuration

Key settings in `DiplomacyEnvMinimalConfig`:

- `max_game_turns`: number of game turns (default: 10)
- `training_power`: which power the RL agent controls (default: `"FRANCE"`)
- `group_size`: number of parallel games per trajectory (default: 4)

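The field names and defaults above can be pictured as a small config object. This is a sketch only: the field names and defaults come from the list above, but the dataclass shape is an assumption, not the real `DiplomacyEnvMinimalConfig` class.

```python
from dataclasses import dataclass


@dataclass
class DiplomacyEnvMinimalConfig:
    # Names and defaults taken from the settings list above;
    # the dataclass itself is a hypothetical sketch.
    max_game_turns: int = 10
    training_power: str = "FRANCE"
    group_size: int = 4


# Override defaults for a longer game controlled by a different power:
cfg = DiplomacyEnvMinimalConfig(max_game_turns=20, training_power="GERMANY")
assert cfg.group_size == 4  # untouched defaults are preserved
```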
## How It Works

1. **Parallel Rollouts**: Each training step runs `group_size` games with the same initial seed
2. **LLM Interception**: AtroposClientMinimal intercepts all LLM calls from AI_Diplomacy
3. **Trajectory Collection**: Game interactions are collected and scored
4. **Best Selection**: The highest-scoring trajectory is returned for training
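The four steps above can be sketched as a single loop: play `group_size` games, score each by its final supply-center count, and keep the best trajectory. All names here (`collect_best_trajectory`, `play_game`) are hypothetical stand-ins for the real environment code.

```python
from typing import Callable, List, Tuple


def collect_best_trajectory(
    play_game: Callable[[int], Tuple[list, int]],  # seed -> (trajectory, supply centers)
    group_size: int = 4,
    seed: int = 0,
) -> list:
    """Run group_size games from the same initial seed, score each by its
    final supply-center count, and return the highest-scoring trajectory.
    Sketch only: play_game stands in for a full AI_Diplomacy rollout."""
    best_traj: List = []
    best_score = -1
    for _ in range(group_size):
        trajectory, centers = play_game(seed)
        if centers > best_score:
            best_traj, best_score = trajectory, centers
    return best_traj


# Dummy game for illustration: the "trajectory" just records the score.
import random

def fake_game(seed: int) -> Tuple[list, int]:
    centers = random.randint(0, 18)  # 18 centers wins a real Diplomacy game
    return [f"game ended with {centers} centers"], centers

best = collect_best_trajectory(fake_game, group_size=4)
assert isinstance(best, list) and len(best) == 1
```

Games share a seed but still diverge because LLM sampling differs between rollouts, which is what makes best-of-group selection meaningful.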