Minimal Diplomacy Environment

A simplified Diplomacy RL training environment for Atropos that integrates with AI_Diplomacy.

Overview

This minimal implementation provides:

  • Basic game integration via AI_Diplomacy submodule
  • Parallel rollouts with configurable group_size
  • LLM request interception through AtroposClient proxy
  • Simple supply-center-based scoring
  • No complex features (no GRPO, memory systems, or advanced scoring)
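
The supply-center scoring mentioned above might look like the following minimal sketch. The function name and the shape of the game-state argument are assumptions for illustration, not the actual implementation:

```python
def supply_center_score(centers_by_power: dict, training_power: str = "FRANCE") -> float:
    """Score a finished game by the training power's supply-center count.

    `centers_by_power` maps a power name to the list of supply centers it
    holds at game end (a hypothetical representation of the final state).
    """
    return float(len(centers_by_power.get(training_power, [])))
```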

Architecture

Atropos Policy Server
        ↓
AtroposClientMinimal (proxy)
        ↓
AI_Diplomacy Game Engine
        ↓
Game Execution
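
The proxy layer in the diagram above can be sketched as follows. This is an illustrative stand-in for AtroposClientMinimal, not its real interface: it exposes the generation call the game engine expects, delegates to a policy (standing in for the Atropos policy server), and records every exchange so the trajectory can be scored later:

```python
class AtroposClientSketch:
    """Hypothetical sketch of the AtroposClientMinimal proxy pattern."""

    def __init__(self, policy):
        # `policy` stands in for a request to the Atropos policy server.
        self.policy = policy
        self.history = []  # every (prompt, completion) pair, kept for scoring

    def generate(self, prompt: str) -> str:
        # Intercept the game engine's LLM call, forward it to the policy,
        # and record the exchange as part of the trajectory.
        completion = self.policy(prompt)
        self.history.append((prompt, completion))
        return completion
```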

Quick Start

  1. Install dependencies:

pip install -r requirements.txt
cd AI_Diplomacy
pip install -e .

  2. Start your Atropos policy server on port 8000.

  3. Run the environment:

python diplomacy_env_minimal.py serve

Configuration

Key settings in DiplomacyEnvMinimalConfig:

  • max_game_turns: Number of game turns (default: 10)
  • training_power: Which power the RL agent controls (default: "FRANCE")
  • group_size: Number of parallel games per trajectory (default: 4)
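
As a sketch, the settings above could be expressed as a dataclass. The field names and defaults follow this README; the class name and its use of a dataclass are assumptions about the actual DiplomacyEnvMinimalConfig:

```python
from dataclasses import dataclass


@dataclass
class DiplomacyEnvMinimalConfigSketch:
    """Illustrative mirror of the documented configuration defaults."""

    max_game_turns: int = 10        # number of game turns per rollout
    training_power: str = "FRANCE"  # power controlled by the RL agent
    group_size: int = 4             # parallel games per trajectory
```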

How It Works

  1. Parallel Rollouts: Each training step runs group_size games with the same initial seed
  2. LLM Interception: AtroposClientMinimal intercepts all LLM calls from AI_Diplomacy
  3. Trajectory Collection: Game interactions are collected and scored
  4. Best Selection: The highest scoring trajectory is returned for training
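
The four steps above can be sketched as a single loop: run group_size games, score each trajectory, and return the best one. The function and parameter names here are assumptions for illustration; in the real environment the per-game variation comes from LLM sampling rather than from the (shared) seed:

```python
def select_best_trajectory(run_game, score, group_size: int = 4, seed: int = 0):
    """Run `group_size` games from the same seed and keep the best trajectory.

    `run_game(seed)` returns one collected trajectory; `score(trajectory)`
    returns its scalar score (both are stand-ins for the real environment).
    """
    rollouts = [run_game(seed) for _ in range(group_size)]
    return max(rollouts, key=score)
```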