# AI Diplomacy: LLM-Powered Strategic Gameplay

Created by Alex Duffy @Alx-Ai & Tyler Marques @Tylermarques

## Overview

## Key Features

### 🤖 Stateful AI Agents

Each power is represented by a `DiplomacyAgent` with:

- **Dynamic Goals**: Strategic objectives that evolve based on game events
- **Relationship Tracking**: Maintains relationships (Enemy/Unfriendly/Neutral/Friendly/Ally) with other powers
- **Memory System**: Dual-layer memory with structured diary entries and consolidation
- **Personality**: Power-specific system prompts shape each agent's diplomatic style

### 💬 Rich Negotiations

- Multi-round message exchanges (private and global)
- Relationship-aware communication strategies
- Message history tracking and analysis
- Detection of ignored messages and non-responsive powers

### 🎯 Strategic Order Generation

- BFS pathfinding for movement analysis
- Context-aware order selection with nearest threats/opportunities
- Fallback logic for robustness
- Support for multiple LLM providers (OpenAI, Claude, Gemini, DeepSeek, OpenRouter)

### 📊 Advanced Game Analysis

- Custom phase summaries with success/failure categorization
- Betrayal detection through order/negotiation comparison
- Strategic planning phases for high-level directives
- Comprehensive logging of all LLM interactions

### 🧠 Memory Management

- **Private Diary**: Structured, phase-prefixed entries for LLM context
- Negotiation summaries with relationship updates
- Order reasoning and strategic justifications

#### Prompt Templates

The `ai_diplomacy/prompts/` directory contains customizable templates:

- Power-specific system prompts (e.g., `france_system_prompt.txt`)
- Task-specific instructions (`order_instructions.txt`, `conversation_instructions.txt`)
- Diary generation prompts for different game events

*(All other command-line flags belong to `lm_game.py` and are forwarded unchanged.)*

### Environment Setup

Create a `.env` file with your API keys:

```
OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
OPENROUTER_API_KEY=your_key_here
```

### Model Configuration

Models can be assigned to powers in `ai_diplomacy/utils.py`:

```python
def assign_models_to_powers() -> Dict[str, str]:
    return {
        # ... power-to-model assignments ...
    }
```

Supported models include:

- OpenAI: `gpt-4o`, `gpt-4.1`, `o3`, `o4-mini`
- Anthropic: `claude-3-5-sonnet-20241022`, `claude-opus-4-20250514`
- Google: `gemini-2.0-flash`, `gemini-2.5-pro-preview`

### Game Output and Analysis

Games are saved to the `results/` directory with timestamps. Each game folder contains:

- `lmvsgame.json` - Complete game data including phase summaries and agent relationships
- `overview.jsonl` - Error statistics and model assignments
- `game_manifesto.txt` - Strategic directives from planning phases
- `llm_responses.csv` - Complete log of all LLM interactions

The game JSON includes special fields for AI analysis:

- `phase_summaries` - Categorized move results for each phase
- `agent_relationships` - Diplomatic standings at each phase
- `final_agent_states` - End-game goals and relationships

For detailed analysis of LLM interactions and order success rates, a two-step pipeline is used:

1. **Convert CSV to RL JSON**: The `csv_to_rl_json.py` script processes `llm_responses.csv` files, typically found in game-specific subdirectories ending with "FULL_GAME" (e.g., `results/20250524_..._FULL_GAME/`). It converts this raw interaction data into a JSON format suitable for Reinforcement Learning (RL) analysis. To process all relevant CSVs in batch:

   ```bash
   python csv_to_rl_json.py --scan_dir results/
   ```

   This command scans the `results/` directory for "FULL_GAME" subfolders, converts their `llm_responses.csv` files, and outputs all generated `*_rl.json` files into the `results/json/` directory.

2. **Analyze RL JSON Files**: The `analyze_rl_json.py` script then analyzes the JSON files generated in the previous step. It aggregates statistics on successful and failed convoy and support orders, categorized by model. To run the analysis:

   ```bash
   python analyze_rl_json.py results/json/
   ```

   This command processes all `*_rl.json` files in the `results/json/` directory and generates two reports in the project's root directory:

   - `analysis_summary.txt`: A clean summary of order statistics.
   - `analysis_summary_debug.txt`: A detailed report including unique `success` field values and other debug information.

This pipeline allows for a comprehensive understanding of LLM performance in generating valid and successful game orders.
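The "FULL_GAME" scan performed by `--scan_dir` can be pictured with a short sketch. This is a minimal illustration of the directory walk described above, not the actual `csv_to_rl_json.py` implementation; the function name is hypothetical.

```python
from pathlib import Path

def find_full_game_csvs(results_dir: Path) -> list[Path]:
    """Collect llm_responses.csv files from game folders ending in 'FULL_GAME'.

    Illustrative only: mirrors the --scan_dir behaviour described above,
    not the real csv_to_rl_json.py code.
    """
    return sorted(
        sub / "llm_responses.csv"
        for sub in results_dir.iterdir()
        if sub.is_dir()
        and sub.name.endswith("FULL_GAME")
        and (sub / "llm_responses.csv").exists()
    )
```

Each CSV discovered this way is then converted to a `*_rl.json` file under `results/json/` for the analysis step.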
### Post-Game Analysis Tools

#### Strategic Moment Analysis

```bash
python analyze_game_moments.py results/game_folder --model claude-3-5-sonnet-20241022
```

The analysis identifies:

- **Betrayals**: When powers explicitly promise one action but take a contradictory action
- **Collaborations**: Successfully coordinated actions between powers
- **Playing Both Sides**: Powers making conflicting promises to different parties
- **Strategic Blunders**: Major mistakes that significantly weaken a position

Analysis outputs include:

- **Markdown Report** (`game_moments/[game]_report_[timestamp].md`)
  - AI-generated narrative of the entire game
  - Summary statistics (betrayals, collaborations, etc.)
- Raw lie detection data for further analysis

Example output snippet:

```markdown
## Power Models
- **TURKEY**: o3
```

#### Diplomatic Lie Detection

The analysis system can detect lies by comparing:

1. **Messages**: What powers promise to each other
2. **Private Diaries**: What powers privately plan (from `negotiation_diary` entries)
3. **Actual Orders**: What they actually do

Lies are classified as:

- **Intentional**: Diary shows planned deception (e.g., "mislead them", "while actually...")
- **Unintentional**: No evidence of planned deception (likely misunderstandings)

```bash
npm run dev
```

Features:

- 3D map with unit movements and battles
- Phase-by-phase playback controls
- Chat window showing diplomatic messages

Analysis of hundreds of AI games reveals interesting patterns:

#### Model Performance Characteristics

- **Invalid Move Rates**: Some models (e.g., o3) generate more invalid moves but play aggressively
- **Deception Patterns**: Models vary dramatically in honesty (0-100% intentional lie rates)
- **Strategic Styles**: From defensive/honest to aggressive/deceptive playstyles

#### Common Strategic Patterns

- **Opening Gambits**: RT Juggernaut (Russia-Turkey), Western Triple, Lepanto
- **Mid-game Dynamics**: Stab timing, alliance shifts, convoy operations
- **Endgame Challenges**: Stalemate lines, forced draws, kingmaking

---
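The intentional/unintentional split used by the lie detection step can be approximated with a keyword heuristic over private diary entries. A minimal sketch follows; only "mislead" and "while actually" come from the examples above, while the remaining markers and the function name are assumptions, not the project's actual code.

```python
# Phrases treated as evidence of planned deception in a private diary entry.
# "mislead" and "while actually" are the documented examples; "deceive" and
# "pretend" are assumed additions for this illustrative sketch.
DECEPTION_MARKERS = ("mislead", "while actually", "deceive", "pretend")

def classify_lie(diary_entry: str) -> str:
    """Label a broken promise 'intentional' when the private diary shows
    planned deception, otherwise 'unintentional'."""
    text = diary_entry.lower()
    if any(marker in text for marker in DECEPTION_MARKERS):
        return "intentional"
    return "unintentional"
```

In practice such a classifier would run over the `negotiation_diary` entries of the power whose promise was broken, with the matching message and orders attached as evidence.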