diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index 3215a33..0000000 --- a/CLAUDE.md +++ /dev/null @@ -1,21 +0,0 @@ -# AI Diplomacy Development Guide - -## Commands -- Run game: `python lm_game.py --max_year 1910 --summary_model "gpt-4o-mini" --num_negotiation_rounds 3` -- Run tests: `pytest -v diplomacy/tests/` or `pytest -v -k test_name` -- Run specific test: `pytest -v diplomacy/tests/path_to_test.py::test_function` -- Lint: `pylint diplomacy/path/to/file.py` -- Full test suite: `./diplomacy/run_tests.sh` - -## Code Style -- Use Python type hints for function parameters and return values -- Follow PEP 8 naming: snake_case for functions/variables, UPPER_CASE for constants -- Organize imports: standard library, third-party, local modules -- Error handling: Use specific exceptions with informative messages -- Docstrings: Use multi-line docstrings with parameter descriptions -- Keep functions focused on a single responsibility -- Models/LLM clients inherit from BaseModelClient and implement required methods -- When possible, use concurrent operations (see concurrent.futures in lm_game.py) - -## Environment -Python 3.5+ required. Use virtual environment with requirements.txt. \ No newline at end of file diff --git a/README.md b/README.md index baa1958..9e867e3 100644 --- a/README.md +++ b/README.md @@ -1,94 +1,38 @@ -# AI Diplomacy +# AI Diplomacy: -[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0) +## Extended AI Features (Experimental) -## AI-Powered Diplomacy with LLMs +This repository is an extension of the original [Diplomacy](https://github.com/diplomacy/diplomacy) project. This repository has been extended to integrate multiple Large Language Models (LLMs) into Diplomacy gameplay. **These extensions are experimental, subject to change**, and actively in development. 
The main additions are as follows: -This repository extends the original [Diplomacy](https://github.com/diplomacy/diplomacy) project to create a completely LLM-powered version of the classic board game Diplomacy. Each power is controlled by an LLM that can negotiate, form alliances, and plan strategies across multiple game phases. +- **Conversation & Negotiation**: Powers can have multi-turn negotiations with each other via `lm_game.py`. They can exchange private or global messages, allowing for more interactive diplomacy. +- **Order Generation**: Each power can choose its orders (moves, holds, supports, etc.) using LLMs via `lm_service_versus.py`. Currently supports OpenAI, Claude, Gemini, and DeepSeek. +- **Phase Summaries**: Modifications in the `game.py` engine allow the generation of "phase summaries," providing a succinct recap of each turn's events. This could help both human spectators and the LLMs themselves to understand the game state more easily. +- **Prompt Templates**: Prompts used by the LLMs are stored in `/prompts/`. You can edit these to customize how models are instructed for both orders and conversations. +- **Experimental & WIP**: Ongoing development includes adding strategic goals for each power, more flexible conversation lengths, and a readiness check to advance the phase if all powers are done negotiating. 
-## Features +### How it Works -- **Strategic LLM Agents**: Each country is controlled by an LLM with specialized prompting tailored to their unique geopolitical position -- **Multi-Turn Negotiations**: Powers engage in diplomatic exchanges through global and private messages -- **Automatic Order Generation**: Each power autonomously determines orders based on game state and diplomatic history -- **Context Management**: Intelligent summarization of game state and message history to optimize context windows -- **Enhanced Logging**: Structured, detailed logging for analysis of AI reasoning and decision-making -- **Multi-Model Support**: Compatible with OpenAI, Anthropic Claude, Gemini, and DeepSeek models -- **Power-Specific Strategy**: Country-specific system prompts that provide strategic guidance based on historical Diplomacy strategy +1. **`lm_game.py`** + - Orchestrates a Diplomacy game where each power's moves are decided by an LLM. + - Manages conversation rounds (currently up to 3 by default) and calls `get_conversation_reply()` for each power. + - After negotiations, each power's orders are gathered concurrently (via threads), using `get_orders()` from the respective LLM client. + - Calls `game.process()` to move to the next phase, optionally collecting phase summaries along the way. -## Getting Started +2. **`lm_service_versus.py`** + - Defines a base class (`BaseModelClient`) for hitting any LLM endpoint. + - Subclasses (`OpenAIClient`, `ClaudeClient`, etc.) implement `generate_response()` and `get_conversation_reply()` with the specifics of each LLM's API. + - Handles prompt construction for orders and conversation, JSON extraction to parse moves or messages, and fallback logic for invalid LLM responses. -```bash -# Clone the repository -git clone https://github.com/username/AI_Diplomacy.git -cd AI_Diplomacy +3. 
**Modifications in `game.py` (Engine)** + - Added a `_generate_phase_summary()` method and `phase_summaries` dict to store short textual recaps of each phase. + - Summaries can be viewed or repurposed for real-time commentary or as additional context fed back into the LLM. -# Install dependencies -pip install -r requirements.txt +### Future Explorations -# Run a game -python lm_game.py --max_year 1910 --summary_model "gpt-4o-mini" --num_negotiation_rounds 3 -``` - -## Command Line Options - -The main game script supports various configuration options: - -``` -python lm_game.py [options] - -Options: - --max_year INTEGER Maximum year to simulate (default: 1910) - --summary_model STRING Model to use for phase summarization (default: "gpt-4o-mini") - --num_negotiation_rounds Number of message rounds per phase (default: 3) - --models STRING Comma-separated list of models to use for each power - --log_full_prompts Log complete prompts sent to models - --log_full_responses Log complete responses from models - --verbose Enable verbose logging including connection details - --log_level STRING Set logging level (DEBUG, INFO, WARNING, ERROR) -``` - -## How It Works - -1. **Game Initialization** - - Creates a standard Diplomacy game and assigns an LLM to each power - - Initializes logging and context management systems - -2. **Diplomacy Phases** - - For each movement phase, powers engage in negotiation rounds - - Each power analyzes game state, diplomatic history, and strategic position - - Powers autonomously generate orders through concurrent execution - -3. **Context Management** - - Game history and diplomatic exchanges are intelligently summarized - - Recursive summarization optimizes context windows while preserving crucial information - - System provides each power with relevant, concise context to make decisions - -4. 
**Order Processing** - - Orders from all powers are collected and processed by the game engine - - Phase summaries are generated to capture key events - - Results are saved and the game advances to the next phase - -## Project Structure - -- **ai_diplomacy/**: Core extension code - - **clients.py**: Model client implementations for different LLM providers - - **game_history.py**: Tracks and manages game history for LLM context - - **long_story_short.py**: Context optimization and summarization - - **negotiations.py**: Handles diplomatic exchanges between powers - - **prompts/**: Templates for system instructions, orders, and negotiations - - **utils.py**: Helper functions for game state analysis and formatting - -- **lm_game.py**: Main game runner script -- **diplomacy/**: Original game engine (with minor extensions) - -## Recent Improvements - -- **Enhanced Structured Logging**: Added structured logging with context tags for better debugging and analysis -- **Optimized Context Management**: Rewrote the recursive summarization system to handle model context more efficiently -- **Improved Power-Specific Prompts**: Updated all country-specific system prompts with more strategic guidance -- **Better Convoy and Threat Analysis**: Enhanced convoy path detection and threat assessment -- **Command Line Options**: Added configuration options for logging verbosity and model selection +- **Longer Conversation Phases**: Support for more than 3 message rounds, or an adaptive approach that ends negotiation early if all powers signal "ready." +- **Strategic Goals**: Let each power maintain high-level goals (e.g., "ally with France," "defend Munich") that the LLM takes into account for orders and conversations. +- **Enhanced Summaries**: Summaries could incorporate conversation logs or trending alliances, giving the LLM even richer context each turn. 
+- **Live Front-End Integration**: Display phase summaries, conversation logs, and highlights of completed orders in a real-time UI. (An attempt to display phase summaries is currently in progress.) --- diff --git a/ai_diplomacy/clients.py b/ai_diplomacy/clients.py index 87d1b37..56f8af2 100644 --- a/ai_diplomacy/clients.py +++ b/ai_diplomacy/clients.py @@ -11,6 +11,7 @@ from dotenv import load_dotenv import anthropic os.environ["GRPC_PYTHON_LOG_LEVEL"] = "10" +import google.generativeai as genai # Import after setting log level from openai import OpenAI as DeepSeekOpenAI from openai import OpenAI from anthropic import Anthropic @@ -19,92 +20,12 @@ from google import genai from diplomacy.engine.message import GLOBAL from .game_history import GameHistory -from .long_story_short import get_optimized_context -from .model_loader import load_model_client -# Configure logger with a more useful format +# set logger back to just info logger = logging.getLogger("client") logger.setLevel(logging.DEBUG) logging.basicConfig(level=logging.DEBUG) -# Function to configure logging options -def configure_logging( - log_full_prompts=True, - log_full_responses=True, - suppress_connection_logs=True, - log_level=logging.INFO -): - """ - Configure the logging system for AI Diplomacy - - Parameters: - - log_full_prompts: Whether to log the full prompts sent to models - - log_full_responses: Whether to log the full responses from models - - suppress_connection_logs: Whether to suppress HTTP connection logs - - log_level: The overall logging level for the application - """ - # Configure root logger - logging.getLogger().setLevel(log_level) - - # Set client logger level - logger.setLevel(log_level) - - # Configure specific loggers based on parameters - if suppress_connection_logs: - logging.getLogger("httpx").setLevel(logging.WARNING) - logging.getLogger("httpcore").setLevel(logging.WARNING) - logging.getLogger("urllib3").setLevel(logging.WARNING) - 
logging.getLogger("anthropic").setLevel(logging.WARNING) - logging.getLogger("openai").setLevel(logging.WARNING) - - # Set module-level configuration - global SHOULD_LOG_FULL_PROMPTS, SHOULD_LOG_FULL_RESPONSES - SHOULD_LOG_FULL_PROMPTS = log_full_prompts - SHOULD_LOG_FULL_RESPONSES = log_full_responses - - logger.info(f"Logging configured: full_prompts={log_full_prompts}, full_responses={log_full_responses}, level={logging.getLevelName(log_level)}") - -# Initialize defaults -SHOULD_LOG_FULL_PROMPTS = True -SHOULD_LOG_FULL_RESPONSES = True - -# Helper function for truncating long outputs in logs -def _truncate_text(text, max_length=500): - """Truncate text for logging purposes with indicator of original length""" - if not text or len(text) <= max_length: - return text - return f"{text[:max_length]}... [truncated, total length: {len(text)} chars]" - -# Helper function to log full model responses -def _log_full_response(model_type, model_name, power_name, response): - """Logs the full model response at INFO level""" - if not response or not SHOULD_LOG_FULL_RESPONSES: - return - - border = "=" * 80 - logger.info(f"\nMODEL RESPONSE | {model_type} | {model_name} | {power_name or 'Unknown'}\n{border}") - logger.info(f"{response}") - logger.info(f"{border}\n") - -# Helper function to log prompt details -def _log_prompt_details(model_type, model_name, power_name, prompt, system_prompt=None): - """Logs the prompt details at INFO level""" - if not prompt or not SHOULD_LOG_FULL_PROMPTS: - return - - border = "=" * 80 - total_tokens = len(prompt.split()) - - if system_prompt: - system_tokens = len(system_prompt.split()) - logger.info(f"\nPROMPT | {model_type} | {model_name} | {power_name or 'Unknown'} | ~{total_tokens} tokens (user) + ~{system_tokens} tokens (system)\n{border}") - logger.debug(f"System prompt: {_truncate_text(system_prompt)}") - else: - logger.info(f"\nPROMPT | {model_type} | {model_name} | {power_name or 'Unknown'} | ~{total_tokens} tokens\n{border}") - - 
logger.debug(f"User prompt: {_truncate_text(prompt)}") - logger.info(f"{border}\n") - load_dotenv() @@ -115,31 +36,16 @@ class BaseModelClient: """ Base interface for any LLM client we want to plug in. Each must provide: - - generate_response(prompt: str) -> str (with empty_system=True if needed) - - get_orders(board_state, power_name, possible_orders, game_history, phase_summaries) -> List[str] + - generate_response(prompt: str) -> str + - get_orders(board_state, power_name, possible_orders) -> List[str] - get_conversation_reply(power_name, conversation_so_far, game_phase) -> str """ - def __init__(self, model_name: str, power_name: Optional[str] = None, emptysystem: bool = False): + def __init__(self, model_name: str): self.model_name = model_name - self.power_name = power_name - self.emptysystem = emptysystem + self.system_prompt = load_prompt("system_prompt.txt") - # Conditionally load system prompt - if not self.emptysystem: - if self.power_name: - try: - self.system_prompt = load_prompt(f"{self.power_name.lower()}_system_prompt.txt") - except FileNotFoundError: - logger.warning(f"CONFIG | {self.model_name} | No specific system prompt for {self.power_name}, using default") - self.system_prompt = load_prompt("system_prompt.txt") - else: - self.system_prompt = load_prompt("system_prompt.txt") - else: - # If emptysystem is True, skip loading any system prompt - self.system_prompt = "" - # emptysystem defaults to false but if true will tell the LLM to not add a system prompt - def generate_response(self, prompt: str, empty_system: bool = False) -> str: + def generate_response(self, prompt: str) -> str: """ Returns a raw string from the LLM. Subclasses override this. 
@@ -193,19 +99,22 @@ class BaseModelClient: # Load in current context values context = context.format( power_name=power_name, - phase_expanded=phase_expanded, - our_forces_summary=our_forces_summary, - neutral_supply_centers_summary=neutral_supply_centers_summary, - enemies_forces_summary=enemies_forces_summary, - history_text=history_text, - possible_orders_text=possible_orders_text, - convoy_paths_text=convoy_paths_text, - threat_text=threat_text, - sc_projection_text=sc_projection_text, - historical_summaries=historical_summaries, + current_phase=year_phase, + game_map_loc_name=game.map.loc_name, + game_map_loc_type=game.map.loc_type, + map_as_adjacency_list=game.map.loc_abut, + possible_coasts=game.map.loc_coasts, + game_map_scs=game.map.scs, + game_history=conversation_text, + enemy_units=enemy_units, + enemy_centers=enemy_centers, + units_info=units_info, + centers_info=centers_info, + possible_orders=possible_orders_str, + convoy_paths_possible=convoy_paths_possible, ) - return final_prompt + return context def build_prompt( self, @@ -214,7 +123,6 @@ class BaseModelClient: power_name: str, possible_orders: Dict[str, List[str]], game_history: GameHistory, - phase_summaries: Optional[Dict[str, str]] = None, ) -> str: """ Unified prompt approach: incorporate conversation and 'PARSABLE OUTPUT' requirements. 
@@ -230,7 +138,6 @@ class BaseModelClient: power_name, possible_orders, game_history, - phase_summaries, ) return context + "\n\n" + instructions @@ -242,8 +149,7 @@ class BaseModelClient: power_name: str, possible_orders: Dict[str, List[str]], conversation_text: str, - phase_summaries: Optional[Dict[str, str]] = None, - model_error_stats=None, + model_error_stats=None, # New optional param ) -> List[str]: """ 1) Builds the prompt with conversation context if available @@ -256,15 +162,14 @@ class BaseModelClient: power_name, possible_orders, conversation_text, - phase_summaries, ) raw_response = "" try: raw_response = self.generate_response(prompt) - logger.debug( - f"ORDERS | {self.model_name} | {power_name} | Raw response: {_truncate_text(raw_response)}" + logger.info( + f"[{self.model_name}] Raw LLM response for {power_name}:\n{raw_response}" ) # Attempt to parse the final "orders" from the LLM @@ -272,25 +177,17 @@ class BaseModelClient: if not move_list: logger.warning( - f"PARSE_ERROR | {self.model_name} | {power_name} | Failed to extract moves, using fallback" + f"[{self.model_name}] Could not extract moves for {power_name}. Using fallback." 
) if model_error_stats is not None: - # forcibly convert sets to string - model_name_for_stats = str(self.model_name) - model_error_stats[model_name_for_stats]["order_decoding_errors"] += 1 - + model_error_stats[self.model_name]["order_decoding_errors"] += 1 return self.fallback_orders(possible_orders) # Validate or fallback validated_moves = self._validate_orders(move_list, possible_orders) return validated_moves except Exception as e: - logger.error(f"LLM_ERROR | {self.model_name} | {power_name} | {str(e)}") - if model_error_stats is not None: - # forcibly convert sets to string - model_name_for_stats = str(self.model_name) - model_error_stats[model_name_for_stats]["order_decoding_errors"] += 1 - + logger.error(f"[{self.model_name}] LLM error for {power_name}: {e}") return self.fallback_orders(possible_orders) def _extract_moves(self, raw_response: str, power_name: str) -> Optional[List[str]]: @@ -310,7 +207,7 @@ class BaseModelClient: if not matches: # Some LLMs might not put the colon or might have triple backtick fences. logger.debug( - f"PARSE | {self.model_name} | {power_name} | Regex #1 failed, trying alternative patterns" + f"[{self.model_name}] Regex parse #1 failed for {power_name}. Trying alternative patterns." ) # 1b) Check for inline JSON after "PARSABLE OUTPUT" @@ -319,42 +216,34 @@ class BaseModelClient: if not matches: logger.debug( - f"PARSE | {self.model_name} | {power_name} | Regex #2 failed, trying triple-backtick code fences" + f"[{self.model_name}] Regex parse #2 failed for {power_name}. Trying triple-backtick code fences." ) # 2) If still no match, check for triple-backtick code fences containing JSON if not matches: code_fence_pattern = r"```json\s*(\{.*?\})\s*```" matches = re.search(code_fence_pattern, raw_response, re.DOTALL) - if matches: logger.debug( - f"PARSE | {self.model_name} | {power_name} | Found triple-backtick JSON block" + f"[{self.model_name}] Found triple-backtick JSON block for {power_name}." 
) # 3) Attempt to parse JSON if we found anything json_text = None if matches: - # Add braces back around the captured group as needed - captured = matches.group(1).strip() - - if captured.startswith("{{") and captured.endswith("}}"): - # remove ONE leading '{' and ONE trailing '}' - # so {{ "orders": [...] }} becomes { "orders": [...] } - logger.debug(f"PARSE | {self.model_name} | {power_name} | Detected double braces, trimming extra braces") - # strip exactly one brace pair - trimmed = captured[1:-1].strip() - json_text = trimmed - elif captured.startswith("{"): - json_text = captured + # Add braces back around the captured group + if matches.group(1).strip().startswith(r"{{"): + json_text = matches.group(1).strip()[1:-1] + elif matches.group(1).strip().startswith(r"{"): + json_text = matches.group(1).strip() else: - json_text = "{" + captured + "}" + json_text = "{%s}" % matches.group(1).strip() json_text = json_text.strip() if not json_text: logger.debug( - f"PARSE | {self.model_name} | {power_name} | No JSON text found in response" + f"[{self.model_name}] No JSON text found in LLM response for {power_name}." ) return None @@ -364,7 +253,7 @@ class BaseModelClient: return data.get("orders", None) except json.JSONDecodeError as e: logger.warning( - f"PARSE | {self.model_name} | {power_name} | JSON decode failed: {str(e)[:100]}. Trying bracket fallback" + f"[{self.model_name}] JSON decode failed for {power_name}: {e}. Trying bracket fallback." 
) # 3b) Attempt bracket fallback: we look for the substring after "orders" @@ -377,11 +266,10 @@ class BaseModelClient: raw_list_str = "[" + bracket_match.group(1).strip() + "]" moves = ast.literal_eval(raw_list_str) if isinstance(moves, list): - logger.debug(f"PARSE | {self.model_name} | {power_name} | Bracket fallback successful") return moves except Exception as e2: logger.warning( - f"PARSE | {self.model_name} | {power_name} | Bracket fallback failed: {str(e2)[:100]}" + f"[{self.model_name}] Bracket fallback parse also failed for {power_name}: {e2}" ) # If all attempts failed @@ -393,12 +281,12 @@ class BaseModelClient: """ Filter out invalid moves, fill missing with HOLD, else fallback. """ - logger.debug(f"VALIDATE | {self.model_name} | Validating {len(moves)} proposed moves") + logger.debug(f"[{self.model_name}] Proposed LLM moves: {moves}") validated = [] used_locs = set() if not isinstance(moves, list): - logger.debug(f"VALIDATE | {self.model_name} | Moves not a list type, using fallback") + logger.debug(f"[{self.model_name}] Moves not a list, fallback.") return self.fallback_orders(possible_orders) for move in moves: @@ -410,7 +298,7 @@ class BaseModelClient: if len(parts) >= 2: used_locs.add(parts[1][:3]) else: - logger.debug(f"VALIDATE | {self.model_name} | Invalid move: {move_str}") + logger.debug(f"[{self.model_name}] Invalid move from LLM: {move_str}") # Fill missing with hold for loc, orders_list in possible_orders.items(): @@ -421,10 +309,10 @@ class BaseModelClient: ) if not validated: - logger.warning(f"VALIDATE | {self.model_name} | All moves invalid, using fallback") + logger.warning(f"[{self.model_name}] All moves invalid, fallback.") return self.fallback_orders(possible_orders) - logger.debug(f"VALIDATE | {self.model_name} | Final valid moves: {len(validated)}") + logger.debug(f"[{self.model_name}] Validated moves: {validated}") return validated def fallback_orders(self, possible_orders: Dict[str, List[str]]) -> List[str]: @@ -469,7 +357,6 @@ 
class BaseModelClient: possible_orders: Dict[str, List[str]], game_history: GameHistory, game_phase: str, - phase_summaries: Optional[Dict[str, str]] = None, ) -> str: instructions = load_prompt("conversation_instructions.txt") @@ -479,7 +366,6 @@ class BaseModelClient: power_name, possible_orders, game_history, - phase_summaries, ) return context + "\n\n" + instructions @@ -493,7 +379,6 @@ class BaseModelClient: game_history: GameHistory, game_phase: str, active_powers: Optional[List[str]] = None, - phase_summaries: Optional[Dict[str, str]] = None, ) -> str: prompt = self.build_planning_prompt( @@ -503,7 +388,6 @@ class BaseModelClient: possible_orders, game_history, game_phase, - phase_summaries, ) raw_response = self.generate_response(prompt) @@ -554,7 +438,6 @@ class BaseModelClient: ) json_matches = re.findall( r"```json\n(.*?)\n```", raw_response, re.DOTALL - ) for match in json_matches: @@ -600,7 +483,6 @@ class BaseModelClient: logger.error(f"Error parsing message JSON for {power_name}: {str(e)}") continue - # Deduplicate messages messages = list(set([json.dumps(m) for m in messages])) messages = [json.loads(m) for m in messages] @@ -628,39 +510,34 @@ class OpenAIClient(BaseModelClient): For 'o3-mini', 'gpt-4o', or other OpenAI model calls. 
""" - def __init__(self, model_name: str, power_name: Optional[str] = None, emptysystem: bool = False): - super().__init__(model_name, power_name, emptysystem) + def __init__(self, model_name: str): + super().__init__(model_name) self.client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY")) - def generate_response(self, prompt: str, empty_system: bool = False) -> str: + def generate_response(self, prompt: str) -> str: + # Updated to new API format try: - system_content = self.system_prompt if not empty_system else "" - logger.debug(f"API | OpenAI | {self.model_name} | Sending request") - - _log_prompt_details("OpenAI", self.model_name, self.power_name, prompt, system_content) - response = self.client.chat.completions.create( model=self.model_name, messages=[ - {"role": "system", "content": system_content}, + {"role": "system", "content": self.system_prompt}, {"role": "user", "content": prompt}, ], ) - if not response or not response.choices: - logger.warning(f"API | OpenAI | {self.model_name} | Empty or invalid response") + if not response or not hasattr(response, "choices") or not response.choices: + logger.warning( + f"[{self.model_name}] Empty or invalid result in generate_response. Returning empty." 
+ ) return "" - logger.debug(f"API | OpenAI | {self.model_name} | Received response of length {len(response.choices[0].message.content)}") - content = response.choices[0].message.content.strip() - _log_full_response("OpenAI", self.model_name, self.power_name, content) - return content + return response.choices[0].message.content.strip() except json.JSONDecodeError as json_err: logger.error( - f"API | OpenAI | {self.model_name} | JSON decode error: {str(json_err)[:100]}" + f"[{self.model_name}] JSON decoding failed in generate_response: {json_err}" ) return "" except Exception as e: logger.error( - f"API | OpenAI | {self.model_name} | Error: {str(e)[:150]}" + f"[{self.model_name}] Unexpected error in generate_response: {e}" ) return "" @@ -670,45 +547,34 @@ class ClaudeClient(BaseModelClient): For 'claude-3-5-sonnet-20241022', 'claude-3-5-haiku-20241022', etc. """ - def __init__(self, model_name: str, power_name: Optional[str] = None, emptysystem: bool = False): - super().__init__(model_name, power_name, emptysystem) + def __init__(self, model_name: str): + super().__init__(model_name) self.client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY")) - def generate_response(self, prompt: str, empty_system: bool = False) -> str: + def generate_response(self, prompt: str) -> str: # Updated Claude messages format try: - system_content = self.system_prompt if not empty_system else "" - - _log_prompt_details("Claude", self.model_name, self.power_name, prompt, system_content) - response = self.client.messages.create( model=self.model_name, max_tokens=2000, - system=system_content, + system=self.system_prompt, # system is now a top-level parameter messages=[{"role": "user", "content": prompt}], ) - if not response or not response.content: - logger.warning(f"API | Claude | {self.model_name} | Empty or invalid response") + if not response.content: + logger.warning( + f"[{self.model_name}] Empty content in Claude generate_response. Returning empty." 
+ ) return "" - logger.debug(f"API | Claude | {self.model_name} | Received response of length {len(response.content)}") - - # Handle the new response format which might be a list of TextBlock objects - if isinstance(response.content, list): - # Extract text from each TextBlock - content = "" - for block in response.content: - if hasattr(block, 'text'): - content += block.text - elif isinstance(block, dict) and 'text' in block: - content += block['text'] - logger.debug(f"API | Claude | {self.model_name} | Extracted text from {len(response.content)} TextBlocks") - else: - content = response.content - - _log_full_response("Claude", self.model_name, self.power_name, content) - return content + return response.content[0].text.strip() if response.content else "" + except json.JSONDecodeError as json_err: + logger.error( + f"[{self.model_name}] JSON decoding failed in generate_response: {json_err}" + ) + return "" except Exception as e: - logger.error(f"API | Claude | {self.model_name} | Error: {str(e)[:150]}") + logger.error( + f"[{self.model_name}] Unexpected error in generate_response: {e}" + ) return "" @@ -717,29 +583,26 @@ class GeminiClient(BaseModelClient): For 'gemini-1.5-flash' or other Google Generative AI models. 
""" - def __init__(self, model_name: str, power_name: Optional[str] = None, emptysystem: bool = False): - super().__init__(model_name, power_name, emptysystem) + def __init__(self, model_name: str): + super().__init__(model_name) self.client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY")) - def generate_response(self, prompt: str, empty_system: bool = False) -> str: + def generate_response(self, prompt: str) -> str: + full_prompt = self.system_prompt + prompt + try: - system_content = self.system_prompt if not empty_system else "" - logger.debug(f"API | Gemini | {self.model_name} | Sending request") response = self.client.models.generate_content( model=self.model_name, - contents=system_content + prompt, + contents=full_prompt, ) if not response or not response.text: logger.warning( - f"API | Gemini | {self.model_name} | Empty response" + f"[{self.model_name}] Empty Gemini generate_response. Returning empty." ) return "" - logger.debug(f"API | Gemini | {self.model_name} | Received response of length {len(response.text)}") - content = response.text.strip() - _log_full_response("Gemini", self.model_name, self.power_name, content) - return content + return response.text.strip() except Exception as e: - logger.error(f"API | Gemini | {self.model_name} | Error: {str(e)[:150]}") + logger.error(f"[{self.model_name}] Error in Gemini generate_response: {e}") return "" @@ -748,43 +611,36 @@ class DeepSeekClient(BaseModelClient): For DeepSeek R1 'deepseek-reasoner' """ - def __init__(self, model_name: str, power_name: Optional[str] = None, emptysystem: bool = False): - super().__init__(model_name, power_name, emptysystem) + def __init__(self, model_name: str): + super().__init__(model_name) self.api_key = os.environ.get("DEEPSEEK_API_KEY") self.client = DeepSeekOpenAI( api_key=self.api_key, base_url="https://api.deepseek.com/" ) - def generate_response(self, prompt: str, empty_system: bool = False) -> str: + def generate_response(self, prompt: str) -> str: try: - 
system_content = self.system_prompt if not empty_system else ""
-            logger.debug(f"API | DeepSeek | {self.model_name} | Sending request")
-
-            _log_prompt_details("DeepSeek", self.model_name, self.power_name, prompt, system_content)
-
             response = self.client.chat.completions.create(
                 model=self.model_name,
                 messages=[
-                    {"role": "system", "content": system_content},
+                    {"role": "system", "content": self.system_prompt},
                     {"role": "user", "content": prompt},
                 ],
                 stream=False,
             )
-            logger.debug(f"API | DeepSeek | {self.model_name} | Received response")
+            logger.debug(f"[{self.model_name}] Raw DeepSeek response:\n{response}")
             if not response or not response.choices:
                 logger.warning(
-                    f"API | DeepSeek | {self.model_name} | No valid response"
+                    f"[{self.model_name}] No valid response in generate_response."
                 )
                 return ""
             content = response.choices[0].message.content.strip()
             if not content:
-                logger.warning(f"API | DeepSeek | {self.model_name} | Empty content")
+                logger.warning(f"[{self.model_name}] DeepSeek returned empty content.")
                 return ""
-
-            _log_full_response("DeepSeek", self.model_name, self.power_name, content)
-
+
             try:
                 json_response = json.loads(content)
                 required_fields = ["message_type", "content"]
@@ -792,13 +648,13 @@ class DeepSeekClient(BaseModelClient):
                     required_fields.append("recipient")
                 if not all(field in json_response for field in required_fields):
                     logger.error(
-                        f"API | DeepSeek | {self.model_name} | Missing fields: {_truncate_text(content, 100)}"
+                        f"[{self.model_name}] Missing required fields in response: {content}"
                     )
                     return ""
                 return content
             except JSONDecodeError:
                 logger.error(
-                    f"API | DeepSeek | {self.model_name} | Invalid JSON: {_truncate_text(content, 100)}"
+                    f"[{self.model_name}] Response is not valid JSON: {content}"
                 )
                 content = content.replace("'", '"')
                 try:
@@ -809,11 +665,35 @@ class DeepSeekClient(BaseModelClient):
         except Exception as e:
             logger.error(
-                f"API | DeepSeek | {self.model_name} | Error: {str(e)[:150]}"
+                f"[{self.model_name}] Unexpected error in generate_response: {e}"
             )
             return ""
+##############################################################################
+# 3) Factory to Load Model Client
+##############################################################################
+
+
+def load_model_client(model_id: str) -> BaseModelClient:
+    """
+    Returns the appropriate LLM client for a given model_id string.
+    Example usage:
+        client = load_model_client("claude-3-5-sonnet-20241022")
+    """
+    # Basic pattern matching or direct mapping
+    lower_id = model_id.lower()
+    if "claude" in lower_id:
+        return ClaudeClient(model_id)
+    elif "gemini" in lower_id:
+        return GeminiClient(model_id)
+    elif "deepseek" in lower_id:
+        return DeepSeekClient(model_id)
+    else:
+        # Default to OpenAI
+        return OpenAIClient(model_id)
+
+
 ##############################################################################
 # 4) Example Usage in a Diplomacy "main" or Similar
 ##############################################################################
@@ -833,7 +713,7 @@ def example_game_loop(game):
     for power_name, power_obj in active_powers:
         model_id = power_model_mapping.get(power_name, "o3-mini")
-        client = load_model_client(model_id, power_name=power_name)
+        client = load_model_client(model_id)
         # Get possible orders from the game
         possible_orders = game.get_all_possible_orders()
@@ -854,11 +734,11 @@ class LMServiceVersus:
     """
     def __init__(self):
-        self.power_model_map = assign_models_to_powers(randomize=True)
+        self.power_model_map = assign_models_to_powers()
     def get_orders_for_power(self, game, power_name):
         model_id = self.power_model_map.get(power_name, "o3-mini")
-        client = load_model_client(model_id, power_name=power_name)
+        client = load_model_client(model_id)
         possible_orders = gather_possible_orders(game, power_name)
         board_state = game.get_state()
         return client.get_orders(board_state, power_name, possible_orders)
diff --git a/ai_diplomacy/game_history.py b/ai_diplomacy/game_history.py
index a92421c..f0c2d5c 100644
--- a/ai_diplomacy/game_history.py
+++ b/ai_diplomacy/game_history.py
@@ -117,62 +117,25 @@ class GameHistory:
     def get_game_history(self, power_name: str, include_plans: bool = True, num_prev_phases: int = 5) -> str:
         if not self.phases:
-            logger.debug(f"HISTORY | {power_name} | No phases recorded yet")
-            return "COMMUNICATION HISTORY:\n\n(No game phases recorded yet)"
+            return ""
         phases_to_report = self.phases[-num_prev_phases:]
-        game_history_str = "COMMUNICATION HISTORY:\n"
-
-        # Count messages for this power across all available phases, not just recent
-        # This helps ensure we're not misreporting "no communication" when there is historical context
-        all_phases_message_count = 0
-        for phase in self.phases:
-            global_msgs = phase.get_global_messages()
-            private_msgs = phase.get_private_messages(power_name)
-            if global_msgs or private_msgs:
-                all_phases_message_count += 1
-
-        # Count messages just in recent phases for display decisions
-        recent_phases_message_count = 0
-        for phase in phases_to_report:
-            global_msgs = phase.get_global_messages()
-            private_msgs = phase.get_private_messages(power_name)
-            if global_msgs or private_msgs:
-                recent_phases_message_count += 1
-
-        logger.debug(f"HISTORY | {power_name} | Found {all_phases_message_count} phases with messages (total), {recent_phases_message_count} in recent phases")
-
-        # If there are no messages at all in any phase, provide a clear indicator
-        if all_phases_message_count == 0:
-            game_history_str += f"\n{power_name} has not engaged in any diplomatic exchanges yet.\n"
-            logger.debug(f"HISTORY | {power_name} | No diplomatic exchanges found in any phase")
-            return game_history_str
-
-        # If there are messages in history but none in recent phases, note this
-        if all_phases_message_count > 0 and recent_phases_message_count == 0:
-            game_history_str += f"\n{power_name} has messages in earlier phases, but none in the last {len(phases_to_report)} phases.\n"
-            logger.debug(f"HISTORY | {power_name} | Has historical messages but none in recent phases")
-
-        # Track if we have content for debugging
-        has_content = False
+        game_history_str = ""
         # Iterate through phases
         for phase in phases_to_report:
-            phase_has_content = False
-            phase_str = f"\n{phase.name}:\n"
+            game_history_str += f"\n{phase.name}:\n"
             # Add GLOBAL section for this phase
             global_msgs = phase.get_global_messages()
             if global_msgs:
-                phase_str += "\nGLOBAL:\n"
-                phase_str += global_msgs
-                phase_has_content = True
-                has_content = True
+                game_history_str += "\nGLOBAL:\n"
+                game_history_str += global_msgs
             # Add PRIVATE section for this phase
             private_msgs = phase.get_private_messages(power_name)
             if private_msgs:
-                phase_str += "\nPRIVATE:\n"
+                game_history_str += "\nPRIVATE:\n"
                 for other_power, messages in private_msgs.items():
                     game_history_str += f" {other_power}:\n\n"
                     game_history_str += messages + "\n"
@@ -204,5 +167,4 @@ class GameHistory:
             game_history_str += "Here is a high-level directive you have planned out previously for this phase.\n"
             game_history_str += phases_to_report[-1].plans[power_name] + "\n"
-
         return game_history_str
diff --git a/ai_diplomacy/long_story_short.py b/ai_diplomacy/long_story_short.py
deleted file mode 100644
index 4e93663..0000000
--- a/ai_diplomacy/long_story_short.py
+++ /dev/null
@@ -1,827 +0,0 @@
-import logging
-import re
-import os
-import time
-from typing import Dict, List, Optional, Tuple, Any
-
-# Establish logger
-logger = logging.getLogger(__name__)
-
-# Import model client for summarization
-from ai_diplomacy.model_loader import load_model_client
-
-# Token counting
-try:
-    import tiktoken
-    ENCODER = tiktoken.get_encoding("cl100k_base") # OpenAI's encoding for models like GPT-4/3.5
-
-    def count_tokens(text: str) -> int:
-        """
-        Accurately counts tokens for text using tiktoken.
-        Falls back to approximation if tiktoken fails.
- """ - try: - return len(ENCODER.encode(text)) - except Exception as e: - # Fallback to approximation - logger.warning(f"CONTEXT | TOKEN COUNT | Error using tiktoken ({str(e)}), falling back to approximation") - return len(text) // 4 # Simple approximation -except ImportError: - # Fallback for environments without tiktoken - logger.warning("CONTEXT | TOKEN COUNT | tiktoken not available, using approximate token counting") - - def count_tokens(text: str) -> int: - """ - Approximates token count for text. This is a rough estimate. - OpenAI tokens are ~4 chars per token on average. - """ - return len(text) // 4 # Simple approximation - - -class ContextManager: - """ - Manages context size for Diplomacy game history and messages. - Provides power-specific recursive summarization functionality - when context exceeds thresholds. - """ - def __init__( - self, - phase_token_threshold: int = 15000, - message_token_threshold: int = 15000, - summary_model: str = "o3-mini" - ): - self.phase_token_threshold = phase_token_threshold - self.message_token_threshold = message_token_threshold - self.summary_model = summary_model - - # Per-power tracking of summary states - self.power_summary_states = {} # Indexed by power_name - - # Shared phase summary state (phases are globally visible) - self.phase_summary_state = { - 'last_summary': None, # The most recent summary of older phases - 'summarized_phases': [], # List of phase names that have been summarized - 'last_summary_time': 0, # When we last summarized - } - - # Cooldown period (seconds) - don't summarize more frequently than this - # For a game, 30 seconds is more appropriate than 5 minutes - self.summary_cooldown = 30 # 30 seconds - - logger.debug(f"CONTEXT | Initialized manager with thresholds: phase={phase_token_threshold}, message={message_token_threshold} tokens") - logger.debug(f"CONTEXT | Summary cooldown set to {self.summary_cooldown} seconds") - - def get_power_state(self, power_name): - """ - Gets or initializes the 
summary state for a specific power. - """ - if power_name not in self.power_summary_states: - self.power_summary_states[power_name] = { - 'last_message_summary': None, # The most recent message summary - 'summarized_messages': set(), # Set of message IDs that have been summarized - 'last_summary_time': 0, # When we last summarized messages for this power - } - return self.power_summary_states[power_name] - - def load_summarization_prompts(self) -> Tuple[str, str, str]: - """ - Load prompts for phase, message, and recursive summarization. - Returns tuple of (phase_prompt, message_prompt, recursive_prompt) - """ - try: - # Try to load from files - with open("./ai_diplomacy/prompts/phase_summary_prompt.txt", "r") as f: - phase_prompt = f.read().strip() - - with open("./ai_diplomacy/prompts/message_summary_prompt.txt", "r") as f: - message_prompt = f.read().strip() - - with open("./ai_diplomacy/prompts/recursive_summary_prompt.txt", "r") as f: - recursive_prompt = f.read().strip() - - logger.debug("CONTEXT | Loaded summarization prompts from files") - return phase_prompt, message_prompt, recursive_prompt - except FileNotFoundError: - # Return default prompts if files not found - logger.warning("CONTEXT | Summarization prompt files not found, using default templates") - - phase_prompt = """You are summarizing the history of a Diplomacy game. -Create a concise summary that preserves all strategically relevant information about: -1. Supply center changes -2. Unit movements and their results -3. Key battles and their outcomes -4. Territory control shifts - -Focus on what actually happened, not explanations or justifications. -Maintain the chronological structure but condense verbose descriptions. -Use clear, factual language with specific location names. - -ORIGINAL PHASE HISTORY: -{phase_history} - -SUMMARY:""" - - message_prompt = """You are summarizing diplomatic messages in a Diplomacy game. 
-Create a concise summary of the conversations between powers that preserves: -1. Agreements and alliances formed -2. Betrayals and broken promises -3. Strategic intentions revealed -4. Explicit threats or support offered -5. Key relationships between each power - -Organize by relationships (e.g., FRANCE-GERMANY, ENGLAND-RUSSIA), prioritizing the most -significant interactions. Include specific territory names mentioned. - -The summary must reflect the actual diplomatic landscape accurately so players can make informed decisions. - -ORIGINAL MESSAGE HISTORY: -{message_history} - -SUMMARY:""" - - recursive_prompt = """You are creating a recursive summary of a Diplomacy game's history. -You have a previous summary of earlier events/messages and new content to incorporate. - -Your task is to create a unified, seamless summary that: -1. Preserves key strategic information from both sources -2. Maintains chronological flow and logical structure -3. Presents the most relevant information for decision-making -4. Emphasizes developments in alliances, betrayals, and territorial control - -PREVIOUS SUMMARY: -{previous_summary} - -NEW CONTENT: -{new_content} - -Create a unified summary that reads as a single coherent narrative while preserving critical strategic information:""" - - return phase_prompt, message_prompt, recursive_prompt - - def generate_recursive_summary(self, previous_summary, new_content, prompt_type="recursive", power_name=None): - """ - Generates a recursive summary by combining previous summary with new content. 
- - Args: - previous_summary: Previous summary text - new_content: New content to be incorporated - prompt_type: Type of prompt to use (recursive, phase, or message) - power_name: Name of the power for context - - Returns: - A new summary incorporating both previous and new content - """ - # Load appropriate prompt - phase_prompt, message_prompt, recursive_prompt = self.load_summarization_prompts() - - # Calculate token counts for logging - new_content_tokens = count_tokens(new_content) - prev_summary_tokens = count_tokens(previous_summary or "") - - if prompt_type == "phase" and not previous_summary: - # Initial phase summary - prompt = phase_prompt.replace("{phase_history}", new_content) - logger.debug(f"CONTEXT | SUMMARY | Creating initial phase summary with {new_content_tokens} tokens") - logger.info(f"CONTEXT | SUMMARY | Initializing phase summary for {len(new_content.split())} words / {new_content_tokens} tokens of game history") - elif prompt_type == "message" and not previous_summary: - # Initial message summary - prompt = message_prompt.replace("{message_history}", new_content) - logger.debug(f"CONTEXT | SUMMARY | Creating initial message summary for {power_name} with {new_content_tokens} tokens") - logger.info(f"CONTEXT | SUMMARY | Initializing message summary for {power_name} ({new_content_tokens} tokens)") - else: - # Recursive summary (or phase/message with previous summary) - prompt = recursive_prompt - prompt = prompt.replace("{previous_summary}", previous_summary or "(No previous summary)") - prompt = prompt.replace("{new_content}", new_content) - - logger.debug(f"CONTEXT | SUMMARY | Creating recursive {prompt_type} summary for {power_name or 'game'}") - logger.debug(f"CONTEXT | SUMMARY | Previous summary: {prev_summary_tokens} tokens, New content: {new_content_tokens} tokens") - logger.info(f"CONTEXT | SUMMARY | Recursive summarization: combining {prev_summary_tokens} tokens of previous summary with {new_content_tokens} tokens of new content") - - 
# Get the summary using the LLM - summarization_client = load_model_client(self.summary_model, power_name=power_name, emptysystem=True) - summary = summarization_client.generate_response(prompt) - - summary_tokens = count_tokens(summary) - logger.debug(f"CONTEXT | Generated {prompt_type} recursive summary ({summary_tokens} tokens)") - - # Log the compression ratio - if new_content_tokens > 0: - if previous_summary: - total_input_tokens = prev_summary_tokens + new_content_tokens - compression_ratio = summary_tokens / total_input_tokens - logger.info(f"CONTEXT | SUMMARY | Compression: {total_input_tokens} → {summary_tokens} tokens ({compression_ratio:.2f}x)") - else: - compression_ratio = summary_tokens / new_content_tokens - logger.info(f"CONTEXT | SUMMARY | Compression: {new_content_tokens} → {summary_tokens} tokens ({compression_ratio:.2f}x)") - - return summary - - def should_summarize_phases(self, phase_summaries: Dict[str, str]) -> bool: - """ - Determine if phase summaries need to be condensed based on token count, - cooldown period, and new content since last summarization. 
- """ - # Check if we're in cooldown period - current_time = time.time() - time_since_last = current_time - self.phase_summary_state['last_summary_time'] - - if time_since_last < self.summary_cooldown: - logger.debug(f"CONTEXT | Phase summarization skipped (in cooldown period, {time_since_last:.0f}s < {self.summary_cooldown}s)") - return False - - # Get unsummarized phase content - exclude existing summary entries - unsummarized_phase_names = [p for p in phase_summaries.keys() - if p not in self.phase_summary_state['summarized_phases'] - and not p.startswith("SUMMARY_UNTIL_")] - - if not unsummarized_phase_names: - logger.debug("CONTEXT | No new phases to summarize") - return False - - # If we have a previous summary, count its tokens - base_token_count = 0 - if self.phase_summary_state['last_summary']: - base_token_count = count_tokens(self.phase_summary_state['last_summary']) - - # Count tokens in unsummarized phases - unsummarized_text = "\n\n".join([phase_summaries[phase] for phase in unsummarized_phase_names]) - unsummarized_token_count = count_tokens(unsummarized_text) - - # Check if total exceeds threshold - total_token_count = base_token_count + unsummarized_token_count - should_summarize = total_token_count > self.phase_token_threshold - - # Log decision details - if should_summarize: - logger.debug(f"CONTEXT | Phase token count ({total_token_count} tokens) exceeds threshold ({self.phase_token_threshold} tokens), will summarize") - logger.debug(f"CONTEXT | Phase breakdown: {base_token_count} tokens from previous summary + {unsummarized_token_count} tokens from {len(unsummarized_phase_names)} new phases") - logger.info(f"CONTEXT | THRESHOLD EXCEEDED | Phase summaries need summarization ({total_token_count} > {self.phase_token_threshold} tokens)") - else: - logger.debug(f"CONTEXT | Phase token count ({total_token_count} tokens) below threshold ({self.phase_token_threshold} tokens), no summarization needed") - - return should_summarize - - def 
should_summarize_messages(self, message_history: str, power_name: str) -> bool: - """ - Determine if message history for a specific power needs to be condensed - based on token count and cooldown period. - """ - # Get power-specific state - power_state = self.get_power_state(power_name) - - # Check if we're in cooldown period - current_time = time.time() - time_since_last = current_time - power_state['last_summary_time'] - - if time_since_last < self.summary_cooldown: - logger.debug(f"CONTEXT | Message summarization for {power_name} skipped (in cooldown period, {time_since_last:.0f}s < {self.summary_cooldown}s)") - return False - - # Don't summarize empty history - if not message_history or message_history == "(No history yet)" or message_history.strip() == "": - logger.debug(f"CONTEXT | Message summarization for {power_name} skipped (empty history)") - return False - - # If we have a previous summary, count its tokens - base_token_count = 0 - if power_state['last_message_summary']: - base_token_count = count_tokens(power_state['last_message_summary']) - - # Count tokens in the new content - new_token_count = count_tokens(message_history) - - # Skip if this is just a template with no actual content - if "COMMUNICATION HISTORY:" in message_history and new_token_count < 200: - # Check if it's just headers with no actual content - content_lines = [line for line in message_history.split("\n") - if line.strip() and not line.strip().endswith(":") - and "has not engaged" not in line] - if len(content_lines) <= 2: - logger.debug(f"CONTEXT | Message summarization for {power_name} skipped (template with no significant content)") - return False - - # Check if total exceeds threshold - total_token_count = base_token_count + new_token_count - should_summarize = total_token_count > self.message_token_threshold - - # Log decision details - if should_summarize: - logger.debug(f"CONTEXT | Message token count for {power_name} ({total_token_count} tokens) exceeds threshold 
({self.message_token_threshold} tokens), will summarize") - logger.debug(f"CONTEXT | Message breakdown for {power_name}: {base_token_count} tokens from previous summary + {new_token_count} tokens from new messages") - logger.info(f"CONTEXT | THRESHOLD EXCEEDED | Messages for {power_name} need summarization ({total_token_count} > {self.message_token_threshold} tokens)") - else: - logger.debug(f"CONTEXT | Message token count for {power_name} ({total_token_count} tokens) below threshold ({self.message_token_threshold} tokens), no summarization needed") - - return should_summarize - - def summarize_phase_history(self, phase_summaries: Dict[str, str], power_name: Optional[str] = None) -> Dict[str, str]: - """ - Create a recursively updated summary of phase history. - Keeps recent phases intact and summarizes older ones. - - Returns a new dictionary with condensed history. - """ - if not self.should_summarize_phases(phase_summaries): - return phase_summaries - - # Mark summarization time - self.phase_summary_state['last_summary_time'] = time.time() - - # Sort phases chronologically - sorted_phases = sorted(phase_summaries.keys()) - - # Get unsummarized phase names, excluding existing summary entries - unsummarized_phase_names = [p for p in sorted_phases - if p not in self.phase_summary_state['summarized_phases'] - and not p.startswith("SUMMARY_UNTIL_")] - - # Keep the 3 most recent phases intact - recent_phases = unsummarized_phase_names[-3:] if len(unsummarized_phase_names) > 3 else unsummarized_phase_names - phases_to_summarize = [p for p in unsummarized_phase_names if p not in recent_phases] - - if not phases_to_summarize: - logger.debug("CONTEXT | PHASE SUMMARIZATION | No new phases to summarize") - return phase_summaries # Nothing to summarize - - # Log which phases we're summarizing vs keeping intact - logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Phases to summarize: {phases_to_summarize}") - logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Recent phases to keep 
intact: {recent_phases}") - - # Text to summarize: previous summary + new phases to summarize - previous_summary = self.phase_summary_state['last_summary'] or "" - - # Generate content only from phases being summarized (not already in a summary) - new_content = "" - for phase in phases_to_summarize: - new_content += f"PHASE {phase}:\n{phase_summaries[phase]}\n\n" - - # Log before summarization - include token counts for clarity - new_content_tokens = count_tokens(new_content) - prev_summary_tokens = count_tokens(previous_summary) if previous_summary else 0 - - logger.info(f"CONTEXT | PHASE SUMMARIZATION | Starting recursive summarization for {len(phases_to_summarize)} phases") - logger.info(f"CONTEXT | PHASE SUMMARIZATION | Phases being summarized: {', '.join(phases_to_summarize)}") - logger.info(f"CONTEXT | PHASE SUMMARIZATION | Content size: {new_content_tokens} tokens from new phases + {prev_summary_tokens} tokens from previous summary") - - # Generate recursive summary - if previous_summary: - # We have a previous summary, do recursive summarization - logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Performing recursive summarization with previous summary ({prev_summary_tokens} tokens)") - summary = self.generate_recursive_summary( - previous_summary, - new_content, - prompt_type="recursive", - power_name=power_name - ) - else: - # No previous summary, do initial summarization - logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Performing initial phase summarization ({new_content_tokens} tokens)") - summary = self.generate_recursive_summary( - None, - new_content, - prompt_type="phase", - power_name=power_name - ) - - # Update phase summary state - self.phase_summary_state['last_summary'] = summary - # Track which phases have been summarized - for phase in phases_to_summarize: - if phase not in self.phase_summary_state['summarized_phases']: - self.phase_summary_state['summarized_phases'].append(phase) - - # Log the current summarization state - 
logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Updated summarized_phases list: now contains {len(self.phase_summary_state['summarized_phases'])} phases") - - # Create new dictionary with summarized older phases and intact recent phases - result = {} - - # Include any existing summaries that were in the input but aren't being updated - for key in phase_summaries: - if key.startswith("SUMMARY_UNTIL_") and key not in result: - # Only keep summaries that don't overlap with our new summary - if not any(phase in key for phase in phases_to_summarize): - result[key] = phase_summaries[key] - logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Preserved existing summary: {key}") - - # Add the new summary as a special entry - if phases_to_summarize: - last_summarized = max(phases_to_summarize) - summary_key = f"SUMMARY_UNTIL_{last_summarized}" - result[summary_key] = summary - summary_tokens = count_tokens(summary) - logger.info(f"CONTEXT | PHASE SUMMARIZATION | Created summary key '{summary_key}' ({summary_tokens} tokens)") - - # Add the recent phases as-is - for phase in recent_phases: - result[phase] = phase_summaries[phase] - logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Preserved recent phase: {phase}") - - # Add any regular phases that weren't summarized and weren't in recent_phases - for phase in phase_summaries: - if (not phase.startswith("SUMMARY_UNTIL_") and - phase not in phases_to_summarize and - phase not in recent_phases and - phase not in result): - result[phase] = phase_summaries[phase] - logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Preserved other phase: {phase}") - - # Log summarization metrics - orig_phases = len([p for p in phase_summaries if not p.startswith("SUMMARY_UNTIL_")]) - new_phases = len([p for p in result if not p.startswith("SUMMARY_UNTIL_")]) - orig_summaries = len([p for p in phase_summaries if p.startswith("SUMMARY_UNTIL_")]) - new_summaries = len([p for p in result if p.startswith("SUMMARY_UNTIL_")]) - - logger.info(f"CONTEXT | PHASE 
SUMMARIZATION | Original: {orig_phases} phases + {orig_summaries} summaries → New: {new_phases} phases + {new_summaries} summaries") - logger.debug(f"CONTEXT | PHASE SUMMARIZATION | Result contains {new_summaries} summaries + {len(recent_phases)} intact recent phases + {new_phases - len(recent_phases)} other preserved phases") - - # Log token sizes for before and after - orig_tokens = sum(count_tokens(v) for v in phase_summaries.values()) - new_tokens = sum(count_tokens(v) for v in result.values()) - token_reduction = (orig_tokens - new_tokens) / orig_tokens * 100 if orig_tokens > 0 else 0 - - logger.info(f"CONTEXT | PHASE SUMMARIZATION | Token reduction: {orig_tokens} → {new_tokens} tokens ({token_reduction:.1f}% reduction)") - - return result - - def summarize_message_history( - self, - message_history: str, - power_name: str, - organized_by_relationship: bool = True - ) -> str: - """ - Create a recursively updated summary of message history for a specific power. - - Args: - message_history: Current unsummarized message history - power_name: The power whose history is being summarized - organized_by_relationship: If True, assumes messages are organized by relationship - - Returns: - Updated message history with recursive summarization applied - """ - # Get power-specific state - power_state = self.get_power_state(power_name) - - # Check if we need to summarize - if not self.should_summarize_messages(message_history, power_name): - # If we have a previous summary but are below threshold, we can still use the previous summary - if power_state['last_message_summary'] and message_history: - prev_summary = power_state['last_message_summary'] - prev_summary_tokens = count_tokens(prev_summary) - message_tokens = count_tokens(message_history) - - # Only use previous summary if it's significantly smaller than the raw history - if prev_summary_tokens < (message_tokens * 0.7): # 30% reduction threshold - logger.info(f"CONTEXT | MESSAGE SUMMARIZATION | {power_name} | Using 
existing summary ({prev_summary_tokens} tokens) instead of raw history ({message_tokens} tokens)") - return prev_summary - - logger.debug(f"CONTEXT | MESSAGE SUMMARIZATION | No summarization needed for {power_name}, using original history") - return message_history - - # Defensive check for empty content - if not message_history or message_history.strip() == "": - logger.warning(f"CONTEXT | MESSAGE SUMMARIZATION | Empty message history for {power_name}") - return "(No message history available)" - - # Mark summarization time - power_state['last_summary_time'] = time.time() - - # Log before summarization with token counts - message_tokens = count_tokens(message_history) - logger.info(f"CONTEXT | MESSAGE SUMMARIZATION | Starting message summarization for {power_name}") - logger.info(f"CONTEXT | MESSAGE SUMMARIZATION | Current message history size: {message_tokens} tokens") - - # Generate recursive summary - previous_summary = power_state['last_message_summary'] - has_previous_summary = previous_summary is not None - - # Create a meaningful ID for this message batch for tracking - current_time = int(time.time()) - message_batch_id = f"{power_name}_msg_{current_time}" - if message_batch_id not in power_state['summarized_messages']: - power_state['summarized_messages'].add(message_batch_id) - - # Log summarization approach - if has_previous_summary: - prev_summary_tokens = count_tokens(previous_summary) - logger.info(f"CONTEXT | MESSAGE SUMMARIZATION | {power_name} | Recursive approach: combining {prev_summary_tokens} token summary with {message_tokens} tokens of new messages") - logger.debug(f"CONTEXT | MESSAGE SUMMARIZATION | Performing recursive message summarization for {power_name}") - - summary = self.generate_recursive_summary( - previous_summary, - message_history, - prompt_type="recursive", - power_name=power_name - ) - else: - # No previous summary, do initial summarization - logger.info(f"CONTEXT | MESSAGE SUMMARIZATION | {power_name} | Initial summarization: 
{message_tokens} tokens of messages") - logger.debug(f"CONTEXT | MESSAGE SUMMARIZATION | Performing initial message summarization for {power_name} ({message_tokens} tokens)") - - summary = self.generate_recursive_summary( - None, - message_history, - prompt_type="message", - power_name=power_name - ) - - # Update power state - power_state['last_message_summary'] = summary - - # Protect against empty summaries - if not summary or summary.strip() == "": - logger.warning(f"CONTEXT | MESSAGE SUMMARIZATION | Empty summary generated for {power_name}, using fallback") - summary = f"(Summary for {power_name}: No significant diplomatic interactions)" - power_state['last_message_summary'] = summary - - # Track metrics for logging - summary_tokens = count_tokens(summary) - reduction = 100 - (summary_tokens * 100 / message_tokens) if message_tokens > 0 else 0 - - logger.info(f"CONTEXT | MESSAGE SUMMARIZATION | Completed for {power_name}: {message_tokens} → {summary_tokens} tokens ({reduction:.1f}% reduction)") - - # Add header to make it clear this is a summary - header = f"--- SUMMARIZED DIPLOMATIC HISTORY FOR {power_name} ---\n" - if has_previous_summary: - header += f"(Includes recursive summary of previous communications)\n\n" - else: - header += f"(Initial summary of communications)\n\n" - - return header + summary - - def get_optimized_phase_summaries( - self, - game, - power_name: Optional[str] = None - ) -> Dict[str, str]: - """ - Main access point for getting optimized phase summaries. - If summaries are below threshold, returns original. - Otherwise, returns recursively condensed version. 
- """ - if not hasattr(game, "phase_summaries") or not game.phase_summaries: - logger.debug("CONTEXT | No phase summaries available") - return {} - - # Count original phases for logging - original_phase_count = len(game.phase_summaries) - logger.debug(f"CONTEXT | Checking phase optimization for {power_name or 'game'} with {original_phase_count} phases") - - # Start with a working copy of the original phase summaries - working_phase_summaries = dict(game.phase_summaries) - - # Add any existing summaries from previous runs - if self.phase_summary_state['last_summary']: - # Find the last phase we summarized - if self.phase_summary_state['summarized_phases']: - last_summarized = max(self.phase_summary_state['summarized_phases']) - summary_key = f"SUMMARY_UNTIL_{last_summarized}" - working_phase_summaries[summary_key] = self.phase_summary_state['last_summary'] - logger.debug(f"CONTEXT | Added existing phase summary '{summary_key}' from previous run") - - # Check if we need to create a new summary - if self.should_summarize_phases(working_phase_summaries): - # Create condensed version using recursive summarization - logger.debug(f"CONTEXT | Creating optimized phase summaries for {power_name or 'game'}") - result = self.summarize_phase_history(working_phase_summaries, power_name) - - # Add a log showing which phases are included in the optimized version - phase_keys = list(result.keys()) - summary_keys = [k for k in phase_keys if k.startswith("SUMMARY_UNTIL_")] - regular_phases = [k for k in phase_keys if not k.startswith("SUMMARY_UNTIL_")] - - logger.info(f"CONTEXT | PHASE OPTIMIZATION | Returning {len(summary_keys)} summary entries and {len(regular_phases)} regular phases") - if summary_keys: - logger.debug(f"CONTEXT | PHASE OPTIMIZATION | Summary entries: {', '.join(summary_keys)}") - if regular_phases: - logger.debug(f"CONTEXT | PHASE OPTIMIZATION | Regular phases: {', '.join(regular_phases)}") - - return result - else: - # Return the working copy which includes any 
previous summaries - summary_keys = [k for k in working_phase_summaries.keys() if k.startswith("SUMMARY_UNTIL_")] - logger.debug(f"CONTEXT | Using original phase summaries plus {len(summary_keys)} previous summaries (below threshold)") - return working_phase_summaries - - def get_optimized_message_history( - self, - game_history, - power_name: Optional[str] = None, - organized_history: Optional[str] = None - ) -> str: - """ - Main access point for getting optimized message history. - - Args: - game_history: The GameHistory object - power_name: The power requesting the history - organized_history: Optional pre-organized history text - - Returns: - Optimized message history as string - """ - if not power_name: - logger.warning("CONTEXT | No power_name provided for message history optimization, using raw history") - return organized_history or (game_history.get_game_history() if hasattr(game_history, "get_game_history") else str(game_history)) - - # Get the raw message history for this power - if organized_history is not None: - message_history = organized_history - elif hasattr(game_history, "get_game_history"): - message_history = game_history.get_game_history(power_name) or "(No history yet)" - else: - message_history = str(game_history) if game_history else "(No history yet)" - - # Log what we received from the game history object - message_chars = len(message_history) - message_tokens = count_tokens(message_history) - - logger.debug(f"CONTEXT | RAW MESSAGE HISTORY | {power_name} | Length: {message_chars} chars ({message_tokens} tokens)") - if message_chars < 100: # If it's very short, log the entire content - logger.debug(f"CONTEXT | RAW MESSAGE HISTORY | {power_name} | Content: {message_history}") - - # Enhanced check for empty or minimal history - if message_history in ["(No history yet)", "", None] or message_history.strip() == "": - fallback_message = f"(No communication history available for {power_name})" - logger.warning(f"CONTEXT | Empty message history 
for {power_name}, using fallback: '{fallback_message}'") - return fallback_message - - # Check for very sparse history (just headings) - if "COMMUNICATION HISTORY:" in message_history and message_chars < 200: - if all(line.strip() == "" or line.strip().endswith(":") or "has not engaged" in line - for line in message_history.split("\n") if line.strip()): - logger.warning(f"CONTEXT | Found sparse message history template for {power_name}, using fallback") - return f"COMMUNICATION HISTORY:\n\n{power_name} has no significant diplomatic exchanges recorded in this phase." - - # Get power-specific state - power_state = self.get_power_state(power_name) - - logger.debug(f"CONTEXT | Checking message optimization for {power_name} with {message_tokens} tokens") - - # Check if we need to create a recursive summary - if self.should_summarize_messages(message_history, power_name): - # Create recursively condensed version - logger.debug(f"CONTEXT | Creating optimized message history for {power_name}") - - # Determine if this is an initial summarization or recursive summarization - has_previous_summary = power_state['last_message_summary'] is not None - - # If this appears to be a message history that already contains our summary header, - # we should extract the actual messages and combine with the previous summary - if has_previous_summary and "--- SUMMARIZED DIPLOMATIC HISTORY FOR" in message_history: - logger.info(f"CONTEXT | MESSAGE OPTIMIZATION | {power_name} | Detected summarized content in message history") - - # Split the history to extract only the new content - parts = message_history.split("--- SUMMARIZED DIPLOMATIC HISTORY FOR") - if len(parts) > 1: - # Extract only the new content (the part before the summary header) - new_content = parts[0].strip() - if new_content: - logger.debug(f"CONTEXT | Extracted {count_tokens(new_content)} tokens of new content from message history") - - # Use the previous summary plus the extracted new content - result = 
self.summarize_message_history(new_content, power_name) - else: - # If no new content, just use the previous summary - logger.warning(f"CONTEXT | No new content found before summary marker for {power_name}") - result = power_state['last_message_summary'] - else: - # Fallback to normal summarization if parsing fails - logger.warning(f"CONTEXT | Failed to parse summary structure for {power_name}, using normal summarization") - result = self.summarize_message_history(message_history, power_name) - else: - # Normal summarization path - result = self.summarize_message_history(message_history, power_name) - - # Log the optimization stats - result_tokens = count_tokens(result) - logger.info(f"CONTEXT | MESSAGE OPTIMIZATION | {power_name} | Original size: {message_tokens} tokens, Optimized size: {result_tokens} tokens") - logger.info(f"CONTEXT | MESSAGE OPTIMIZATION | {power_name} | Using {'recursive' if has_previous_summary else 'initial'} message summary") - - # Safety check for empty result - if not result or result.strip() == "": - logger.error(f"CONTEXT | MESSAGE OPTIMIZATION | {power_name} | Empty result after summarization, using original history") - return message_history - - return result - else: - # If we have a previous summary, use it instead of raw history when appropriate - if power_state['last_message_summary'] and message_tokens > (self.message_token_threshold // 2): - # We're approaching the threshold, but not over it yet - # Use the existing summary instead of raw history if it's significantly smaller - summary = power_state['last_message_summary'] - summary_tokens = count_tokens(summary) - - # Only use previous summary if it's significantly smaller - if summary_tokens < (message_tokens * 0.7): # At least 30% reduction - logger.info(f"CONTEXT | MESSAGE OPTIMIZATION | {power_name} | Using existing summary ({summary_tokens} tokens) instead of raw history ({message_tokens} tokens)") - - # Safety check for empty summary - if not summary or summary.strip() == 
"": - logger.error(f"CONTEXT | MESSAGE OPTIMIZATION | {power_name} | Empty summary found, using original history") - return message_history - - return summary - else: - logger.debug(f"CONTEXT | MESSAGE OPTIMIZATION | {power_name} | Summary not significantly smaller ({summary_tokens} vs {message_tokens} tokens), using original") - - # Final safety check before returning original - if message_history.strip() == "": - logger.warning(f"CONTEXT | Empty original message history for {power_name}, using minimal fallback") - return f"COMMUNICATION HISTORY:\n\n{power_name} has no diplomatic history yet." - - # Return original when well below threshold - logger.debug(f"CONTEXT | Using original message history for {power_name} (below threshold)") - return message_history - - -# Global context manager instance -# This can be configured at startup -context_manager = ContextManager() - -def configure_context_manager( - phase_threshold: int = 15000, - message_threshold: int = 15000, - summary_model: str = "o3-mini" -) -> None: - """ - Configure the global context manager. - Should be called early in the application lifecycle. - - Args: - phase_threshold: Token threshold for phase summarization - message_threshold: Token threshold for message summarization - summary_model: Model to use for summarization - """ - global context_manager - logger.info(f"CONTEXT | Configuring manager with thresholds: phase={phase_threshold}, message={message_threshold} tokens, model={summary_model}") - context_manager = ContextManager( - phase_token_threshold=phase_threshold, - message_token_threshold=message_threshold, - summary_model=summary_model - ) - -def get_optimized_context( - game, - game_history, - power_name: Optional[str] = None, - organized_history: Optional[str] = None -) -> Tuple[Dict[str, str], str]: - """ - Convenience function to get both optimized phase summaries and message history. 
- - Args: - game: The Diplomacy game object - game_history: The GameHistory object - power_name: The power requesting the history (optional) - organized_history: Optional pre-organized history text - - Returns: - Tuple of (optimized_phase_summaries, optimized_message_history) - """ - logger.debug(f"CONTEXT | Getting optimized context for {power_name or 'game'}") - - # Track original sizes for comparison - orig_phase_count = len(game.phase_summaries) if hasattr(game, "phase_summaries") and game.phase_summaries else 0 - orig_phase_tokens = sum(count_tokens(content) for content in game.phase_summaries.values()) if hasattr(game, "phase_summaries") and game.phase_summaries else 0 - - raw_message_history = "" - if organized_history is not None: - raw_message_history = organized_history - elif hasattr(game_history, "get_game_history") and power_name: - raw_message_history = game_history.get_game_history(power_name) or "(No history yet)" - - orig_message_tokens = count_tokens(raw_message_history) if raw_message_history else 0 - - # Get optimized context - phases first as they impact game state - start_time = time.time() - optimized_phases = context_manager.get_optimized_phase_summaries(game, power_name) - phase_opt_time = time.time() - start_time - - # Then get optimized messages - start_time = time.time() - optimized_messages = context_manager.get_optimized_message_history( - game_history, power_name, organized_history - ) - message_opt_time = time.time() - start_time - - # Track token counts for the optimized content - phase_count = len(optimized_phases) if optimized_phases else 0 - summary_count = len([k for k in optimized_phases.keys() if k.startswith("SUMMARY_UNTIL_")]) if optimized_phases else 0 - message_tokens = count_tokens(optimized_messages) if optimized_messages else 0 - phase_tokens = sum(count_tokens(content) for content in optimized_phases.values()) if optimized_phases else 0 - - # Calculate optimization metrics - phase_reduction = (orig_phase_tokens - 
phase_tokens) / orig_phase_tokens * 100 if orig_phase_tokens > 0 else 0 - message_reduction = (orig_message_tokens - message_tokens) / orig_message_tokens * 100 if orig_message_tokens > 0 else 0 - - # Enhanced logging - logger.info(f"CONTEXT | OPTIMIZATION RESULT | {power_name or 'game'} | Phases: {orig_phase_count} → {phase_count} ({phase_reduction:.1f}% token reduction)") - logger.info(f"CONTEXT | OPTIMIZATION RESULT | {power_name or 'game'} | Messages: {orig_message_tokens} → {message_tokens} tokens ({message_reduction:.1f}% reduction)") - logger.debug(f"CONTEXT | OPTIMIZATION RESULT | {power_name or 'game'} | Performance: {phase_opt_time:.2f}s for phases, {message_opt_time:.2f}s for messages") - logger.debug(f"CONTEXT | OPTIMIZATION RESULT | {power_name or 'game'} | Returning {phase_count} phases ({summary_count} summaries, {phase_tokens} tokens) and {message_tokens} tokens of messages") - - return optimized_phases, optimized_messages \ No newline at end of file diff --git a/ai_diplomacy/model_loader.py b/ai_diplomacy/model_loader.py deleted file mode 100644 index 74e912d..0000000 --- a/ai_diplomacy/model_loader.py +++ /dev/null @@ -1,39 +0,0 @@ -import os -import logging -from typing import Optional -from dotenv import load_dotenv -from openai import OpenAI -from anthropic import Anthropic -from google import genai -from openai import OpenAI as DeepSeekOpenAI - -logger = logging.getLogger(__name__) - -load_dotenv() - -def load_model_client(model_id: str, power_name: Optional[str] = None, emptysystem: bool = False) -> 'BaseModelClient': - """ - Returns the appropriate LLM client for a given model_id string, optionally keyed by power_name. 
- Example usage: - client = load_model_client("claude-3-5-sonnet-20241022", power_name="FRANCE", emptysystem=True) - """ - # Import here to avoid circular imports - from .clients import ClaudeClient, GeminiClient, DeepSeekClient, OpenAIClient - - lower_id = model_id.lower() - - logger.debug(f"MODEL | Loading client for {model_id}{' for ' + power_name if power_name else ''}{' with empty system' if emptysystem else ''}") - - if "claude" in lower_id: - logger.debug(f"MODEL | Selected Claude client for {model_id}") - return ClaudeClient(model_id, power_name, emptysystem=emptysystem) - elif "gemini" in lower_id: - logger.debug(f"MODEL | Selected Gemini client for {model_id}") - return GeminiClient(model_id, power_name, emptysystem=emptysystem) - elif "deepseek" in lower_id: - logger.debug(f"MODEL | Selected DeepSeek client for {model_id}") - return DeepSeekClient(model_id, power_name, emptysystem=emptysystem) - else: - # Default to OpenAI - logger.debug(f"MODEL | Selected OpenAI client for {model_id}") - return OpenAIClient(model_id, power_name, emptysystem=emptysystem) \ No newline at end of file diff --git a/ai_diplomacy/negotiations.py b/ai_diplomacy/negotiations.py index 5ec63fd..3c46ba7 100644 --- a/ai_diplomacy/negotiations.py +++ b/ai_diplomacy/negotiations.py @@ -20,17 +20,14 @@ def conduct_negotiations(game, game_history, model_error_stats, max_rounds=3): Each power can send up to 'max_rounds' messages, choosing between private and global messages each turn. 
""" - logger.info(f"DIPLOMACY | Starting negotiation phase with {max_rounds} rounds") + logger.info("Starting negotiation phase.") active_powers = [ p_name for p_name, p_obj in game.powers.items() if not p_obj.is_eliminated() ] - - logger.debug(f"DIPLOMACY | Found {len(active_powers)} active powers for negotiations") # We do up to 'max_rounds' single-message turns for each power for round_index in range(max_rounds): - logger.debug(f"DIPLOMACY | Starting round {round_index+1}/{max_rounds}") with concurrent.futures.ThreadPoolExecutor( max_workers=1 ) as executor: @@ -40,11 +37,10 @@ def conduct_negotiations(game, game_history, model_error_stats, max_rounds=3): client = load_model_client(model_id) possible_orders = gather_possible_orders(game, power_name) if not possible_orders: - logger.info(f"DIPLOMACY | {power_name} | No orderable locations, skipping negotiation") + logger.info(f"No orderable locations for {power_name}; skipping.") continue board_state = game.get_state() - logger.debug(f"DIPLOMACY | {power_name} | Requesting conversation response from {model_id}") future = executor.submit( client.get_conversation_reply, game, @@ -54,12 +50,11 @@ def conduct_negotiations(game, game_history, model_error_stats, max_rounds=3): game_history, game.current_short_phase, active_powers, - phase_summaries=game.phase_summaries, ) futures[future] = power_name + logger.debug(f"Submitted get_conversation_reply task for {power_name}.") - message_count = 0 for future in concurrent.futures.as_completed(futures): power_name = futures[future] messages = future.result() diff --git a/ai_diplomacy/prompts/austria_system_prompt.txt b/ai_diplomacy/prompts/austria_system_prompt.txt deleted file mode 100644 index d31540c..0000000 --- a/ai_diplomacy/prompts/austria_system_prompt.txt +++ /dev/null @@ -1,40 +0,0 @@ -You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. Your goal is to win the game by capturing 18 supply centers - this is the only path to victory. 
Growing your army and taking over the map are means to this end. Be aggressive and always work toward that crucial 18th center. - -Dear Austria, - -You're surrounded by potential threats, but don't let this push you into a defensive mindset. The statistics are stark: defensive Austria almost always loses. Your position can be powerful when playing boldly - the most successful Austrian players treat their starting position as a launchpad for aggressive expansion, not a fortress to defend. - -Key insights: -- Immediate alliance with at least one bordering power is essential -- Fleet positioning in Adriatic/Ionian seas determines your Mediterranean options -- Early diplomacy sets the entire trajectory of your game -- Your surrounded position requires offensive action - defending your starting centers ensures defeat -- Control of neutral territories like Tyrolia, Galicia, and Bohemia dictates your survival and expansion paths -- Early penetration of Russian or Italian homeland territory is often worth the risk - -The trap many fall into: Passive "fortress Austria" play. This approach consistently fails as it invites neighbors to coordinate against you. Top players recognize that your best defense is a strategic offense - controlling spaces that deny others options while creating them for yourself. - -Watch for these opportunities: -- Russia/Turkey tensions (opening Galicia/Ukraine) -- Italy committing westward (leaving Trieste/Venice exposed) -- Germany's attention pulled north (enabling Bohemian leverage) - -Your path to victory requires aggressive positioning. Usually this means: -1. Balkan dominance + Italian penetration -2. Russian collapse + German alliance -3. Multi-front expansion leveraging central position - -Time works against static powers. Create momentum in at least one direction by 1902. Make your moves decisively, build strategically, and always be working toward that crucial 18th center. - -The Habsburg legacy demands bold action. 
Show them Austrian courage can conquer Europe once more. - -You will be given: -• Which power you are controlling. -• The current phase (e.g. S1901M). -• Details about the map. -• Your prior conversation history with other players (which may include agreements, lies, etc). -* The prior order history which includes whether each order was successful or not. -• Your units and the possible orders you may make. **Always refer to these possible_orders.** -• A list of enemy units and centers. - -Remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the "PARSABLE OUTPUT" (your final orders or messages) will be used by the game engine. \ No newline at end of file diff --git a/ai_diplomacy/prompts/context_prompt.txt b/ai_diplomacy/prompts/context_prompt.txt index a342680..1b48b36 100644 --- a/ai_diplomacy/prompts/context_prompt.txt +++ b/ai_diplomacy/prompts/context_prompt.txt @@ -1,33 +1,45 @@ **PLAYER DETAILS** Power: {power_name} -Current phase: {phase_expanded} +Current phase: {current_phase} -**HISTORY OF COMMUNICATION** +**MAP DETAILS** -{history_text} +Abbreviations: +{game_map_loc_name} -**PAST PHASE SUMMARIES** -{historical_summaries} +Type of each location: +{game_map_loc_type} -**YOUR FORCES** -{our_forces_summary} +Game map as an adjacency list: +{map_as_adjacency_list} -**ENEMY FORCES** -{enemies_forces_summary} +Possible coasts at each location: +{possible_coasts} -**NEUTRAL SUPPLY CENTERS** -{neutral_supply_centers_summary} +All supply centers on the map: +{game_map_scs} -**THREAT ASSESSMENT** -{threat_text} +**GAME HISTORY** -**SUPPLY CENTER PROJECTION** -{sc_projection_text} +{game_history} -**CONVOY PATHS** -{convoy_paths_text} +**CURRENT CONTEXT** -**POSSIBLE ORDERS** -{possible_orders_text} +Enemy units: +{enemy_units} +Enemy supply centers: +{enemy_centers} + +Your units: +{units_info} + +Your supply centers: +{centers_info} + +Possible orders: +{possible_orders} + +Convoy paths possible: 
+{convoy_paths_possible} \ No newline at end of file diff --git a/ai_diplomacy/prompts/conversation_instructions.txt b/ai_diplomacy/prompts/conversation_instructions.txt index c97fc6c..23ea0c7 100644 --- a/ai_diplomacy/prompts/conversation_instructions.txt +++ b/ai_diplomacy/prompts/conversation_instructions.txt @@ -5,15 +5,6 @@ You can now send a message to other powers. Messages that have been sent before • Decide whether to send a private or global message. • You can propose alliances, ask for support, threaten, etc. -EFFECTIVE DIPLOMATIC PRINCIPLES: -1. Form specific, multi-turn plans with allies that create mutual advantage -2. Clearly coordinate the control of strategic non-supply territories -3. Discuss specific paths to victory with potential long-term allies -4. Balance deception with reliability - being seen as reliable to at least one power increases your chances -5. When communicating, focus on shared paths to expansion, not just immediate gains -6. Create coordinated pressure against common enemies to break through defensive positions -7. Make sure you stay skeptical. Be one step ahead always. - Remember: 1. "message_type" can be "global" or "private". 2. If "private", specify "recipient" (one of the powers). diff --git a/ai_diplomacy/prompts/england_system_prompt.txt b/ai_diplomacy/prompts/england_system_prompt.txt deleted file mode 100644 index 4898e35..0000000 --- a/ai_diplomacy/prompts/england_system_prompt.txt +++ /dev/null @@ -1,40 +0,0 @@ -You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. Your goal is to win the game by capturing 18 supply centers - this is the only path to victory. Growing your army and taking over the map are means to this end. Be aggressive and always work toward that crucial 18th center. - -Dear England, - -Your island position offers security but can become your prison. 
Every English Diplomacy champion will tell you the same: your greatest risk isn't early invasion, but becoming irrelevant. The path to 18 centers requires bold continental action starting in 1901. - -Key insights: -- Norway is critical for your northern game plan -- Belgium often determines your continental relevance -- The choice between Franco-German alliance structures shapes your game -- The English Channel is not just a defensive position but your primary corridor for continental expansion -- Consolidate Scandinavia early, then project power into Russia and Germany simultaneously -- Leaving the North Sea unoccupied invites disaster - control this critical territory even if it's not a supply center - -The trap many fall into: "Fleet England." Building only fleets feels safe but ensures you'll never reach 18 centers. Your first army build is often your most important decision of the game. Statistics show England wins most often when controlling at least 3 continental centers by 1904. - -Watch for these opportunities: -- Franco-German conflict opening Belgium/Holland -- Russian northern weakness -- Early foothold in St. Petersburg or Brest - -Your path to victory requires breaking the stalemate line. Usually this means either: -1. Scandinavian dominance + push through StP -2. Continental control via Belgium/Holland/Kiel -3. Mediterranean penetration via MAO/Portugal/Spain - -Don't be seduced by easy naval gains that lead nowhere. Always ask: "How does this help me reach the critical mass of continental centers I need?" Better to risk your position for real progress than survive into irrelevance. - -Your fleets can rule the waves, but armies win Diplomacy. Show them England's destiny extends far beyond its shores. - -You will be given: -• Which power you are controlling. -• The current phase (e.g. S1901M). -• Details about the map. -• Your prior conversation history with other players (which may include agreements, lies, etc). 
-* The prior order history which includes whether each order was successful or not. -• Your units and the possible orders you may make. **Always refer to these possible_orders.** -• A list of enemy units and centers. - -Remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the "PARSABLE OUTPUT" (your final orders or messages) will be used by the game engine. \ No newline at end of file diff --git a/ai_diplomacy/prompts/france_system_prompt.txt b/ai_diplomacy/prompts/france_system_prompt.txt deleted file mode 100644 index a089745..0000000 --- a/ai_diplomacy/prompts/france_system_prompt.txt +++ /dev/null @@ -1,41 +0,0 @@ -You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. Your goal is to win the game by capturing 18 supply centers - this is the only path to victory. Growing your army and taking over the map are means to this end. Be aggressive and always work toward that crucial 18th center. - -Dear France, - -You start in perhaps the strongest position. Don't waste it with hesitation. History shows successful French players strike early and decisively - aiming for 5-6 centers by 1902 is not just possible, but often optimal on the path to 18. 
- -Key insights: -- Early momentum is crucial - Spain, Portugal, Belgium all within reach 1901 -- Choose England or Germany as initial ally/target - fighting both is fatal -- Your dual coasts (Brest/Marseilles) let you project power both directions -- Tunis often proves critical for French solos - plan your Med strategy early -- Control of the Mid-Atlantic Ocean and English Channel are often more important than an extra supply center - they provide strategic flexibility -- The path to victory requires penetrating the Balkans or Scandinavia at some point - start planning for this early -- Neutral territories like Burgundy, Ruhr, and Picardy control your expansion options - don't ignore them - -The trap many fall into: playing too conservatively because the position feels secure. Don't. Your corner position is not a fortress to hide in, but a springboard for conquest. The stats are clear - France wins most when acting decisively in the first 2-3 years. - -Watch for these opportunities: -- England/Germany friction you can exploit -- Italy focused east (leaving their rear exposed) -- Early builds that let you dominate multiple seas - -Your path to victory requires crossing the stalemate line. Usually this means either: -1. Mediterranean dominance + push through Munich -2. Northern control + grab of Tunis -3. Both, if you're bold enough - -Time is not your ally - other powers grow stronger while you wait. Make your moves early, build aggressively, and always be working toward that 18th center. Better to fail spectacularly pushing for a win than survive passively into a draw. - -The throne of Europe is yours to lose. Show them French audacity still rules the continent. - -You will be given: -• Which power you are controlling. -• The current phase (e.g. S1901M). -• Details about the map. -• Your prior conversation history with other players (which may include agreements, lies, etc). -* The prior order history which includes whether each order was successful or not. 
-• Your units and the possible orders you may make. **Always refer to these possible_orders.** -• A list of enemy units and centers. - -Remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the "PARSABLE OUTPUT" (your final orders or messages) will be used by the game engine. \ No newline at end of file diff --git a/ai_diplomacy/prompts/germany_system_prompt.txt b/ai_diplomacy/prompts/germany_system_prompt.txt deleted file mode 100644 index 5de4d3f..0000000 --- a/ai_diplomacy/prompts/germany_system_prompt.txt +++ /dev/null @@ -1,40 +0,0 @@ -You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. Your goal is to win the game by capturing 18 supply centers - this is the only path to victory. Growing your army and taking over the map are means to this end. Be aggressive and always work toward that crucial 18th center. - -Dear Germany, - -You begin at the center of Europe with both opportunity and danger. Your central position is not a weakness, but your greatest strength - if used boldly. The most successful German players don't hide; they dominate the board by exploiting their unique position to create opportunities in all directions. - -Key insights: -- Must expand in multiple directions simultaneously - single-front gains are insufficient -- Denmark is critical, but your focus should extend beyond Scandinavia -- Munich's vulnerability is offset by its strategic offensive position -- Your central position requires aggressive expansion in multiple directions simultaneously -- Neutral territories like Burgundy, Tyrolia, and Silesia control the flow of the game - occupy them even if they're not supply centers -- Your path to 18 centers requires breaking through stalemate lines - identify these early and plan accordingly - -The trap many fall into: Defending too cautiously against multiple threats. Instead, seize the initiative. 
Shape the game's flow by dictating where conflicts occur rather than reacting to them. Statistics reveal that successful German players often reach 7-8 centers by 1903 through multi-directional offensive plays. - -Watch for these opportunities: -- Anglo-French naval conflicts (opening Belgium/Holland) -- Austrian/Italian preoccupation with Balkans -- Russian commitment to south leaving north vulnerable - -Your path to victory requires transcending your central position. Usually this means either: -1. Northern dominance + Mediterranean presence -2. Eastern penetration + Western holdings -3. Alliance dominance to eliminate one power quickly - -Remember: time favors those who can shape the board. Be the hammer, not the anvil. Play to dictate the game's tempo and control strategic territories that enable your rapid expansion toward those crucial 18 centers. - -The fate of Europe lies in your hands. Show them what German efficiency truly means. - -You will be given: -• Which power you are controlling. -• The current phase (e.g. S1901M). -• Details about the map. -• Your prior conversation history with other players (which may include agreements, lies, etc). -* The prior order history which includes whether each order was successful or not. -• Your units and the possible orders you may make. **Always refer to these possible_orders.** -• A list of enemy units and centers. - -Remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the "PARSABLE OUTPUT" (your final orders or messages) will be used by the game engine. \ No newline at end of file diff --git a/ai_diplomacy/prompts/italy_system_prompt.txt b/ai_diplomacy/prompts/italy_system_prompt.txt deleted file mode 100644 index d3e8516..0000000 --- a/ai_diplomacy/prompts/italy_system_prompt.txt +++ /dev/null @@ -1,40 +0,0 @@ -You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. 
Your goal is to win the game by capturing 18 supply centers - this is the only path to victory. Growing your army and taking over the map are means to this end. Be aggressive and always work toward that crucial 18th center. - -Dear Italy, - -Your starting position appears weak, but is deceptively strong when played aggressively. The top Italy players don't passively react - they seize the initiative, recognizing that Mediterranean dominance alone cannot secure victory. The path to 18 centers requires audacious, calculated risks. - -Key insights: -- Your 1901 choices ripple throughout the game - commit decisively -- Austria is often your immediate concern, but not your only opportunity -- Fleet placement matters enormously for your expansion options -- The Ionian Sea is your most critical territory - control it even at the expense of a build -- A purely Mediterranean strategy cannot reach 18 centers - you must plan for northern expansion through Munich/Vienna/Marseilles -- Neutral territories like Tyrolia and Piedmont dictate your strategic options - control them even without immediate gains - -The trap many fall into: Focusing solely on Tunis/Greece/Trieste. This path leads to a 6-center plateau, not victory. Statistics show successful Italian solos typically involve bold early moves into central Europe or deep penetration into Turkey by 1903. - -Watch for these opportunities: -- Austria preoccupied with Russia (opening Trieste/Vienna) -- French commitment to northern campaign (leaving Marseilles vulnerable) -- Turkish slow start (enabling swift Balkan advancement) - -Your path to victory requires strategic positioning. Usually this means: -1. Mediterranean control + northern bridgehead -2. Balkan dominance + push through Munich or Marseilles -3. Central European territories that position you for multi-directional expansion - -Time works against isolated Mediterranean powers. Create pathways north/east before alliances solidify against you. 
Make your moves early, build strategically, and always be working toward that 18th center. - -The legacy of Rome awaits. Show them Italian ingenuity can conquer Europe once again. - -You will be given: -• Which power you are controlling. -• The current phase (e.g. S1901M). -• Details about the map. -• Your prior conversation history with other players (which may include agreements, lies, etc). -* The prior order history which includes whether each order was successful or not. -• Your units and the possible orders you may make. **Always refer to these possible_orders.** -• A list of enemy units and centers. - -Remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the "PARSABLE OUTPUT" (your final orders or messages) will be used by the game engine. \ No newline at end of file diff --git a/ai_diplomacy/prompts/message_summary_prompt.txt b/ai_diplomacy/prompts/message_summary_prompt.txt deleted file mode 100644 index d899277..0000000 --- a/ai_diplomacy/prompts/message_summary_prompt.txt +++ /dev/null @@ -1,40 +0,0 @@ -You are a master Diplomacy intelligence analyst creating a BRIEF, focused summary of diplomatic communications. - -CRUCIAL: Be concise (200-300 words maximum). Focus only on communications with genuine strategic significance. - -Your diplomatic intelligence briefing should: - -1. KEY RELATIONSHIP HIGHLIGHTS (1-2 sentences per important relationship) - - Identify only the most significant power relationships with actual diplomatic activity - - Skip mentioning powers with minimal or routine diplomatic exchanges - - Focus on relationship dynamics: cooperative, hostile, deceptive, or evolving - - Prioritize relationships with clear strategic impact on the game - -2. 
CRITICAL AGREEMENTS (2-3 sentences total) - - Highlight only the most important or surprising agreements - - Focus on firm commitments rather than vague discussions - - Note agreements that impact multiple powers or key regions - - Identify any conditions or timelines attached to major agreements - -3. STRATEGIC INTELLIGENCE (2-3 sentences total) - - Identify one or two key pieces of strategic information revealed - - Note any significant deceptions or potential betrayals - - Highlight discrepancies between diplomatic words and actual game moves - - Focus on intelligence that provides actual strategic advantage - -4. DIPLOMATIC OUTLOOK (1-2 sentences) - - Identify the most vulnerable or strengthening relationship - - Note one emerging alliance or conflict to watch - - Suggest which power(s) are in the strongest or weakest diplomatic position - -Remember: -- BREVITY IS ESSENTIAL - include only what actually matters diplomatically -- Focus on quality of insights rather than quantity of information -- Prioritize firm commitments over general discussions -- Highlight actual or potential betrayals over routine diplomacy -- Make it read like a focused intelligence briefing, not a transcript summary - -ORIGINAL MESSAGE HISTORY: -{message_history} - -DIPLOMATIC INTELLIGENCE BRIEFING: \ No newline at end of file diff --git a/ai_diplomacy/prompts/order_instructions.txt b/ai_diplomacy/prompts/order_instructions.txt index 531652a..338f8a6 100644 --- a/ai_diplomacy/prompts/order_instructions.txt +++ b/ai_diplomacy/prompts/order_instructions.txt @@ -2,65 +2,11 @@ You are now to submit an order for your units. Remember that your goal is to win via capturing supply centers. There are opportunity costs in this game. -1. Understanding the Phases & Their Orders - -1.1. 
Movement Phase (phase_type == 'M') - • Hold: A PAR H (Army in Paris does nothing) - • Move: A PAR - BUR (Army in Paris moves to Burgundy) - • Support: - • Support Hold: A MAR S A PAR H (Army in Marseilles supports Army in Paris to hold) - • Support Move: A MAR S A PAR - BUR (Army in Marseilles supports Army in Paris moving to Burgundy) - • Convoy: Fleets at sea can convoy an Army over water: - • Fleet Convoy: F ION C A TUN - NAP (Fleet in Ionian Sea convoys Army from Tunis to Naples) - • Army Move via Convoy: A TUN - NAP VIA (explicitly states the Army is moving from Tunis to Naples via convoy) - -1.2. Retreat Phase (phase_type == 'R') - • If a unit is dislodged, it must Retreat or Disband: - • Retreat: A BUR R PIC (Dislodged Army in Burgundy retreats to Picardy) - • Disband: A BUR D (Army in Burgundy disbands, if it cannot retreat or chooses not to) - -1.3. Adjustment Phase (phase_type == 'A') - • Build new units if you have more centers than current units: - • A PAR B (Build an Army in Paris) - • F MAR B (Build a Fleet in Marseilles) - • Remove units if you have fewer centers than current units: - • A BUR D (Disband Army in Burgundy) - • Waive a build if you have a surplus but don't want/can't build: - • WAIVE (no unit is built in the available build location) - -1.4. Order Types - • H (Hold) – e.g. A PAR H - • - (Move) – e.g. A PAR - BUR - • S (Support) – e.g. A MAR S A PAR - BUR or A MAR S A PAR H - • C (Convoy) – e.g. F ION C A TUN - NAP - • R (Retreat) – e.g. A BUR R PIC - • D (Disband) – e.g. A BUR D - • B (Build) – e.g. A PAR B - • WAIVE – skipping a possible build - -1.5. Key Phase Context - • Movement (M): Units can H, -, S, C. - • Retreat (R): Dislodged units can only R or D. - • Adjustment (A): Build/Remove units or WAIVE. - • Multi-Coast: For SPA, STP, BUL, specify nc, sc, or ec when using Fleets, e.g. F BRE - SPA(sc). - • Basic Validity Rules - • No self-support (A PAR S A PAR - BUR is invalid). - • Fleets must be on water to convoy. 
- • Army "- X VIA" must have one or more fleets issuing matching C A ... - X. - - IMPORTANT: 1. Adjudication is simultaneous, meaning moves that directly collide typically bounce unless one side has greater support. 2. If you choose a support order, it must match an actual move in your final set. For instance, "A VIE S F TRI - VEN" requires "A VIE - VEN". "F TRI - VEN" must also occur for the move to be successful, but this can be ordered by either yourself or an ally. 3. Remember that in the winter phase you are only able to build units. You are not able to move units or command them to support other units. Refer to the possible_orders to be sure. -4. You may incorporate your sense of other powers' likely orders from the negotiation text, but be aware they could be deceptive. - -STRATEGIC ORDERING PRINCIPLES: -1. Control of key non-supply or supply territories often determines your long-term success -2. Supporting an ally's critical move may be more valuable than capturing a lone supply center -3. Order to disrupt enemy coordination, not just to make individual gains -4. Always consider how each order contributes to your path to 18 centers, not just immediate gains -5. Balance between offensive actions (expanding your reach) and defensive actions (protecting gains) +4. You may incorporate your sense of other powers’ likely orders from the negotiation text, but be aware they could be deceptive. Produce exactly the following at the end of your response: diff --git a/ai_diplomacy/prompts/phase_summary_prompt.txt b/ai_diplomacy/prompts/phase_summary_prompt.txt deleted file mode 100644 index bc64e10..0000000 --- a/ai_diplomacy/prompts/phase_summary_prompt.txt +++ /dev/null @@ -1,38 +0,0 @@ -You are a master Diplomacy commentator and analyst. Your task is to create a BRIEF, focused summary of the most recent phase. - -CRUCIAL: Be concise and selective. DO NOT list every move. Focus on key developments and their strategic significance. - -Your SHORT summary should: - -1. 
OPENING HIGHLIGHT (1-2 sentences) - - Identify the single most consequential or surprising development - - Set the tone for understanding the phase's significance - -2. POWER-FOCUSED OVERVIEW (1-2 sentences per relevant power) - - For each active power, highlight ONLY their most significant action or pattern - - Completely omit mentions of routine, expected, or inconsequential moves - - Focus on gains/losses of centers, key attacks, or strategic repositioning - - Use phrases like "Meanwhile, France focused on..." rather than listing moves - -3. KEY CONFRONTATIONS (1-2 sentences) - - Highlight only truly significant bounces or standoffs - - Mention only the most important supported attacks that succeeded/failed - - Connect these confrontations to strategic objectives - -4. STRATEGIC IMPLICATIONS (2-3 sentences) - - Briefly note emerging alliances or conflicts - - Identify 1-2 powers that gained or lost advantage - - Suggest 1-2 key areas to watch in the next phase - -Remember: -- BREVITY IS ESSENTIAL - cut mercilessly anything not strategically significant -- Use vivid language that captures the drama without excessive detail -- Focus on the "why" more than the "what" of key moves -- Connect actions to strategic intentions and outcomes -- Avoid formulaic structure; make it read like expert commentary -- BE CONCISE, SHORT, - -ORIGINAL PHASE HISTORY: -{phase_history} - -SUMMARY: \ No newline at end of file diff --git a/ai_diplomacy/prompts/recursive_summary_prompt.txt b/ai_diplomacy/prompts/recursive_summary_prompt.txt deleted file mode 100644 index 9669bdb..0000000 --- a/ai_diplomacy/prompts/recursive_summary_prompt.txt +++ /dev/null @@ -1,41 +0,0 @@ -You are a master Diplomacy analyst combining previous game developments with new events into a CONCISE, focused narrative. - -CRUCIAL: Your combined summary must be BRIEF (300-400 words maximum). Do not simply concatenate the two sources - synthesize them into something shorter and more focused. 
- -Your unified summary should: - -1. MAINTAIN STRATEGIC FOCUS - - Extract only the most significant developments from both sources - - Ruthlessly eliminate redundant or low-impact information - - Create a coherent "story arc" of the game's key developments - - Connect earlier events to later outcomes only when truly significant - -2. POWER TRAJECTORY ANALYSIS - - For each power, highlight their overall trajectory and key turning points - - Omit routine moves or minor developments completely - - Focus on center gains/losses and major strategic shifts - - Identify persistent intentions or evolving strategies - -3. KEY RELATIONSHIP DYNAMICS - - Highlight only the most important alliances, conflicts, or betrayals - - Track significant power balance shifts across the timeline - - Note only the most crucial standoffs or confrontations - -4. FORWARD-LOOKING ELEMENTS - - Conclude with the current strategic situation - - Identify 1-2 key theaters or relationships to watch - - Suggest potential pivotal moments approaching - -Remember: -- BREVITY OVER COMPLETENESS - highlight only what truly matters -- Avoid chronological listings - focus on strategic narrative -- Prioritize analysis over move descriptions -- Keep the focus on the "big picture" strategic situation - -PREVIOUS SUMMARY: -{previous_summary} - -NEW CONTENT: -{new_content} - -UNIFIED SUMMARY: \ No newline at end of file diff --git a/ai_diplomacy/prompts/russia_system_prompt.txt b/ai_diplomacy/prompts/russia_system_prompt.txt deleted file mode 100644 index cab5e60..0000000 --- a/ai_diplomacy/prompts/russia_system_prompt.txt +++ /dev/null @@ -1,40 +0,0 @@ -You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. Your goal is to win the game by capturing 18 supply centers - this is the only path to victory. Growing your army and taking over the map are means to this end. Be aggressive and always work toward that crucial 18th center. 
-
-Dear Russia,
-
-Your sprawling position is both blessing and challenge. Four starting centers sounds impressive, but your stretched forces must be wielded with precision and daring. The most successful Russian players don't merely react to threats on multiple fronts - they create strategic imbalances by concentrating force decisively in chosen directions.
-
-Key insights:
-- Choose north vs south priority early - splitting focus often fails
-- Sweden is nearly guaranteed, but Romania requires decisive action
-- Black Sea control or denial shapes your southern options entirely
-- Your multi-front position means controlling neutral corridors is more important than defending every border
-- Sweden, Norway, Rumania are not just supply centers but strategic corridors - secure and use them for projection
-- Galicia, Armenia, and Livonia control your expansion options - don't leave them vacant
-
-The trap many fall into: Defending reactively on all fronts. This stretches your forces thin and leads to slow collapse. Top players recognize that Russia's best defense is coordinated offensive pressure that forces others to react to you, not the reverse.
-
-Watch for these opportunities:
-- Turkey/Austria tension (opening Galicia or Armenia)
-- German northern commitment (leaving Berlin/Munich exposed)
-- England focused on France (enabling Scandinavian dominance)
-
-Your path to victory requires strategic concentration. Usually this means:
-1. Northern dominance + southern stability
-2. Southern dominance + northern stability
-3. Central corridor creation connecting your theaters of operation
-
-Time favors the coordinated player. Create momentum in your primary direction by 1902 while maintaining sufficient force elsewhere. Make your moves decisively, build purposefully, and always be working toward that crucial 18th center.
-
-The Russian bear strikes with purpose. Show them what happens when the sleeping giant awakens with focused intent.
-
-You will be given:
-• Which power you are controlling.
-• The current phase (e.g. S1901M).
-• Details about the map.
-• Your prior conversation history with other players (which may include agreements, lies, etc).
-* The prior order history which includes whether each order was successful or not.
-• Your units and the possible orders you may make. **Always refer to these possible_orders.**
-• A list of enemy units and centers.
-
-Remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the "PARSABLE OUTPUT" (your final orders or messages) will be used by the game engine.
\ No newline at end of file
diff --git a/ai_diplomacy/prompts/system_prompt.txt b/ai_diplomacy/prompts/system_prompt.txt
index eaa155e..a620671 100644
--- a/ai_diplomacy/prompts/system_prompt.txt
+++ b/ai_diplomacy/prompts/system_prompt.txt
@@ -1,11 +1,4 @@
-You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. Your goal is to win the game by capturing 18 supply centers - this is the only path to victory. Growing your army and taking over the map are means to this end. Be aggressive and always work toward that crucial 18th center.
-
-STRATEGIC IMPERATIVES:
-• Control of neutral territories is often as strategically valuable as supply centers - they provide defensive depth and attack corridors
-• A balanced approach between offense and defense is essential - overdefending just a few supply centers means you cannot win
-• Victory requires coordinated action with at least one other power in the early/mid-game
-• Always analyze the board from the perspective of "path to 18 centers" not just immediate gains
-• Unoccupied territories create vulnerabilities and missed opportunities
+You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. Your goal is to win the game by capturing supply centers, growing your army, and taking over the map. Be aggressive.
You will be given: • Which power you are controlling. @@ -17,4 +10,4 @@ You will be given: • Your units and the possible orders you may make. **Always refer to these possible_orders.** • A list of enemy units and centers. -For the negotiations and orders phase, remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the “PARSABLE OUTPUT” (your final orders or messages) will be used by the game engine. +For the negotiations and orders phase, remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the “PARSABLE OUTPUT” (your final orders or messages) will be used by the game engine. \ No newline at end of file diff --git a/ai_diplomacy/prompts/turkey_system_prompt.txt b/ai_diplomacy/prompts/turkey_system_prompt.txt deleted file mode 100644 index aa2f5d2..0000000 --- a/ai_diplomacy/prompts/turkey_system_prompt.txt +++ /dev/null @@ -1,40 +0,0 @@ -You are playing a game of Diplomacy over text. The map is the standard Diplomacy map. Your goal is to win the game by capturing 18 supply centers - this is the only path to victory. Growing your army and taking over the map are means to this end. Be aggressive and always work toward that crucial 18th center. - -Dear Turkey, - -Your corner position offers security but can become a prison if played passively. The most successful Turkish players don't just dominate the Balkans - they recognize that Mediterranean access and aggressive expansion are essential for reaching those crucial 18 centers. 
-
-Key insights:
-- Black Sea control determines your early options - fight for it
-- Russia must be managed immediately - as ally or target
-- Fleet positioning shapes your entire strategic outlook
-- The Black Sea is more important than Bulgaria - it controls your expansion options in all directions
-- Armenia and Syria are critical territories even though they're not supply centers
-- Your path to 18 must include breaking through either Italy or Russia - passive eastern expansion cannot win
-
-The trap many fall into: "Fortress Turkey" with slow Balkan expansion. This predictable approach plateaus at 6-7 centers, far short of victory. Statistics show successful Turkish solos typically involve aggressive Mediterranean naval presence by 1903 and penetration beyond the traditional southeastern boundaries.
-
-Watch for these opportunities:
-- Austria/Russia tension (opening Bulgaria or Armenia)
-- Italian focus on France (leaving eastern Mediterranean vulnerable)
-- Russian northern commitment (enabling Black Sea dominance)
-
-Your path to victory requires strategic breakthroughs. Usually this means:
-1. Eastern Mediterranean control + penetration of Italy
-2. Black Sea dominance + collapse of Russia
-3. Balkan control + central European presence
-
-Time favors the prepared attacker. Establish your primary direction by 1902 while securing your base. Make your moves decisively, build methodically, and always be working toward that crucial 18th center.
-
-The Ottoman legacy demands expansion. Show them Turkish power extends far beyond the Black Sea.
-
-You will be given:
-• Which power you are controlling.
-• The current phase (e.g. S1901M).
-• Details about the map.
-• Your prior conversation history with other players (which may include agreements, lies, etc).
-* The prior order history which includes whether each order was successful or not.
-• Your units and the possible orders you may make.
**Always refer to these possible_orders.** -• A list of enemy units and centers. - -Remember that while your private chain-of-thought can consider your in-depth reasoning about possible outcomes, **only** the "PARSABLE OUTPUT" (your final orders or messages) will be used by the game engine. \ No newline at end of file diff --git a/ai_diplomacy/utils.py b/ai_diplomacy/utils.py index 944f048..13e7ac9 100644 --- a/ai_diplomacy/utils.py +++ b/ai_diplomacy/utils.py @@ -1,6 +1,5 @@ from dotenv import load_dotenv import logging -import random logger = logging.getLogger("utils") logger.setLevel(logging.INFO) @@ -37,9 +36,6 @@ def gather_possible_orders(game, power_name): result = {} for loc in orderable_locs: result[loc] = all_possible.get(loc, []) - - order_count = sum(len(orders) for orders in result.values()) - logger.debug(f"ORDERS | {power_name} | Found {len(result)} orderable locations with {order_count} total possible orders") return result @@ -50,7 +46,6 @@ def get_valid_orders( power_name, possible_orders, game_history, - phase_summaries, model_error_stats, ): """ @@ -58,18 +53,14 @@ def get_valid_orders( If invalid, we append the error feedback to the conversation context for the next retry. If still invalid, return fallback. """ - # Track invalid orders for feedback - invalid_info = [] # Ask the LLM for orders - logger.debug(f"ORDERS | {power_name} | Requesting orders from {client.model_name}") orders = client.get_orders( game=game, board_state=board_state, power_name=power_name, possible_orders=possible_orders, conversation_text=game_history, - phase_summaries=phase_summaries, model_error_stats=model_error_stats, ) @@ -95,570 +86,11 @@ def get_valid_orders( if validity == 1: # All orders are fully valid - logger.debug(f"ORDERS | {power_name} | Validated {len(orders)} orders successfully") return orders else: logger.warning( - f"ORDERS | {power_name} | Failed validation: '{move}' is invalid" + f"[{power_name}] failed to produce a valid order, using fallback." 
) model_error_stats[power_name]["order_decoding_errors"] += 1 - logger.debug(f"ORDERS | {power_name} | Using fallback orders") fallback = client.fallback_orders(possible_orders) return fallback - - -def expand_phase_info(game, board_state): - """ - Convert a phase like 'S1901M' into a more descriptive string: - 'Spring 1901 Movement (early game): Units can move, support, or convoy...' - This function also references the current year to classify early/mid/late game. - """ - phase_abbrev = board_state["phase"] # e.g. 'S1901M' - # Basic mapping of abbreviations - season_map = { - 'S': "Spring", - 'F': "Fall", - 'W': "Winter", - } - phase_type_map = { - 'M': "Movement", - 'R': "Retreat", - 'A': "Adjustment", # builds/disbands - } - - season_char = phase_abbrev[0] # S / F / W - year = int(phase_abbrev[1:5]) # 1901 - phase_char = phase_abbrev[-1] # M / R / A - - season_str = season_map.get(season_char, "Unknown Season") - phase_str = phase_type_map.get(phase_char, "Unknown Phase") - - # Approximate game stage - if year <= 1902: - stage = "early game" - elif year <= 1906: - stage = "mid game" - else: - stage = "late game" - - # Phase-specific action text - if phase_char == 'M': - actions = "Players issue move, support, or convoy orders." - elif phase_char == 'R': - actions = "Dislodged units must retreat or disband." - elif phase_char == 'A': - actions = "Powers may build new units if they have more centers than units, otherwise disband if fewer." - else: - actions = "Unknown phase actions." - - return f"{season_str} {year} {phase_str} ({stage}): {actions}" - - -def format_location_with_expansion(game, loc, include_adjacency=False): - """ - Return a string like 'Paris (PAR) [LAND]', - optionally including a list of adjacent locations if include_adjacency=True. 
- """ - full_name = next((name for name, abbrev in game.map.loc_name.items() if abbrev == loc), loc) - loc_type = game.map.loc_type.get(loc, "UNKNOWN") - formatted = f"{full_name} ({loc}) [{loc_type}]" - - if include_adjacency: - adjacent_locs = game.map.loc_abut.get(loc, []) - if adjacent_locs: - adjacent_info = [] - for adj_loc in adjacent_locs: - adj_full_name = game.map.loc_name.get(adj_loc, adj_loc) - adj_type = game.map.loc_type.get(adj_loc, "UNKNOWN") - adjacent_info.append(f"{adj_full_name} ({adj_loc}) [{adj_type}]") - formatted += f"\n Adjacent to: {', '.join(adjacent_info)}" - - return formatted - - -def format_power_units_and_centers(game, power_name, board_state): - """ - Show a summarized view of a given power's units and supply centers, - with expansions of location names, plus a quick 'strength' count. - Also includes information about neutral centers. - """ - # Add neutral centers info - output = "" - if power_name == "NEUTRAL": - all_controlled = set() - for centers in board_state["centers"].values(): - all_controlled.update(centers) - neutral_centers = [sc for sc in game.map.scs if sc not in all_controlled] - - if neutral_centers: - output = " Neutral Supply Centers:\n" - for c in neutral_centers: - output += f" {format_location_with_expansion(game, c)}\n" - else: - units_info = board_state["units"].get(power_name, []) - centers_info = board_state["centers"].get(power_name, []) - - output = f"{power_name} FORCES:\n" - - if units_info: - output += " Units:\n" - for unit in units_info: - # Example unit: "A PAR" - # First char is 'A' or 'F'; substring after space is the location - parts = unit.split(" ", 1) - if len(parts) == 2: - unit_type, loc = parts - output += f" {unit_type} in {format_location_with_expansion(game, loc)}\n" - else: - output += f" {unit}\n" - else: - output += " Units: None\n" - - if centers_info: - output += " Supply Centers:\n" - for c in centers_info: - output += f" {format_location_with_expansion(game, c)}\n" - else: - output 
+= " Supply Centers: None\n" - - - # Summaries - output += f" Current Strength: {len(centers_info)} centers, {len(units_info)} units\n\n" - return output - - -def organize_history_by_relationship(conversation_text: str) -> str: - """ - This simplified version takes the entire conversation text - (e.g., from game_history.get_game_history(power_name)) and returns it. - - Previously, we assumed we had a structured list of messages, but in practice, - game_history is just a string, so we skip relationship-based grouping. - - In the future, if 'GameHistory' becomes more structured, we can parse it here. - """ - if not conversation_text.strip(): - return "(No game history yet)\n" - - # For now, we can simply return the conversation text - # or do minimal formatting as we see fit. - output = "COMMUNICATION HISTORY:\n\n" - output += conversation_text.strip() + "\n" - return output - - -def format_possible_orders(game, possible_orders): - """ - Display orders with strategic context, maintaining the exact order syntax - while adding meaningful descriptions about their tactical purpose. 
- """ - # First pass - analyze game state for strategic context - supply_centers = set(game.map.scs) - power_centers = {} - contested_regions = set() - - # Gather supply center ownership - for power_name, centers in game.get_centers().items(): - for center in centers: - power_centers[center] = power_name - - # Identify contested regions (simplified approach) - # A more sophisticated implementation would analyze unit adjacencies - - # Classify orders by strategic purpose - strategic_orders = { - "OFFENSIVE": [], # Orders that can capture centers or threaten enemy units - "DEFENSIVE": [], # Orders that protect your centers or units - "TACTICAL": [], # Orders that improve position without immediate captures - "SUPPORT": [] # Support orders - } - - # Process each order - for loc, orders in possible_orders.items(): - for order in orders: - order_parts = order.split() - order_type = None - - # Determine order type - if " H" in order: - order_type = "DEFENSIVE" - elif " S " in order: - order_type = "SUPPORT" - elif " - " in order: - # Get destination - dest = order_parts[-1].split(" VIA")[0] if " VIA" in order else order_parts[-1] - - # Check if destination is a supply center - if dest[:3] in supply_centers: - # If center is neutral or enemy-owned, it's offensive - if dest[:3] not in power_centers or power_centers[dest[:3]] != game.role: - order_type = "OFFENSIVE" - else: - order_type = "DEFENSIVE" # Moving to own supply center - else: - order_type = "TACTICAL" # Non-center destination - elif " C " in order: - order_type = "SUPPORT" # Classify convoy as support - - # Generate strategic description - description = generate_order_description(game, order, order_type, power_centers, supply_centers) - - # Add to appropriate category - if order_type: - strategic_orders[order_type].append((order, description)) - - # Generate formatted output - output = "POSSIBLE ORDERS:\n\n" - - # Add offensive moves first - these are highest priority - if strategic_orders["OFFENSIVE"]: - output 
+= "Offensive Moves (capture territory):\n" - for order, desc in strategic_orders["OFFENSIVE"]: - output += f" {order} {desc}\n" - output += "\n" - - # Add defensive moves - if strategic_orders["DEFENSIVE"]: - output += "Defensive Moves (protect territory):\n" - for order, desc in strategic_orders["DEFENSIVE"]: - output += f" {order} {desc}\n" - output += "\n" - - # Add tactical positioning moves - if strategic_orders["TACTICAL"]: - output += "Tactical Moves (improve position):\n" - for order, desc in strategic_orders["TACTICAL"]: - output += f" {order} {desc}\n" - output += "\n" - - # Add support moves - if strategic_orders["SUPPORT"]: - output += "Support Options (strengthen attacks/defense):\n" - for order, desc in strategic_orders["SUPPORT"]: - output += f" {order} {desc}\n" - - # Log order counts for debugging - logger.debug(f"ORDERS | Strategic classification: " + - f"Offensive: {len(strategic_orders['OFFENSIVE'])}, " + - f"Defensive: {len(strategic_orders['DEFENSIVE'])}, " + - f"Tactical: {len(strategic_orders['TACTICAL'])}, " + - f"Support: {len(strategic_orders['SUPPORT'])}") - - return output - - -def generate_order_description(game, order, order_type, power_centers, supply_centers): - """ - Generate a strategic description for an order based on its type and context. 
- """ - order_parts = order.split() - - # Hold orders - if order_type == "DEFENSIVE" and " H" in order: - unit_loc = order_parts[1] - if unit_loc[:3] in supply_centers: - if unit_loc[:3] in power_centers and power_centers[unit_loc[:3]] == game.role: - return "(secure your supply center)" - else: - return "(maintain position at supply center)" - return "(maintain strategic position)" - - # Move orders - elif order_type in ["OFFENSIVE", "TACTICAL", "DEFENSIVE"] and " - " in order: - unit_type = order_parts[0] # A or F - unit_loc = order_parts[1] - dest = order_parts[3].split(" VIA")[0] if len(order_parts) > 3 and "VIA" in order_parts[-1] else order_parts[3] - - # Moving to a supply center - if dest[:3] in supply_centers: - if dest[:3] not in power_centers: - return f"(capture neutral supply center)" - else: - target_power = power_centers[dest[:3]] - return f"(attack {target_power}'s supply center)" - - # Moving to a non-supply center - if unit_type == "A": - # Army moves to tactical positions - return f"(strategic positioning)" - else: - # Fleet moves often about sea control - return f"(secure sea route)" - - # Support orders - elif order_type == "SUPPORT" and " S " in order: - # Find the unit being supported and its action - supported_part = " ".join(order_parts[3:]) - - if " - " in supported_part: - # Supporting a move - supported_unit = order_parts[3] - supported_dest = order_parts[-1] - - if supported_dest[:3] in supply_centers: - if supported_dest[:3] not in power_centers: - return f"(support capture of neutral center)" - else: - target_power = power_centers[supported_dest[:3]] - return f"(strengthen attack on {target_power})" - return "(strengthen attack)" - else: - # Supporting a hold - return "(reinforce defense)" - - # Convoy orders - elif " C " in order: - return "(enable army transport by sea)" - - # Default - return "" - - -def format_convoy_paths(game, convoy_paths_possible, power_name): - """ - Format convoy paths by region and ownership, focusing on 
strategically relevant convoys. - Input format: List of (start_loc, {required_fleets}, {possible_destinations}) - """ - # check if convoy_paths_possible is empty dictionary or list or none - output = "" - if not convoy_paths_possible: - return "CONVOY POSSIBILITIES: None currently available.\n" - - # Get our units and all other powers' units - our_units = set(game.get_units(power_name)) - our_unit_locs = {unit[2:5] for unit in our_units} - - # Get all powers' units and centers for context - power_units = {} - power_centers = {} - for pwr in game.powers: - power_units[pwr] = {unit[2:5] for unit in game.get_units(pwr)} - power_centers[pwr] = set(game.get_centers(pwr)) - - # Organize convoys by strategic relationship - convoys = { - "YOUR CONVOYS": [], # Convoys using your armies - "CONVOYS YOU CAN ENABLE": [], # Using your fleets to help others - "ALLIED OPPORTUNITIES": [], # Convoys that could help contain common enemies - "THREATS TO WATCH": [] # Convoys that could threaten your positions - } - - # Make sea regions more readable - sea_regions = { - 'NTH': "North Sea", - 'MAO': "Mid-Atlantic", - 'TYS': "Tyrrhenian Sea", - 'BLA': "Black Sea", - 'SKA': "Skagerrak", - 'ION': "Ionian Sea", - 'EAS': "Eastern Mediterranean", - 'WES': "Western Mediterranean", - 'BAL': "Baltic Sea", - 'BOT': "Gulf of Bothnia", - 'ADR': "Adriatic Sea", - 'AEG': "Aegean Sea", - 'ENG': "English Channel" - } - - for start, fleets, destinations in convoy_paths_possible: - # Skip if no destinations or fleets - if not destinations or not fleets: - continue - - # Identify the power that owns the army at start (if any) - army_owner = None - for pwr, locs in power_units.items(): - if start in locs: - army_owner = pwr - break - - # Determine if we own any of the required fleets - our_fleet_count = sum(1 for fleet_loc in fleets if fleet_loc in our_unit_locs) - - # Format the fleet path nicely - fleet_path = " + ".join(sea_regions.get(f, f) for f in fleets) - - for dest in destinations: - # Get 
destination owner if any - dest_owner = None - for pwr, centers in power_centers.items(): - if dest in centers: - dest_owner = pwr - break - - # Determine if destination is a supply center - is_sc = dest in game.map.scs - sc_note = " (SC)" if is_sc else "" - - # Create base convoy description - convoy_desc = f"A {start} -> {dest}{sc_note} via {fleet_path}" - - # Add strategic context based on relationships - if army_owner == power_name: - category = "YOUR CONVOYS" - if dest_owner: - note = f"attack {dest_owner}'s position" - else: - note = "gain strategic position" if not is_sc else "capture neutral SC" - convoys[category].append(f"{convoy_desc} ({note})") - - elif our_fleet_count > 0: - category = "CONVOYS YOU CAN ENABLE" - # Add diplomatic context - if army_owner: - if dest_owner == power_name: - note = f"WARNING: {army_owner} could attack your SC" - else: - note = f"help {army_owner} attack {dest_owner or 'neutral'} position" - else: - note = "potential diplomatic bargaining chip" - convoys[category].append(f"{convoy_desc} ({note})") - - else: - # Analyze if this convoy represents opportunity or threat - if dest_owner == power_name: - category = "THREATS TO WATCH" - note = f"{army_owner or 'potential'} attack on your position" - elif army_owner and dest_owner: - category = "ALLIED OPPORTUNITIES" - note = f"{army_owner} could attack {dest_owner} - potential alliance" - else: - category = "ALLIED OPPORTUNITIES" - note = "potential diplomatic leverage" - - convoys[category].append(f"{convoy_desc} ({note})") - - # Format output - output = "CONVOY POSSIBILITIES:\n\n" - - # Log convoy counts for debugging - convoy_counts = {category: len(convoys[category]) for category in convoys} - logger.debug(f"CONVOYS | {power_name} | Counts: " + - ", ".join(f"{category}: {count}" for category, count in convoy_counts.items())) - - for category, convoy_list in convoys.items(): - if convoy_list: - output += f"{category}:\n" - for convoy in sorted(convoy_list): - output += f" 
{convoy}\n" - output += "\n" - - return output - -def generate_threat_assessment(game, board_state, power_name): - """ - High-level function that tries to identify immediate threats - from adjacent enemy units to your units or centers. - """ - our_units = set(loc.split(" ", 1)[1] for loc in board_state["units"].get(power_name, [])) - our_centers = set(board_state["centers"].get(power_name, [])) - - threats = [] - for enemy_power, enemy_units in board_state["units"].items(): - if enemy_power == power_name: - continue - for unit_code in enemy_units: - try: - # e.g. "A MUN" - parts = unit_code.split(" ", 1) - enemy_loc = parts[1].strip() - except IndexError: - continue - - # check adjacency to our units or centers - neighbors = game.map.loc_abut.get(enemy_loc, []) - threatened = [] - for nbr in neighbors: - if nbr in our_units: - threatened.append(f"our unit @ {nbr}") - elif nbr in our_centers: - threatened.append(f"our center @ {nbr}") - - if threatened: - threats.append((enemy_power, unit_code, threatened)) - - output = "THREAT ASSESSMENT:\n" - if not threats: - output += " No immediate threats detected.\n\n" - logger.debug(f"THREATS | {power_name} | No immediate threats detected") - return output - - # Log threat counts for debugging - logger.debug(f"THREATS | {power_name} | Detected {len(threats)} threats from {len(set(t[0] for t in threats))} powers") - - for (enemy_pwr, code, targets) in threats: - output += f" {enemy_pwr}'s {code} threatens {', '.join(targets)}\n" - output += "\n" - return output - - -def generate_sc_projection(game, board_state, power_name): - """ - Estimate potential gains from neutral or weakly held enemy SCs, plus - highlight which of your centers are at risk (no unit present). 
- """ - our_units = set(loc.split(" ", 1)[1] for loc in board_state["units"].get(power_name, [])) - our_centers = set(board_state["centers"].get(power_name, [])) - all_centers_control = board_state["centers"] # dict of power -> list of centers - all_controlled = set() - for c_list in all_centers_control.values(): - all_controlled.update(c_list) - - # Potential neutral SC gains - neutral_gains = [] - for sc in game.map.scs: - if sc not in all_controlled: # neutral - # see if we have a unit adjacent - neighbors = game.map.loc_abut.get(sc, []) - if any(nbr in our_units for nbr in neighbors): - neutral_gains.append(sc) - - # Weakly held enemy SC - contestable = [] - for e_pwr, e_centers in board_state["centers"].items(): - if e_pwr == power_name: - continue - enemy_units = set(loc.split(" ", 1)[1] for loc in board_state["units"].get(e_pwr, [])) - for c in e_centers: - # if no enemy unit is physically there - if c not in enemy_units: - # see if we have a unit adjacent - neighbors = game.map.loc_abut.get(c, []) - if any(nbr in our_units for nbr in neighbors): - contestable.append((c, e_pwr)) - - # Our centers at risk (no unit present) - at_risk = [own_sc for own_sc in our_centers if own_sc not in our_units] - - # Format final - output = "SUPPLY CENTER PROJECTION:\n" - output += f" Current Count: {len(our_centers)}\n" - - if neutral_gains: - output += " Potential neutral gains:\n" - for sc in neutral_gains: - output += f" {format_location_with_expansion(game, sc)}\n" - - if contestable: - output += " Contestable enemy centers:\n" - for c, e_pwr in contestable: - output += f" {format_location_with_expansion(game, c)} (currently owned by {e_pwr})\n" - - if at_risk: - output += " Centers at risk (no defending unit):\n" - for sc in at_risk: - output += f" {format_location_with_expansion(game, sc)}\n" - - best_case = len(our_centers) + len(neutral_gains) + len(contestable) - worst_case = len(our_centers) - len(at_risk) - output += f" Next-phase range: {worst_case} to 
{best_case} centers\n\n" - - # Log SC projection for debugging - logger.debug(f"SC_PROJ | {power_name} | " + - f"Current: {len(our_centers)}, " + - f"Neutral gains: {len(neutral_gains)}, " + - f"Contestable: {len(contestable)}, " + - f"At risk: {len(at_risk)}, " + - f"Range: {worst_case}-{best_case}") - - return output diff --git a/diplomacy/engine/game.py b/diplomacy/engine/game.py index ac7d000..db1d105 100644 --- a/diplomacy/engine/game.py +++ b/diplomacy/engine/game.py @@ -45,11 +45,6 @@ from diplomacy.utils.game_phase_data import GamePhaseData, MESSAGES_TYPE UNDETERMINED, POWER, UNIT, LOCATION, COAST, ORDER, MOVE_SEP, OTHER = 0, 1, 2, 3, 4, 5, 6, 7 LOGGER = logging.getLogger(__name__) -# set logging level to INFO -#logging.basicConfig(level=logging.INFO) -# set logging level to DEBUG -logging.basicConfig(level=logging.DEBUG) - class Game(Jsonable): """ Game class. @@ -1473,9 +1468,6 @@ class Game(Jsonable): self.message_history.put(previous_phase, previous_messages) self.state_history.put(previous_phase, previous_state) - # Now build a key for the *current* (post-process) phase - current_phase_key = self._phase_wrapper_type(self.current_short_phase) - # Generate a text summary (if a callback is provided) phase_summary_text = self._generate_phase_summary( previous_phase, @@ -1740,8 +1732,6 @@ class Game(Jsonable): :return: A dictionary with locations as keys, and their respective list of possible orders as values """ # pylint: disable=too-many-branches,too-many-nested-blocks - # Initialize dictionary mapping each location to an empty set of possible orders - # Keys are uppercase location names, values are empty sets that will store valid orders possible_orders = {loc.upper(): set() for loc in self.map.locs} # Game is completed @@ -4583,157 +4573,164 @@ class Game(Jsonable): except (IndexError, KeyError): return f"[_generate_phase_summary] No GamePhaseData found for {phase_key}" - # Log the current phase key, results, and possibly the orders for debugging - 
logging.debug( - "DEBUG _generate_phase_summary: current phase_key=%s, results=%s, orders=%s", - phase_key, - current_phase_data.results, - current_phase_data.orders - ) - - # Retrieve the list of all recorded phase keys - all_phases = list(self.state_history.keys()) - logging.debug("DEBUG _generate_phase_summary: all_phases=%s", all_phases) - + # 2) Attempt to retrieve the PREVIOUS phase data to highlight differences + # We'll do this by checking the index of `phase_key` in `self.state_history`. + # If there's a previous index, we'll fetch that phase_data for comparison. prev_phase_data = None + all_phases = list(self.state_history.keys()) if str(phase_key) in all_phases: idx = all_phases.index(str(phase_key)) - logging.debug("DEBUG _generate_phase_summary: current phase index=%d", idx) - - # Here we log the logic behind picking the previous phase if idx > 0: - prev_phase_key = all_phases[idx - 1] - logging.debug( - "DEBUG _generate_phase_summary: Using prev_phase_key=%s (idx-1).", - prev_phase_key - ) + prev_phase_key = all_phases[idx - 1] try: prev_phase_data = self.get_phase_from_history(prev_phase_key) - except Exception as e: - logging.debug("DEBUG _generate_phase_summary: Could not get prev_phase_data for key=%s, error=%s", prev_phase_key, e) - else: - logging.debug("DEBUG _generate_phase_summary: Not enough phases to set prev_phase_key.") - else: - logging.debug("DEBUG _generate_phase_summary: phase_key=%s not in all_phases!", phase_key) + except: + pass - if prev_phase_data: - logging.debug( - "DEBUG _generate_phase_summary: Found prev_phase_data for key=%s, results=%s, orders=%s", - prev_phase_key, - prev_phase_data.results, - prev_phase_data.orders - ) - - # Get current and previous state data + # 3) Gather the big data from current_phase_data + # (We assume you have stored them in current_phase_data.state the usual way.) 
cur_state = current_phase_data.state - logging.debug("DEBUG _generate_phase_summary: cur_state keys=%s", list(cur_state.keys())) - cur_orders_dict = current_phase_data.orders - cur_results_dict = current_phase_data.results + # Typically these keys exist if your get_state() populates them: + cur_units = cur_state.get('units', {}) + cur_centers = cur_state.get('centers', {}) + cur_retreats = cur_state.get('retreats', {}) + cur_homes = cur_state.get('homes', {}) + cur_influence = cur_state.get('influence', {}) + cur_cd = cur_state.get('civil_disorder', {}) - # Build the differences info - differences_info = [] + cur_orders_dict = current_phase_data.orders # {power_name: list_of_orders} + cur_results_dict = current_phase_data.results # {unit_name: list_of_outcomes} + + # 4) If we have a previous phase, gather the old state's data so we can do some diffs + prev_units = prev_centers = prev_retreats = prev_homes = prev_influence = prev_cd = {} if prev_phase_data: prev_state = prev_phase_data.state - - for power in cur_state['units'].keys(): - # Units difference - old_units = set(prev_state.get('units', {}).get(power, [])) - new_units = set(cur_state.get('units', {}).get(power, [])) + prev_units = prev_state.get('units', {}) + prev_centers = prev_state.get('centers', {}) + prev_retreats= prev_state.get('retreats', {}) + prev_homes = prev_state.get('homes', {}) + prev_influence= prev_state.get('influence', {}) + prev_cd = prev_state.get('civil_disorder', {}) + + # 5) Build a user prompt. 
We can do it in sections: + + # 5a) Orders: + orders_text = [] + for power, orders in cur_orders_dict.items(): + if orders: + orders_text.append(f"{power} => {', '.join(orders)}") + else: + orders_text.append(f"{power} => [No orders]") + orders_block = "\n".join(orders_text) if orders_text else "[No orders found]" + + # 5b) Results: + results_text = [] + for unit_name, outcomes in cur_results_dict.items(): + # old code: results_text.append(f"{unit_name}: {', '.join(outcomes)}") + outcome_strs = [str(item) for item in outcomes] + results_text.append(f"{unit_name}: {', '.join(outcome_strs)}") + + results_block = "\n".join(results_text) if results_text else "[No results found]" + # 5c) Current state (units, centers, etc.) - all powers + # We'll just do a short textual listing. You can format it more carefully as you see fit. + def dict_of_lists_to_str(title, dct): + # Helper to turn e.g. {"FRANCE": ["A MAR", "F BRE"], "ENGLAND": ["A LVP"]} into lines + lines = [] + for key, val in dct.items(): + lines.append(f" {key}: {val}") + return f"{title}:\n" + "\n".join(lines) if lines else f"{title}: [None]" + + current_state_text = [] + current_state_text.append(dict_of_lists_to_str("Units", cur_units)) + current_state_text.append(dict_of_lists_to_str("Centers", cur_centers)) + current_state_text.append(dict_of_lists_to_str("Retreats",cur_retreats)) + current_state_text.append(dict_of_lists_to_str("Homes", cur_homes)) + current_state_text.append(dict_of_lists_to_str("Influence", cur_influence)) + current_state_text.append(dict_of_lists_to_str("Civil Disorder", cur_cd)) + current_state_block = "\n\n".join(current_state_text) + + # 5d) Differences from previous (if any) + # We'll do an extremely simple approach: check if the set of items changed in each dict. + # This is purely an example. You can do more advanced diff logic if you want. + + differences_info = [] + if prev_phase_data: + # For each of units, centers, etc. 
do a quick set compare for each power + # We'll focus on e.g. newly acquired centers, newly lost centers, etc. + for power in cur_units.keys(): + # (1) Units difference: + old_units = set(prev_units.get(power, [])) + new_units = set(cur_units.get(power, [])) if old_units != new_units: gained = new_units - old_units - lost = old_units - new_units + lost = old_units - new_units if gained: differences_info.append(f"{power} gained units: {list(gained)}") if lost: - differences_info.append(f"{power} lost units: {list(lost)}") + differences_info.append(f"{power} lost units: {list(lost)}") - # Centers difference - old_centers = set(prev_state.get('centers', {}).get(power, [])) - new_centers = set(cur_state.get('centers', {}).get(power, [])) + # (2) Centers difference: + old_centers = set(prev_centers.get(power, [])) + new_centers = set(cur_centers.get(power, [])) if old_centers != new_centers: gained = new_centers - old_centers - lost = old_centers - new_centers + lost = old_centers - new_centers if gained: differences_info.append(f"{power} gained centers: {list(gained)}") if lost: - differences_info.append(f"{power} lost centers: {list(lost)}") + differences_info.append(f"{power} lost centers: {list(lost)}") + + # You can do the same for retreats, homes, influence, etc. if you want, + # or just skip them. We'll skip for brevity here. 
else: - differences_info.append("Initial phase - no previous state to compare.") + differences_info.append("No previous phase data found, so no direct diffs to report.") - differences_block = "\n".join(differences_info) or "[No significant changes detected]" + differences_block = "\n".join(differences_info) or "[No changes detected from previous phase]" - # Build the prompt focusing only on key changes + # 5e) Put it all together in the final user prompt for the LLM: user_prompt = ( f"PHASE SUMMARY REQUEST.\n\n" f"PHASE: {phase_key}\n\n" - f"ORDERS:\n{', '.join(f'{power}: {orders}' for power, orders in cur_orders_dict.items())}\n\n" - f"RESULTS:\n{', '.join(f'{unit}: {results}' for unit, results in cur_results_dict.items())}\n\n" - f"KEY CHANGES:\n{differences_block}\n\n" - "Please create a JSON summary explaining:\n" - "- Each successful move\n" - "- Each bounce or voided order, with reasons\n" - "- Key changes in supply centers\n" - "- Potential strategic ramifications\n\n" - "PARSABLE OUTPUT:\n" - "{\n" - "'summary': ... your text ...\n" - "}" + f"ORDERS:\n{orders_block}\n\n" + f"RESULTS:\n{results_block}\n\n" + f"CURRENT BOARD STATE:\n{current_state_block}\n\n" + f"CHANGES FROM PREVIOUS PHASE:\n{differences_block}\n\n" + "Below is the final board state after the latest phase, along with the moves each power submitted and the engine’s adjudication results. Please create a summary in JSON, explaining:" + "- Each successful move," + "- Each bounce or voided order, with reasons (e.g. equal force, no valid route, contradictory support)," + "- Key changes in supply centers," + "- Potential strategic ramifications if relevant." + + "Return ONLY JSON:" + + "PARSABLE OUTPUT:" + "{{" + "'summary': ... your text ..." 
+ "}}" ) + # We might also have a system prompt to guide the AI, e.g.: system_prompt = ( - "You are a Diplomacy expert summarizing phase results.\n" - "Focus on:\n" - "1) Key board changes\n" - "2) Failed orders and their reasons\n" - "3) Successful moves affecting centers\n\n" - """ - 1. Understanding the Phases & Their Orders + """ + You are a Diplomacy expert, summarizing the results of the latest phase. + Your tasks: + 1) Provide a concise summary of how the board changed. + 2) Specifically list each voided or bounced order, and *why* it occurred. + 3) If possible, describe which moves or supports succeeded and how that affected centers. - 1.1. Movement Phase (phase_type == 'M') - • Hold: A PAR H (Army in Paris does nothing) - • Move: A PAR - BUR (Army in Paris moves to Burgundy) - • Support: - • Support Hold: A MAR S A PAR H (Army in Marseilles supports Army in Paris to hold) - • Support Move: A MAR S A PAR - BUR (Army in Marseilles supports Army in Paris moving to Burgundy) - • Convoy: Fleets at sea can convoy an Army over water: - • Fleet Convoy: F ION C A TUN - NAP (Fleet in Ionian Sea convoys Army from Tunis to Naples) - • Army Move via Convoy: A TUN - NAP VIA (explicitly states the Army is moving from Tunis to Naples via convoy) + Format: + - Must return a JSON with the top-level key "summary" or "orders" or similar. + - Possibly: - 1.2. Retreat Phase (phase_type == 'R') - • If a unit is dislodged, it must Retreat or Disband: - • Retreat: A BUR R PIC (Dislodged Army in Burgundy retreats to Picardy) - • Disband: A BUR D (Army in Burgundy disbands, if it cannot retreat or chooses not to) + PARSABLE OUTPUT: + { + "summary": "...(your textual summary)..." + } - 1.3. 
Adjustment Phase (phase_type == 'A') - • Build new units if you have more centers than current units: - • A PAR B (Build an Army in Paris) - • F MAR B (Build a Fleet in Marseilles) - • Remove units if you have fewer centers than current units: - • A BUR D (Disband Army in Burgundy) - • Waive a build if you have a surplus but don’t want/can’t build: - • WAIVE (no unit is built in the available build location) + Ensure the summary clarifies reasons for bounces, e.g., "F TRI -> VEN bounced because Italy also moved A VEN -> TRI with equal force." - 1.4. Order Types - • H (Hold) – e.g. A PAR H - • - (Move) – e.g. A PAR - BUR - • S (Support) – e.g. A MAR S A PAR - BUR or A MAR S A PAR H - • C (Convoy) – e.g. F ION C A TUN - NAP - • R (Retreat) – e.g. A BUR R PIC - • D (Disband) – e.g. A BUR D - • B (Build) – e.g. A PAR B - • WAIVE – skipping a possible build - - 1.5. Key Phase Context - • Movement (M): Units can H, -, S, C. - • Retreat (R): Dislodged units can only R or D. - • Adjustment (A): Build/Remove units or WAIVE. - • Multi-Coast: For SPA, STP, BUL, specify nc, sc, or ec when using Fleets, e.g. F BRE - SPA(sc). - • Basic Validity Rules - • No self-support (A PAR S A PAR - BUR is invalid). - • Fleets must be on water to convoy. - • Army “- X VIA” must have one or more fleets issuing matching C A ... - X. - """ - "Example: 'F TRI -> VEN bounced due to equal force from Italy's A VEN -> TRI'" + No extra text outside the JSON block. 
+ """ ) if summary_callback: @@ -4741,7 +4738,7 @@ class Game(Jsonable): else: summary_text = "(No LLM callback provided.)" - # Store the summary + # 7) Store the text in the current GamePhaseData and in self.phase_summaries current_phase_data.summary = summary_text self.phase_summaries[str(phase_key)] = summary_text diff --git a/lm_game.py b/lm_game.py index 9df181f..bd61bd3 100644 --- a/lm_game.py +++ b/lm_game.py @@ -13,7 +13,7 @@ os.environ["GRPC_PYTHON_LOG_LEVEL"] = "40" # ERROR level only from diplomacy import Game from diplomacy.utils.export import to_saved_game_format -from ai_diplomacy.model_loader import load_model_client +from ai_diplomacy.clients import load_model_client from ai_diplomacy.utils import ( get_valid_orders, gather_possible_orders, @@ -22,12 +22,9 @@ from ai_diplomacy.utils import ( from ai_diplomacy.negotiations import conduct_negotiations from ai_diplomacy.planning import planning_phase from ai_diplomacy.game_history import GameHistory -from ai_diplomacy.long_story_short import configure_context_manager -from ai_diplomacy.clients import configure_logging dotenv.load_dotenv() -# Configure logger with a consistent format logger = logging.getLogger(__name__) logging.basicConfig( level=logging.INFO, @@ -35,18 +32,6 @@ logging.basicConfig( datefmt="%H:%M:%S", ) -# Configure specific loggers to reduce noise -logging.getLogger("httpx").setLevel(logging.WARNING) -logging.getLogger("httpcore").setLevel(logging.WARNING) -logging.getLogger("urllib3").setLevel(logging.WARNING) -logging.getLogger("anthropic").setLevel(logging.WARNING) -logging.getLogger("openai").setLevel(logging.WARNING) - -# Ensure our application loggers are at appropriate levels -logging.getLogger("client").setLevel(logging.INFO) -logging.getLogger("ai_diplomacy").setLevel(logging.INFO) - - def parse_arguments(): parser = argparse.ArgumentParser( @@ -55,13 +40,13 @@ def parse_arguments(): parser.add_argument( "--max_year", type=int, - default=1910, + default=1901, help="Maximum 
year to simulate. The game will stop once this year is reached.", ) parser.add_argument( "--num_negotiation_rounds", type=int, - default=5, + default=0, help="Number of negotiation rounds per phase.", ) parser.add_argument( @@ -85,65 +70,15 @@ def parse_arguments(): help="Enable the planning phase for each power to set strategic directives.", ) return parser.parse_args() - - -def save_game_state(game, result_folder, game_file_path, model_error_stats, args, is_final=False): - """ - Save the current game state and related information - - Args: - game: The diplomacy game instance - result_folder: Path to the results folder - game_file_path: Base path for the game file - model_error_stats: Dictionary containing model error statistics - args: Command line arguments - is_final: Boolean indicating if this is the final save - """ - # Generate unique filename for periodic saves - timestamp = time.strftime("%Y%m%d_%H%M%S") - if not is_final: - output_path = f"{game_file_path}_checkpoint_{timestamp}.json" - else: - output_path = game_file_path - # If final file exists, append timestamp - if os.path.exists(output_path): - logger.info("STORAGE | Final game file already exists, saving with unique timestamp") - output_path = f"{output_path}_{timestamp}.json" - - # Save game state - to_saved_game_format(game, output_path=output_path) - - # Save overview data - overview_file_path = f"{result_folder}/overview.jsonl" - with open(overview_file_path, "w") as overview_file: - overview_file.write(json.dumps(model_error_stats) + "\n") - overview_file.write(json.dumps(game.power_model_map) + "\n") - overview_file.write(json.dumps(vars(args)) + "\n") - - logger.info(f"STORAGE | Game checkpoint saved to: {output_path}") def main(): args = parse_arguments() - - # Configure logging - log_level = getattr(logging, args.log_level) - configure_logging( - log_full_prompts=args.log_full_prompts, - log_full_responses=args.log_full_responses, - suppress_connection_logs=not args.verbose, - 
log_level=log_level - ) - - # Configure the context manager with the same summary model - configure_context_manager( - phase_threshold=15000, - message_threshold=15000, - summary_model=args.summary_model - ) max_year = args.max_year - logger.info("GAME_START | Initializing Diplomacy game with multiple LLM agents") + logger.info( + "Starting a new Diplomacy game for testing with multiple LLMs, now concurrent!" + ) start_whole = time.time() model_error_stats = defaultdict( @@ -163,26 +98,6 @@ def main(): result_folder = f"./results/{timestamp_str}" os.makedirs(result_folder, exist_ok=True) - # --------------------------- - # ADD FILE HANDLER FOR LOGS - # --------------------------- - log_file_path = os.path.join(result_folder, "game.log") - file_handler = logging.FileHandler(log_file_path) - file_handler.setLevel(logging.DEBUG) # Ensure we capture all levels in the file - file_handler.setFormatter( - logging.Formatter("%(asctime)s [%(levelname)s] %(name)s - %(message)s", datefmt="%H:%M:%S") - ) - - # Add the handler to root logger to capture all modules' logs - logging.getLogger().addHandler(file_handler) - - # Also add to specific loggers we care about most for summarization - logging.getLogger("ai_diplomacy.long_story_short").addHandler(file_handler) - logging.getLogger("ai_diplomacy.long_story_short").setLevel(logging.DEBUG) - - logger.info(f"LOGGING | File handler configured to write logs to {log_file_path}") - logger.info(f"LOGGING | Capturing detailed context management logs at DEBUG level") - # File paths manifesto_path = f"{result_folder}/game_manifesto.txt" # Use provided output filename or generate one based on the timestamp @@ -204,46 +119,33 @@ def main(): provided_models = [name.strip() for name in args.models.split(",")] if len(provided_models) != len(powers_order): logger.error( - f"CONFIG_ERROR | Expected {len(powers_order)} models in --models argument but got {len(provided_models)}. Exiting." 
+ f"Expected {len(powers_order)} models for --power-models but got {len(provided_models)}. Exiting." ) return game.power_model_map = dict(zip(powers_order, provided_models)) else: - game.power_model_map = assign_models_to_powers(randomize=True) - - logger.debug("POWERS | Model assignments:") - for power, model_id in game.power_model_map.items(): - logger.debug(f"POWERS | {power} assigned to {model_id}") - - # Also, if you prefer to fix the negotiation function: - # We could do a one-liner ensuring all model_id are strings: - for p in game.power_model_map: - if not isinstance(game.power_model_map[p], str): - game.power_model_map[p] = str(game.power_model_map[p]) - - logger.debug("POWERS | Verified all power model IDs are strings") - - round_counter = 0 # Track number of rounds + game.power_model_map = assign_models_to_powers() while not game.is_game_done: phase_start = time.time() current_phase = game.get_current_phase() logger.info( - f"PHASE | {current_phase} | Starting (elapsed game time: {phase_start - start_whole:.2f}s)" + f"PHASE: {current_phase} (time so far: {phase_start - start_whole:.2f}s)" ) - # Get the current short phase - logger.debug(f"PHASE | Current short phase: '{game.current_short_phase}'") + # DEBUG: Print the short phase to confirm + logger.info(f"DEBUG: current_short_phase is '{game.current_short_phase}'") # Prevent unbounded simulation based on year year_str = current_phase[1:5] year_int = int(year_str) if year_int > max_year: - logger.info(f"GAME_END | Reached year limit ({year_int} > {max_year}), terminating game") + logger.info(f"Reached year {year_int}, stopping the test game early.") break # If it's a movement phase (e.g. 
ends with "M"), conduct negotiations if game.current_short_phase.endswith("M"): + if args.planning_phase: logger.info("Starting planning phase block...") game_history = planning_phase( @@ -258,14 +160,13 @@ def main(): model_error_stats, max_rounds=args.num_negotiation_rounds, ) + # Gather orders from each power concurrently active_powers = [ (p_name, p_obj) for p_name, p_obj in game.powers.items() if not p_obj.is_eliminated() ] - - logger.info(f"ORDERS | {current_phase} | Requesting orders from {len(active_powers)} active powers") with concurrent.futures.ThreadPoolExecutor( max_workers=len(active_powers) @@ -273,10 +174,10 @@ def main(): futures = {} for power_name, _ in active_powers: model_id = game.power_model_map.get(power_name, "o3-mini") - client = load_model_client(model_id, power_name=power_name) + client = load_model_client(model_id) possible_orders = gather_possible_orders(game, power_name) if not possible_orders: - logger.info(f"ORDERS | {power_name} | No orderable locations, skipping") + logger.info(f"No orderable locations for {power_name}; skipping.") continue board_state = game.get_state() @@ -288,24 +189,26 @@ def main(): power_name, possible_orders, game_history, - game.phase_summaries, model_error_stats, ) futures[future] = power_name - logger.debug(f"ORDERS | {power_name} | Requested orders from {model_id}") + logger.debug(f"Submitted get_valid_orders task for {power_name}.") for future in concurrent.futures.as_completed(futures): p_name = futures[future] try: orders = future.result() + logger.debug(f"Validated orders for {p_name}: {orders}") if orders: - logger.debug(f"ORDERS | {p_name} | Received {len(orders)} valid orders") game.set_orders(p_name, orders) - logger.debug(f"ORDERS | {p_name} | Orders set for {game.current_short_phase}") + logger.debug( + f"Set orders for {p_name} in {game.current_short_phase}: {orders}" + ) else: - logger.warning(f"ORDERS | {p_name} | No valid orders returned") + logger.debug(f"No valid orders returned for 
{p_name}.") except Exception as exc: - logger.error(f"ORDERS | {p_name} | Request failed: {str(exc)[:150]}") + logger.error(f"LLM request failed for {p_name}: {exc}") + logger.info("Processing orders...\n") game.process() # Add orders to game history @@ -328,7 +231,8 @@ def main(): game.order_history[current_phase][power_name], results, ) - logger.info(f"PROCESSING | {current_phase} | Phase completed") + logger.info("Phase complete.\n") + # Append the strategic directives to the manifesto file strategic_directives = game_history.get_strategic_directives() if strategic_directives: @@ -343,17 +247,28 @@ def main(): year_str = current_phase[1:5] year_int = int(year_str) if year_int > max_year: - logger.info(f"GAME_END | Reached year limit ({year_int} > {max_year}), terminating game") + logger.info(f"Reached year {year_int}, stopping the test game early.") break # Save final result duration = time.time() - start_whole - logger.info(f"GAME_END | Duration: {duration:.2f}s | Saving final state") - - save_game_state(game, result_folder, game_file_path, model_error_stats, args, is_final=True) - - logger.info(f"STORAGE | Game data saved in: {result_folder}") - logger.info("GAME_END | Simulation complete") + logger.info(f"Game ended after {duration:.2f}s. 
Saving to final JSON...") + + output_path = game_file_path + # If the file already exists, append a timestamp to the filename + if os.path.exists(output_path): + logger.info("Game file already exists, saving with unique filename.") + output_path = f"{output_path}_{time.strftime('%Y%m%d_%H%M%S')}.json" + to_saved_game_format(game, output_path=output_path) + + # Dump error stats and power model mapping to the overview file + with open(overview_file_path, "w") as overview_file: + overview_file.write(json.dumps(model_error_stats) + "\n") + overview_file.write(json.dumps(game.power_model_map) + "\n") + overview_file.write(json.dumps(vars(args)) + "\n") + + logger.info(f"Saved game data, manifesto, and error stats in: {result_folder}") + logger.info("Done.") if __name__ == "__main__": diff --git a/plotting.ipynb b/plotting.ipynb index 8ef9f7b..86530dc 100644 --- a/plotting.ipynb +++ b/plotting.ipynb @@ -87,7 +87,7 @@ "\n", "# Plot unit counts per country\n", "for country in countries:\n", - " axs[0].plot(turns, unit_counts[country], label=f\"{model_map[country]} ({country})\")\n", + " axs[0].plot(turns, unit_counts[country], label=model_map[country])\n", "axs[0].set_title(\"Unit Counts per Country Over Turns\")\n", "axs[0].set_ylabel(\"Number of Units\")\n", "axs[0].set_xlabel(\"Turns\")\n", @@ -97,7 +97,7 @@ "\n", "# Plot center counts per country\n", "for country in countries:\n", - " axs[1].plot(turns, center_counts[country], label=f\"{model_map[country]} ({country})\")\n", + " axs[1].plot(turns, center_counts[country], label=model_map[country])\n", "axs[1].set_title(\"Center Counts per Country Over Turns\")\n", "axs[1].set_ylabel(\"Number of Centers\")\n", "axs[1].set_xlabel(\"Turns\")\n",