diff --git a/README.md b/README.md
index 9e867e3..9d9513d 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@ This repository is an extension of the original [Diplomacy](https://github.com/d
 - **Conversation & Negotiation**: Powers can have multi-turn negotiations with each other via `lm_game.py`. They can exchange private or global messages, allowing for more interactive diplomacy.
 - **Order Generation**: Each power can choose its orders (moves, holds, supports, etc.) using LLMs via `lm_service_versus.py`. Currently supports OpenAI, Claude, Gemini, DeepSeek
 - **Phase Summaries**: Modifications in the `game.py` engine allow the generation of "phase summaries," providing a succinct recap of each turn's events. This could help both human spectators and the LLMs themselves to understand the game state more easily.
+- **Agent State Architecture**: Powers are represented by `DiplomacyAgent` instances that maintain goals, relationships, and a journal tracking thoughts and decisions. This stateful design allows for more consistent and strategic play.
 - **Prompt Templates**: Prompts used by the LLMs are stored in `/prompts/`. You can edit these to customize how models are instructed for both orders and conversations.
 - **Experimental & WIP**: Ongoing development includes adding strategic goals for each power, more flexible conversation lengths, and a readiness check to advance the phase if all powers are done negotiating.
@@ -17,21 +18,28 @@ This repository is an extension of the original [Diplomacy](https://github.com/d
   - Manages conversation rounds (currently up to 3 by default) and calls `get_conversation_reply()` for each power.
   - After negotiations, each power's orders are gathered concurrently (via threads), using `get_orders()` from the respective LLM client.
   - Calls `game.process()` to move to the next phase, optionally collecting phase summaries along the way.
+  - Updates agent state after each phase to maintain continuity and strategic direction.
 2. **`lm_service_versus.py`**
   - Defines a base class (`BaseModelClient`) for hitting any LLM endpoint.
   - Subclasses (`OpenAIClient`, `ClaudeClient`, etc.) implement `generate_response()` and `get_conversation_reply()` with the specifics of each LLM's API.
   - Handles prompt construction for orders and conversation, JSON extraction to parse moves or messages, and fallback logic for invalid LLM responses.
-3. **Modifications in `game.py` (Engine)**
+3. **`agent.py`**
+  - Implements the `DiplomacyAgent` class that maintains state for each power.
+  - Tracks goals, relationships with other powers, and a private journal of thoughts.
+  - Provides robust JSON parsing for LLM responses with case-insensitive validation.
+  - Updates goals and relationships based on game events to maintain coherent strategies.
+
+4. **Modifications in `game.py` (Engine)**
   - Added a `_generate_phase_summary()` method and `phase_summaries` dict to store short textual recaps of each phase.
   - Summaries can be viewed or repurposed for real-time commentary or as additional context fed back into the LLM.
 
 ### Future Explorations
 
 - **Longer Conversation Phases**: Support for more than 3 message rounds, or an adaptive approach that ends negotiation early if all powers signal "ready."
-- **Strategic Goals**: Let each power maintain high-level goals (e.g., "ally with France," "defend Munich") that the LLM takes into account for orders and conversations.
-- **Enhanced Summaries**: Summaries could incorporate conversation logs or trending alliances, giving the LLM even richer context each turn.
+- **Enhanced Agent Memory**: Further develop agent memory and learning from past interactions to influence future decisions.
+- **Strategic Map Analysis**: Leverage the map graph structure to help agents make better tactical decisions.
 - **Live Front-End Integration**: Display phase summaries, conversation logs, and highlights of completed orders in a real-time UI. (an attempt to display phase summaries currently in progress)
 
 ---
diff --git a/ai_animation/CLAUDE.md b/ai_animation/CLAUDE.md
index c6df903..4b99f19 100644
--- a/ai_animation/CLAUDE.md
+++ b/ai_animation/CLAUDE.md
@@ -29,6 +29,35 @@
 - When animations complete, show phase summary (if available)
 - Advance to next phase and repeat
 
+## Agent State Display
+The game now includes agent state data that can be visualized:
+
+1. **Goals and Relationships**: Each power has strategic goals and relationships with other powers
+2. **Journal Entries**: Internal thoughts that help explain decision making
+
+### JSON Format Expectations:
+- Agent state is stored in the game JSON with the following structure:
+  ```json
+  {
+    "powers": {
+      "FRANCE": {
+        "goals": ["Secure Belgium", "Form alliance with Italy"],
+        "relationships": {
+          "GERMANY": "Enemy",
+          "ITALY": "Ally",
+          "ENGLAND": "Neutral",
+          "AUSTRIA": "Neutral",
+          "RUSSIA": "Unfriendly",
+          "TURKEY": "Neutral"
+        },
+        "journal": ["Suspicious of England's fleet movements"]
+      }
+    }
+  }
+  ```
+- Relationship status must be one of: "Enemy", "Unfriendly", "Neutral", "Friendly", "Ally"
+- The code handles case variations, but the display should normalize to title case
+
 ## Known Issues
 - Text-to-speech requires an ElevenLabs API key in `.env` file
 - Unit animations sometimes don't fire properly after messages
diff --git a/ai_diplomacy/llms.txt b/ai_diplomacy/llms.txt
index bcae334..40c0494 100644
--- a/ai_diplomacy/llms.txt
+++ b/ai_diplomacy/llms.txt
@@ -33,11 +33,9 @@ This document provides an analysis of key Python modules within the `ai_diplomac
     * Unsuccessful moves by power with failure reasons
     * Optional sections for other move types
 
-### PARTIALLY IMPLEMENTED MODULES:
-
-#### 1.4. `agent.py` (PARTIAL)
+#### 1.4. `agent.py` (COMPLETE)
 **Goal:** To maintain stateful agent representation with personality, goals, and relationships.
-**Status:** Base class implemented but not fully integrated into planning/negotiation workflows.
+**Status:** Fully implemented and integrated with planning/negotiation workflows.
 
 **Key Components:**
 * `DiplomacyAgent` class with:
@@ -46,26 +44,30 @@ This document provides an analysis of key Python modules within the `ai_diplomac
     * `goals`: List of strategic goals
     * `relationships`: Dict of relationships with other powers
     * `private_journal`: List of internal thoughts/reflections
+    * `_extract_json_from_text`: Robust JSON extraction from LLM responses
+    * `initialize_agent_state`: Sets initial goals and relationships
+    * `analyze_phase_and_update_state`: Updates goals and relationships based on game events
   * Methods for plan generation, updating goals, and updating relationships
 
-**Integration Points Needed:**
-* Connect agent state to context generation in `clients.py`
-* Define how personality affects planning and negotiations
-* Remove redundant order generation logic
+**Integration Points:**
+* Connected to context generation in `clients.py`
+* Influences planning and negotiations through goals and relationships
+* Case-insensitive validation of LLM-provided power names and relationship statuses
+* Robust error recovery with fallback defaults when LLM responses fail to parse
 
-#### 1.5. `negotiations.py` (PARTIAL)
+#### 1.5. `negotiations.py` (COMPLETE)
 **Goal:** To orchestrate the communication phase among active AI powers.
-**Status:** Works but needs integration with DiplomacyAgent state.
+**Status:** Fully implemented and integrated with DiplomacyAgent state.
 
-#### 1.6. `planning.py` (PARTIAL)
+#### 1.6. `planning.py` (COMPLETE)
 **Goal:** To allow each AI power to generate a high-level strategic directive or plan.
-**Status:** Works but needs integration with DiplomacyAgent state and map analysis.
+**Status:** Fully implemented and integrated with DiplomacyAgent state.
 
 #### 1.7. `utils.py` (COMPLETE)
 **Goal:** To provide common utility functions used across other AI diplomacy modules.
 **Status:** Fully implemented.
 
-#### 1.8. `clients.py` (COMPLETE BUT NEEDS EXTENSION)
+#### 1.8. `clients.py` (COMPLETE)
 **Goal:** To abstract and manage interactions with various LLM APIs.
-**Status:** Works, but needs extension to incorporate agent state into context.
+**Status:** Fully implemented with agent state integration.
@@ -73,41 +75,44 @@ This document provides an analysis of key Python modules within the `ai_diplomac
 
 ## 2. Integration Points
 
-The following connections need to be established:
+The following connections have been established:
 
 1. **Agent State → Context Building**
-   * `BaseModelClient.build_context_prompt` needs to incorporate agent's personality, goals, and relationships
-   * Modify `context_prompt.txt` to include sections for agent state
+   * `BaseModelClient.build_context_prompt` incorporates the agent's personality, goals, and relationships
+   * Modified prompt templates include sections for agent state
 
-2. **Map Analysis → Planning**
-   * Use `DiplomacyGraph` and BFS search in `planning_phase` to identify strategic options
-   * Incorporate territory accessibility analysis into strategic planning
+2. **Agent State → Negotiations**
+   * Agent's personality, goals, and relationships influence message generation
+   * Relationships are updated based on negotiation context and results
 
-3. **Agent State → Negotiations**
-   * Have agent's personality, goals, and relationships influence message generation
-   * Update relationships based on negotiation context and results
+3. **Robust LLM Interaction**
+   * Implemented multi-strategy JSON extraction to handle various LLM response formats
+   * Added case-insensitive validation for power names and relationship statuses
+   * Created fallback mechanisms for all LLM interactions
 
-4. **Map Analysis → Order Generation**
-   * Incorporate BFS search to help identify optimal movements and support actions
-   * Analyze territory adjacency for attack planning
+4. **Error Recovery**
+   * Added defensive programming throughout agent state updates
+   * Implemented progressive fallback strategies for parsing LLM outputs
+   * Used intelligent defaults to maintain consistent agent state
 
 ---
 
-## 3. Next Steps (Implementation Plan)
+## 3. Future Work
 
-1. **Phase 1: Agent Integration**
-   * Enhance `BaseModelClient.build_context_prompt` to include agent state
-   * Update prompt templates to utilize agent information
-   * Add a regularly updating 'relationships' section to the context prompt
-   * Add a regularly updating 'goals' section to the context prompt
-   * Add a regularly updating 'journal' section to the context prompt
-
-2. **Phase 2: Map Analysis Integration**
+1. **Map Analysis Integration**
   * Create utility functions to leverage BFS search for common strategic questions
   * Integrate these into planning phase
   * Add territory analysis to order generation context
 
-3. **Phase 3: Test!
+2. **Enhanced Agent Adaptation**
+   * Develop more sophisticated goal-updating strategies based on game events
+   * Implement memory of betrayals/alliances across multiple phases
+   * Create feedback loops between relationship states and planning priorities
+
+3. **UI Integration**
+   * Expose agent states (goals, relationships) in the game visualization
+   * Show evolving relationships between powers graphically
+   * Integrate agent journal entries as commentary
 
 ---
@@ -131,6 +136,7 @@ The following connections need to be established:
 ```
 
 **Current Integration Status:**
-* `agent.py` and `map_utils.py` are implemented but not fully integrated with other modules
-* `phase_summary_callback` works in `lm_game.py` but is not integrated with agent state
-* Base message passing and planning works, but doesn't leverage agent personality or map analysis
+* `agent.py` is fully implemented and integrated with other modules
+* State updates work reliably between phases
+* Robust JSON parsing and case-insensitive validation ensure smooth operation
+* `map_utils.py` is implemented but not yet fully leveraged for strategic planning
diff --git a/experiment_log.md b/experiment_log.md
index 0a81da0..b425cfd 100644
--- a/experiment_log.md
+++ b/experiment_log.md
@@ -42,6 +42,7 @@
 | 5 | `TypeError` in `add_journal_entry` (wrong args); `JSONDecodeError` (LLM added extra text/markdown fences) | Fix args; Robust JSON parse | Partial Success* | -$100 |
 | 6 | `TypeError: wrong number of args` for state update call. | Helper fn; Sync loop; Fix | Failure | -$100 |
 | 7 | `AttributeError: 'Game' has no attribute 'get_board_state_str'/'current_year'` and JSON key mismatch | Create board_state_str from board_state; Extract year from phase name; Fix JSON key mismatches | Partial Success** | -$100 |
+| 8 | Case-sensitivity issues - power names in relationships not matching ALL_POWERS | Made relationship validation case-insensitive; Reduced log verbosity | Success | +$500 |
 
 *Partial Success: Game ran 1 year, but failed during state update phase.
 **Partial Success: Game runs without crashing, but LLM responses still don't match expected JSON format.
@@ -59,7 +60,44 @@
 
 **Observation:** Game now runs without crashing through basic state updates, but LLM responses don't use the expected JSON keys (they use "relationships"/"goals" while the code expects "updated_relationships"/"updated_goals").
 
-**Next Steps:** Fix the JSON key mismatch by either:
-1. Updating the state_update_prompt.txt to use "updated_goals" and "updated_relationships", or
-2. Modifying the agent.py code to look for "goals" and "relationships" keys and map them to the expected variables.
+## Experiment 8: Case-Insensitivity Fix
+
+**Date:** 2025-04-08
+**Goal:** Fix case-sensitivity issues in relationship validation and key name mismatches.
+**Changes:**
+1. Added case-insensitive validation for power names (e.g., "Austria" → "AUSTRIA")
+2. Added case-insensitive validation for relationship statuses (e.g., "enemy" → "Enemy")
+3. Made the code look for alternative JSON key names ("goals"/"relationships" vs "updated_goals"/"updated_relationships")
+4. Reduced log noise by showing only the first few validation warnings and a summary count for the rest
+5. Added fallback defaults in all error cases to ensure agent state is never empty
+
+**Observation:** Game now runs successfully through multiple phases. The agent state is properly updated and maintained between phases. Logs are cleaner and more informative.
+
+**Result:** Success (+$500, successfully running through all phases)
+
+---
+
+## Key Learnings & Best Practices
+
+1. **Strong Defensive Programming**
+   - Always implement fallback values when parsing LLM outputs
+   - Use robust JSON extraction with multiple strategies (regex patterns, string cleaning)
+   - Never assume consistent casing in LLM outputs - normalize all strings
+
+2. **Adaptable Input Parsing**
+   - Accept multiple key names for the same concept ("goals" vs "updated_goals")
+   - Adopt flexible parsing approaches that can handle structural variations
+   - Have clear default behaviors defined when expected data is missing
+
+3. **Effective Logging**
+   - Use debug logs liberally during development phases
+   - Keep production logs high-signal and low-noise by limiting repeated warnings
+   - Include contextual information in logs (power name, phase name) for easier debugging
+
+4. **Robust Error Recovery**
+   - Implement progressive fallback strategies: try parsing → try alternate formats → use defaults
+   - Maintain coherent state even when errors occur - never leave the agent in a partial/invalid state
+   - When unexpected errors occur, recover gracefully rather than crashing
+
+These learnings have significantly improved the agent architecture's reliability and are applicable to other LLM-integration contexts.
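
---

The defensive-parsing pattern this patch describes — multi-strategy JSON extraction, case-insensitive validation of power names and relationship statuses, and fallback defaults so agent state is never empty — can be sketched as follows. This is a minimal illustration, not the repository's actual `agent.py` code; the function names `extract_json` and `normalize_relationships` are hypothetical.

```python
import json
import re

ALL_POWERS = {"AUSTRIA", "ENGLAND", "FRANCE", "GERMANY", "ITALY", "RUSSIA", "TURKEY"}
VALID_STATUSES = {"Enemy", "Unfriendly", "Neutral", "Friendly", "Ally"}


def extract_json(text: str) -> dict:
    """Try several strategies to pull a JSON object out of an LLM response."""
    # Strategy 1: the whole response is valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strategy 2: JSON wrapped in markdown code fences.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # Strategy 3: the first {...} span anywhere in the text.
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        try:
            return json.loads(brace.group(0))
        except json.JSONDecodeError:
            pass
    return {}  # nothing parseable: fall through to defaults


def normalize_relationships(data: dict) -> dict:
    """Accept alternate key names, normalize case, and default to Neutral."""
    # Accept both "updated_relationships" and "relationships".
    raw = data.get("updated_relationships") or data.get("relationships") or {}
    # Start from defaults so the agent state is never left empty.
    result = {power: "Neutral" for power in ALL_POWERS}
    for power, status in raw.items():
        power_up = str(power).upper()          # "Austria" -> "AUSTRIA"
        status_title = str(status).title()     # "enemy"   -> "Enemy"
        if power_up in ALL_POWERS and status_title in VALID_STATUSES:
            result[power_up] = status_title
    return result
```

A reply such as `'```json\n{"relationships": {"Austria": "enemy"}}\n```'` then yields `"AUSTRIA": "Enemy"` with every other power defaulting to `"Neutral"`, matching the progressive try-parse → try-alternate-keys → use-defaults approach the log recommends.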