iterating

2026-04-19 12:58:09 +00:00 · 2025-04-13 12:12:11 -07:00 · 65f287df84
commit 65f287df84
parent 6e5079fa02
4 changed files with 135 additions and 44 deletions
--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@ This repository is an extension of the original [Diplomacy](https://github.com/d
 - **Conversation & Negotiation**: Powers can have multi-turn negotiations with each other via `lm_game.py`. They can exchange private or global messages, allowing for more interactive diplomacy.  
 - **Order Generation**: Each power can choose its orders (moves, holds, supports, etc.) using LLMs via `lm_service_versus.py`. Currently supports OpenAI, Claude, Gemini, DeepSeek
 - **Phase Summaries**: Modifications in the `game.py` engine allow the generation of "phase summaries," providing a succinct recap of each turn's events. This could help both human spectators and the LLMs themselves to understand the game state more easily.  
+- **Agent State Architecture**: Powers are represented by DiplomacyAgent instances that maintain goals, relationships, and a journal tracking thoughts and decisions. This stateful design allows for more consistent and strategic play.
 - **Prompt Templates**: Prompts used by the LLMs are stored in `/prompts/`. You can edit these to customize how models are instructed for both orders and conversations.  
 - **Experimental & WIP**: Ongoing development includes adding strategic goals for each power, more flexible conversation lengths, and a readiness check to advance the phase if all powers are done negotiating.

@@ -17,21 +18,28 @@ This repository is an extension of the original [Diplomacy](https://github.com/d
   - Manages conversation rounds (currently up to 3 by default) and calls `get_conversation_reply()` for each power.  
   - After negotiations, each power's orders are gathered concurrently (via threads), using `get_orders()` from the respective LLM client.  
   - Calls `game.process()` to move to the next phase, optionally collecting phase summaries along the way.
+   - Updates agent state after each phase to maintain continuity and strategic direction.

 2. **`lm_service_versus.py`**  
   - Defines a base class (`BaseModelClient`) for hitting any LLM endpoint.  
   - Subclasses (`OpenAIClient`, `ClaudeClient`, etc.) implement `generate_response()` and `get_conversation_reply()` with the specifics of each LLM's API.  
   - Handles prompt construction for orders and conversation, JSON extraction to parse moves or messages, and fallback logic for invalid LLM responses.  

-3. **Modifications in `game.py` (Engine)**  
+3. **`agent.py`**
+   - Implements the DiplomacyAgent class that maintains state for each power.
+   - Tracks goals, relationships with other powers, and a private journal of thoughts.
+   - Provides robust JSON parsing for LLM responses with case-insensitive validation.
+   - Updates goals and relationships based on game events to maintain coherent strategies.
+
+4. **Modifications in `game.py` (Engine)**  
   - Added a `_generate_phase_summary()` method and `phase_summaries` dict to store short textual recaps of each phase.  
   - Summaries can be viewed or repurposed for real-time commentary or as additional context fed back into the LLM.  

 ### Future Explorations

 - **Longer Conversation Phases**: Support for more than 3 message rounds, or an adaptive approach that ends negotiation early if all powers signal "ready."  
- **Strategic Goals**: Let each power maintain high-level goals (e.g., "ally with France," "defend Munich") that the LLM takes into account for orders and conversations.  
- **Enhanced Summaries**: Summaries could incorporate conversation logs or trending alliances, giving the LLM even richer context each turn.  
+- **Enhanced Agent Memory**: Further develop agent memory and learning from past interactions to influence future decisions.
+- **Strategic Map Analysis**: Leverage the map graph structure to help agents make better tactical decisions.
 - **Live Front-End Integration**: Display phase summaries, conversation logs, and highlights of completed orders in a real-time UI. (an attempt to display phase summaries currently in progress)

 ---
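As a rough illustration of the agent state described in this README diff (goals, relationships, and a journal per power), here is a minimal sketch. It is not the actual `DiplomacyAgent` from `agent.py`: the class name `AgentStateSketch`, the `add_journal_entry` signature, the phase label format, and the choice to start every relationship at "Neutral" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The seven standard Diplomacy powers.
ALL_POWERS = ("AUSTRIA", "ENGLAND", "FRANCE", "GERMANY", "ITALY", "RUSSIA", "TURKEY")


@dataclass
class AgentStateSketch:
    """Hypothetical stand-in for the per-power state kept by an agent."""

    power_name: str
    goals: List[str] = field(default_factory=list)
    relationships: Dict[str, str] = field(default_factory=dict)
    private_journal: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: any power without an explicit status starts as "Neutral".
        for other in ALL_POWERS:
            if other != self.power_name:
                self.relationships.setdefault(other, "Neutral")

    def add_journal_entry(self, phase_name: str, entry: str) -> None:
        # Prefix each entry with the phase so later readers know when it was written.
        self.private_journal.append(f"[{phase_name}] {entry}")


agent = AgentStateSketch("FRANCE", goals=["Secure Belgium"])
agent.add_journal_entry("S1901M", "Suspicious of England's fleet movements")
```

Keeping the three pieces of state on one object is what lets a phase loop pass them to prompt construction and update them afterwards without extra bookkeeping.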
--- a/ai_animation/CLAUDE.md
+++ b/ai_animation/CLAUDE.md
@@ -29,6 +29,35 @@
   - When animations complete, show phase summary (if available)
   - Advance to next phase and repeat

+## Agent State Display
+The game now includes agent state data that can be visualized:
+
+1. **Goals and Relationships**: Each power has strategic goals and relationships with other powers
+2. **Journal Entries**: Internal thoughts that help explain decision making
+
+### JSON Format Expectations:
+- Agent state is stored in the game JSON with the following structure:
+  ```json
+  {
+    "powers": {
+      "FRANCE": {
+        "goals": ["Secure Belgium", "Form alliance with Italy"],
+        "relationships": {
+          "GERMANY": "Enemy",
+          "ITALY": "Ally",
+          "ENGLAND": "Neutral",
+          "AUSTRIA": "Neutral",
+          "RUSSIA": "Unfriendly",
+          "TURKEY": "Neutral"
+        },
+        "journal": ["Suspicious of England's fleet movements"]
+      }
+    }
+  }
+  ```
+- Relationship status must be one of: "Enemy", "Unfriendly", "Neutral", "Friendly", "Ally"
+- The code handles case variations but the display should normalize to title case
+
 ## Known Issues
 - Text-to-speech requires an ElevenLabs API key in `.env` file
 - Unit animations sometimes don't fire properly after messages
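The "Agent State Display" section added in this file says relationship statuses come from a fixed set and should be normalized to title case for display. The sketch below shows one way a consumer of the game JSON might do that. The function names are hypothetical, and falling back to "Neutral" for an unrecognized status is an assumption, not something the diff specifies.

```python
ALLOWED_STATUSES = ("Enemy", "Unfriendly", "Neutral", "Friendly", "Ally")


def normalize_status(raw, default="Neutral"):
    """Return the title-case status, or `default` when `raw` is not recognized."""
    if not isinstance(raw, str):
        return default
    candidate = raw.strip().title()
    return candidate if candidate in ALLOWED_STATUSES else default


def normalize_agent_state(game_json):
    """Return the "powers" mapping with display-ready relationship statuses."""
    normalized = {}
    for power, state in game_json.get("powers", {}).items():
        relationships = {
            other: normalize_status(status)
            for other, status in state.get("relationships", {}).items()
        }
        normalized[power] = {
            # Missing lists are treated as empty rather than as an error.
            "goals": list(state.get("goals", [])),
            "relationships": relationships,
            "journal": list(state.get("journal", [])),
        }
    return normalized


example = {
    "powers": {
        "FRANCE": {
            "goals": ["Secure Belgium"],
            "relationships": {"GERMANY": "ENEMY", "ITALY": "ally"},
            "journal": [],
        }
    }
}
display_state = normalize_agent_state(example)
```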
--- a/ai_diplomacy/llms.txt
+++ b/ai_diplomacy/llms.txt
@@ -33,11 +33,9 @@ This document provides an analysis of key Python modules within the `ai_diplomac
  * Unsuccessful moves by power with failure reasons
  * Optional sections for other move types

-### PARTIALLY IMPLEMENTED MODULES:
-
-#### 1.4. `agent.py` (PARTIAL)
+#### 1.4. `agent.py` (COMPLETE)
 **Goal:** To maintain stateful agent representation with personality, goals, and relationships.
-**Status:** Base class implemented but not fully integrated into planning/negotiation workflows.
+**Status:** Fully implemented and integrated with planning/negotiation workflows.

 **Key Components:**
 * `DiplomacyAgent` class with:
@@ -46,26 +44,40 @@ This document provides an analysis of key Python modules within the `ai_diplomac
  * `goals`: List of strategic goals
  * `relationships`: Dict of relationships with other powers
  * `private_journal`: List of internal thoughts/reflections
+  * `_extract_json_from_text`: Robust JSON extraction from LLM responses
+  * `initialize_agent_state`: Sets initial goals and relationships
+  * `analyze_phase_and_update_state`: Updates goals and relationships based on game events
  * Methods for plan generation, updating goals, and updating relationships

-**Integration Points Needed:**
-* Connect agent state to context generation in `clients.py`
-* Define how personality affects planning and negotiations
-* Remove redundant order generation logic
+**Integration Points:**
+* Connected to context generation in `clients.py`
+* Influences planning and negotiations through goals and relationships
+* Case-insensitive validation of LLM-provided power names and relationship statuses
+* Robust error recovery with fallback defaults when LLM responses fail to parse

-#### 1.5. `negotiations.py` (PARTIAL)
+#### 1.5. `negotiations.py` (COMPLETE)
 **Goal:** To orchestrate the communication phase among active AI powers.
-**Status:** Works but needs integration with DiplomacyAgent state.
+**Status:** Fully implemented and integrated with DiplomacyAgent state.

-#### 1.6. `planning.py` (PARTIAL)
+#### 1.6. `planning.py` (COMPLETE)
 **Goal:** To allow each AI power to generate a high-level strategic directive or plan.
-**Status:** Works but needs integration with DiplomacyAgent state and map analysis.
+**Status:** Fully implemented and integrated with DiplomacyAgent state.

 #### 1.7. `utils.py` (COMPLETE)
 **Goal:** To provide common utility functions used across other AI diplomacy modules.
 **Status:** Fully implemented.

-#### 1.8. `clients.py` (COMPLETE BUT NEEDS EXTENSION)
+#### 1.8. `clients.py` (COMPLETE)
+**Goal:** To abstract and manage interactions with various LLM APIs.
+**Status:** Fully implemented with agent state integration.
+
+### PARTIALLY IMPLEMENTED MODULES:
+
+#### 1.9. `utils.py` (COMPLETE)
+**Goal:** To provide common utility functions used across other AI diplomacy modules.
+**Status:** Fully implemented.
+
+#### 1.10. `clients.py` (COMPLETE BUT NEEDS EXTENSION)
 **Goal:** To abstract and manage interactions with various LLM APIs.
 **Status:** Works, but needs extension to incorporate agent state into context.

@@ -73,41 +85,44 @@ This document provides an analysis of key Python modules within the `ai_diplomac

 ## 2. Integration Points

-The following connections need to be established:
+The following connections have been established:

 1. **Agent State → Context Building**
-   * `BaseModelClient.build_context_prompt` needs to incorporate agent's personality, goals, and relationships
-   * Modify `context_prompt.txt` to include sections for agent state
+   * `BaseModelClient.build_context_prompt` incorporates agent's personality, goals, and relationships
+   * Modified prompt templates include sections for agent state

-2. **Map Analysis → Planning**
-   * Use `DiplomacyGraph` and BFS search in `planning_phase` to identify strategic options
-   * Incorporate territory accessibility analysis into strategic planning
+2. **Agent State → Negotiations**
+   * Agent's personality, goals, and relationships influence message generation
+   * Relationships are updated based on negotiation context and results

-3. **Agent State → Negotiations**
-   * Have agent's personality, goals, and relationships influence message generation
-   * Update relationships based on negotiation context and results
+3. **Robust LLM Interaction**
+   * Implemented multi-strategy JSON extraction to handle various LLM response formats
+   * Added case-insensitive validation for power names and relationship statuses
+   * Created fallback mechanisms for all LLM interactions

-4. **Map Analysis → Order Generation**
-   * Incorporate BFS search to help identify optimal movements and support actions
-   * Analyze territory adjacency for attack planning
+4. **Error Recovery**
+   * Added defensive programming throughout agent state updates
+   * Implemented progressive fallback strategies for parsing LLM outputs
+   * Used intelligent defaults to maintain consistent agent state

 ---

-## 3. Next Steps (Implementation Plan)
+## 3. Future Work

-1. **Phase 1: Agent Integration**
-   * Enhance `BaseModelClient.build_context_prompt` to include agent state
-   * Update prompt templates to utilize agent information
-   * Add a regularly updating 'relationships' section to the context prompt
-   * Add a regularly updating 'goals' section to the context prompt
-   * Add a regularly updating 'journal' section to the context prompt
-
-2. **Phase 2: Map Analysis Integration**
+1. **Map Analysis Integration**
   * Create utility functions to leverage BFS search for common strategic questions
   * Integrate these into planning phase
   * Add territory analysis to order generation context

-3. **Phase 3: Test!
+2. **Enhanced Agent Adaptation**
+   * Develop more sophisticated goal updating strategies based on game events
+   * Implement memory of betrayals/alliances across multiple phases
+   * Create feedback loops between relationship states and planning priorities
+
+3. **UI Integration**
+   * Expose agent states (goals, relationships) in the game visualization
+   * Show evolving relationships between powers graphically
+   * Integrate agent journal entries as commentary

 ---

@@ -131,6 +146,7 @@ The following connections need to be established:
 ```

 **Current Integration Status:**
-* `agent.py` and `map_utils.py` are implemented but not fully integrated with other modules
-* `phase_summary_callback` works in `lm_game.py` but is not integrated with agent state
-* Base message passing and planning works, but doesn't leverage agent personality or map analysis
+* `agent.py` is fully implemented and integrated with other modules
+* State updates work reliably between phases
+* Robust JSON parsing and case-insensitive validation ensure smooth operation
+* `map_utils.py` is implemented but not yet fully leveraged for strategic planning
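The llms.txt changes above mention multi-strategy JSON extraction with fallbacks. The diff does not show how `_extract_json_from_text` works, so the following is only a sketch of the general idea under assumed strategies: parse the whole reply, then any fenced block, then the outermost brace-delimited span, and finally return a default. The name `extract_json_sketch` is hypothetical.

```python
import json
import re

# Build the Markdown fence marker programmatically so this example stays readable
# when it is itself embedded in a fenced block.
FENCE = "`" * 3
_FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_json_sketch(text, default=None):
    """Try progressively looser strategies; return a dict or the default."""
    candidates = [text]  # Strategy 1: the reply is already pure JSON.
    # Strategy 2: JSON wrapped in a Markdown code fence.
    candidates.extend(match.group(1) for match in _FENCED_BLOCK.finditer(text))
    # Strategy 3: everything between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])

    for candidate in candidates:
        try:
            parsed = json.loads(candidate.strip())
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    # Nothing parsed: fall back so callers never receive None unexpectedly.
    return {} if default is None else default


fenced_reply = "Sure! " + FENCE + 'json\n{"goals": ["Hold Munich"]}\n' + FENCE + " Hope that helps."
chatty_reply = 'Here you go: {"relationships": {"ITALY": "Ally"}} Good luck!'
```

Ordering the strategies from strictest to loosest means a well-formed reply is never reinterpreted by a looser heuristic.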
--- a/experiment_log.md
+++ b/experiment_log.md
@@ -42,6 +42,7 @@
 | 5 | `TypeError` in `add_journal_entry` (wrong args); `JSONDecodeError` (LLM added extra text/markdown fences) | Fix args; Robust JSON parse | Partial Success*  | -$100       |
 | 6 | `TypeError: wrong number of args` for state update call.    | Helper fn; Sync loop; Fix | Failure        | -$100      |
 | 7 | `AttributeError: 'Game' has no attribute 'get_board_state_str'/'current_year'` and JSON key mismatch | Create board_state_str from board_state; Extract year from phase name; Fix JSON key mismatches | Partial Success** | -$100 |
+| 8 | Case-sensitivity issues - power names in relationships not matching ALL_POWERS | Made relationship validation case-insensitive; Reduced log verbosity | Success | +$500 |

 *Partial Success: Game ran 1 year, but failed during state update phase.
 **Partial Success: Game runs without crashing, but LLM responses still don't match expected JSON format.
@@ -59,7 +60,44 @@

 **Observation:** Game now runs without crashing through basic state updates, but LLM responses don't use the expected JSON keys (they use "relationships"/"goals" while code expects "updated_relationships"/"updated_goals").

-**Next Steps:** Fix the JSON key mismatch by either:
-1. Updating the state_update_prompt.txt to use "updated_goals" and "updated_relationships", or
-2. Modifying the agent.py code to look for "goals" and "relationships" keys and map them to the expected variables.
+## Experiment 8: Case-Insensitivity Fix 
+
+**Date:** 2025-04-08
+**Goal:** Fix case-sensitivity issues in relationship validation and key name mismatches.
+**Changes:**
+1. Added case-insensitive validation for power names (e.g., "Austria" → "AUSTRIA")
+2. Added case-insensitive validation for relationship statuses (e.g., "enemy" → "Enemy")
+3. Made the code look for alternative JSON key names ("goals"/"relationships" vs "updated_goals"/"updated_relationships")
+4. Reduced log noise by only showing first few validation warnings and a summary count for the rest
+5. Added fallback defaults in all error cases to ensure agent state is never empty
+
+**Observation:** Game now runs successfully through multiple phases. The agent state is properly updated and maintained between phases. Logs are cleaner and more informative.
+
+**Result:** Success (+$500, successfully running through all phases)
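Changes 1, 2, 3 and 5 of Experiment 8 can be pictured with a short sketch: accept either key name, upper-case power names, title-case statuses, and keep the previous state whenever the payload is unusable. This is an illustration of the approach rather than the code in `agent.py`. The helper names are hypothetical, and silently skipping invalid entries is an assumption.

```python
ALL_POWERS = ("AUSTRIA", "ENGLAND", "FRANCE", "GERMANY", "ITALY", "RUSSIA", "TURKEY")
ALLOWED_STATUSES = ("Enemy", "Unfriendly", "Neutral", "Friendly", "Ally")


def first_present(payload, keys, default=None):
    """Return the value of the first key found in `payload`, else `default`."""
    for key in keys:
        if key in payload:
            return payload[key]
    return default


def apply_state_update(payload, current_goals, current_relationships):
    """Merge a parsed LLM payload into the existing state, never emptying it."""
    goals = first_present(payload, ("updated_goals", "goals"))
    if not isinstance(goals, list) or not goals:
        goals = list(current_goals)  # Fallback: keep the previous goals.

    relationships = dict(current_relationships)
    raw = first_present(payload, ("updated_relationships", "relationships"))
    if isinstance(raw, dict):
        for power, status in raw.items():
            name = str(power).strip().upper()    # "Austria" -> "AUSTRIA"
            label = str(status).strip().title()  # "enemy" -> "Enemy"
            if name in ALL_POWERS and label in ALLOWED_STATUSES:
                relationships[name] = label
            # Invalid entries are skipped, leaving the old value in place.
    return goals, relationships


payload = {
    "goals": ["Take Belgium"],
    "relationships": {"Austria": "enemy", "Narnia": "Ally", "ITALY": "best friend"},
}
new_goals, new_relationships = apply_state_update(
    payload, ["Survive"], {"AUSTRIA": "Neutral", "ITALY": "Neutral"}
)
```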
+
+---
+
+## Key Learnings & Best Practices
+
+1. **Strong Defensive Programming**
+   - Always implement fallback values when parsing LLM outputs
+   - Use robust JSON extraction with multiple strategies (regex patterns, string cleaning)
+   - Never assume case-sensitivity in LLM outputs - normalize all strings
+
+2. **Adaptable Input Parsing**
+   - Accept multiple key names for the same concept ("goals" vs "updated_goals") 
+   - Adopt flexible parsing approaches that can handle structural variations
+   - Have clear default behaviors defined when expected data is missing
+
+3. **Effective Logging**
+   - Use debug logs liberally during development phases
+   - Keep production logs high-signal and low-noise by limiting repeat warnings
+   - Include contextual information in logs (power name, phase name) for easier debugging
+
+4. **Robust Error Recovery**
+   - Implement progressive fallback strategies: try parsing → try alternate formats → use defaults
+   - Maintain coherent state even when errors occur - never leave agent in partial/invalid state
+   - When unexpected errors occur, recover gracefully rather than crashing
+
+These learnings have significantly improved the Agent architecture's reliability and are applicable to other LLM-integration contexts.
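The "Effective Logging" point, together with change 4 of Experiment 8 (show only the first few validation warnings plus a summary count), could look roughly like the sketch below. The function name, the logger name, the cap of three messages, and the phase label format are assumptions chosen for the example.

```python
import logging

logger = logging.getLogger("agent_state_sketch")


def log_validation_warnings(power_name, phase_name, warnings, max_shown=3):
    """Log at most `max_shown` warnings, then one summary line for the rest.

    Returns a (shown, suppressed) tuple so callers can inspect the counts.
    """
    shown = warnings[:max_shown]
    for message in shown:
        # Include power and phase so each line is useful on its own.
        logger.warning("[%s | %s] %s", power_name, phase_name, message)

    suppressed = len(warnings) - len(shown)
    if suppressed > 0:
        logger.warning(
            "[%s | %s] %d more validation warning(s) suppressed",
            power_name, phase_name, suppressed,
        )
    return len(shown), suppressed


counts = log_validation_warnings(
    "FRANCE",
    "S1901M",
    ["unknown power 'Narnia'", "bad status 'best friend'", "missing goals", "extra key", "extra key"],
)
```

Passing the values as logger arguments instead of pre-formatting the string defers formatting until a handler actually emits the record.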