iterating

AlxAI 2025-04-13 12:12:11 -07:00
parent 6e5079fa02
commit 65f287df84
4 changed files with 135 additions and 44 deletions

@ -7,6 +7,7 @@ This repository is an extension of the original [Diplomacy](https://github.com/d
- **Conversation & Negotiation**: Powers can have multi-turn negotiations with each other via `lm_game.py`. They can exchange private or global messages, allowing for more interactive diplomacy.
- **Order Generation**: Each power can choose its orders (moves, holds, supports, etc.) using LLMs via `lm_service_versus.py`. Currently supports OpenAI, Claude, Gemini, and DeepSeek.
- **Phase Summaries**: Modifications in the `game.py` engine allow the generation of "phase summaries," providing a succinct recap of each turn's events. This could help both human spectators and the LLMs themselves to understand the game state more easily.
- **Agent State Architecture**: Powers are represented by DiplomacyAgent instances that maintain goals, relationships, and a journal tracking thoughts and decisions. This stateful design allows for more consistent and strategic play.
- **Prompt Templates**: Prompts used by the LLMs are stored in `/prompts/`. You can edit these to customize how models are instructed for both orders and conversations.
- **Experimental & WIP**: Ongoing development includes adding strategic goals for each power, more flexible conversation lengths, and a readiness check to advance the phase if all powers are done negotiating.
@ -17,21 +18,28 @@ This repository is an extension of the original [Diplomacy](https://github.com/d
- Manages conversation rounds (currently up to 3 by default) and calls `get_conversation_reply()` for each power.
- After negotiations, each power's orders are gathered concurrently (via threads), using `get_orders()` from the respective LLM client.
- Calls `game.process()` to move to the next phase, optionally collecting phase summaries along the way.
- Updates agent state after each phase to maintain continuity and strategic direction.
2. **`lm_service_versus.py`**
- Defines a base class (`BaseModelClient`) for hitting any LLM endpoint.
- Subclasses (`OpenAIClient`, `ClaudeClient`, etc.) implement `generate_response()` and `get_conversation_reply()` with the specifics of each LLM's API.
- Handles prompt construction for orders and conversation, JSON extraction to parse moves or messages, and fallback logic for invalid LLM responses.
3. **Modifications in `game.py` (Engine)**
3. **`agent.py`**
- Implements the DiplomacyAgent class that maintains state for each power.
- Tracks goals, relationships with other powers, and a private journal of thoughts.
- Provides robust JSON parsing for LLM responses with case-insensitive validation.
- Updates goals and relationships based on game events to maintain coherent strategies.
4. **Modifications in `game.py` (Engine)**
- Added a `_generate_phase_summary()` method and `phase_summaries` dict to store short textual recaps of each phase.
- Summaries can be viewed or repurposed for real-time commentary or as additional context fed back into the LLM.
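
The concurrent order gathering described for `lm_game.py` could look roughly like the sketch below. It is not the repository's code: `gather_orders`, its parameters, and the empty-list fallback are hypothetical names and choices made for illustration, and each callable stands in for a client's `get_orders()` call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def gather_orders(
    get_orders_by_power: Dict[str, Callable[[], List[str]]],
    max_workers: int = 7,
) -> Dict[str, List[str]]:
    """Run each power's order-generation callable in its own thread.

    A power whose callable raises gets an empty order list, so one failed
    LLM call does not block the phase from being processed.
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every call first so they all run concurrently.
        futures = {
            power: pool.submit(fn) for power, fn in get_orders_by_power.items()
        }
        for power, future in futures.items():
            try:
                results[power] = future.result()
            except Exception:
                results[power] = []  # assumed fallback: submit no orders
    return results
```

Threads suit this workload because each call spends most of its time waiting on a network response rather than computing.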
### Future Explorations
- **Longer Conversation Phases**: Support for more than 3 message rounds, or an adaptive approach that ends negotiation early if all powers signal "ready."
- **Strategic Goals**: Let each power maintain high-level goals (e.g., "ally with France," "defend Munich") that the LLM takes into account for orders and conversations.
- **Enhanced Summaries**: Summaries could incorporate conversation logs or trending alliances, giving the LLM even richer context each turn.
- **Enhanced Agent Memory**: Further develop agent memory and learning from past interactions to influence future decisions.
- **Strategic Map Analysis**: Leverage the map graph structure to help agents make better tactical decisions.
- **Live Front-End Integration**: Display phase summaries, conversation logs, and highlights of completed orders in a real-time UI. (An attempt to display phase summaries is currently in progress.)
---

@ -29,6 +29,35 @@
- When animations complete, show phase summary (if available)
- Advance to next phase and repeat
## Agent State Display
The game now includes agent state data that can be visualized:
1. **Goals and Relationships**: Each power has strategic goals and relationships with other powers
2. **Journal Entries**: Internal thoughts that help explain decision making
### JSON Format Expectations:
- Agent state is stored in the game JSON with the following structure:
```json
{
"powers": {
"FRANCE": {
"goals": ["Secure Belgium", "Form alliance with Italy"],
"relationships": {
"GERMANY": "Enemy",
"ITALY": "Ally",
"ENGLAND": "Neutral",
"AUSTRIA": "Neutral",
"RUSSIA": "Unfriendly",
"TURKEY": "Neutral"
},
"journal": ["Suspicious of England's fleet movements"]
}
}
}
```
- Relationship status must be one of: "Enemy", "Unfriendly", "Neutral", "Friendly", "Ally"
- The code handles case variations but the display should normalize to title case
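
As a minimal sketch of the normalization rule above, the helper below maps any casing of a status onto the five allowed title-case values. The function names and the choice of "Neutral" as the fallback for unrecognized values are assumptions for illustration, not taken from the project code.

```python
from typing import Dict, Optional

ALLOWED_STATUSES = ("Enemy", "Unfriendly", "Neutral", "Friendly", "Ally")
_BY_LOWERCASE = {status.lower(): status for status in ALLOWED_STATUSES}


def normalize_status(raw: str) -> Optional[str]:
    """Return the title-case status, or None if the value is not allowed."""
    return _BY_LOWERCASE.get(raw.strip().lower())


def normalize_relationships(relationships: Dict[str, str]) -> Dict[str, str]:
    """Upper-case power names and normalize every status.

    Unknown statuses fall back to "Neutral" (an assumed default).
    """
    return {
        power.upper(): normalize_status(status) or "Neutral"
        for power, status in relationships.items()
    }
```

For example, `normalize_relationships({"Germany": "ENEMY"})` yields `{"GERMANY": "Enemy"}`.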
## Known Issues
- Text-to-speech requires an ElevenLabs API key in the `.env` file
- Unit animations sometimes don't fire properly after messages

@ -33,11 +33,9 @@ This document provides an analysis of key Python modules within the `ai_diplomac
* Unsuccessful moves by power with failure reasons
* Optional sections for other move types
### PARTIALLY IMPLEMENTED MODULES:
#### 1.4. `agent.py` (PARTIAL)
#### 1.4. `agent.py` (COMPLETE)
**Goal:** To maintain stateful agent representation with personality, goals, and relationships.
**Status:** Base class implemented but not fully integrated into planning/negotiation workflows.
**Status:** Fully implemented and integrated with planning/negotiation workflows.
**Key Components:**
* `DiplomacyAgent` class with:
@ -46,26 +44,40 @@ This document provides an analysis of key Python modules within the `ai_diplomac
* `goals`: List of strategic goals
* `relationships`: Dict of relationships with other powers
* `private_journal`: List of internal thoughts/reflections
* `_extract_json_from_text`: Robust JSON extraction from LLM responses
* `initialize_agent_state`: Sets initial goals and relationships
* `analyze_phase_and_update_state`: Updates goals and relationships based on game events
* Methods for plan generation, updating goals, and updating relationships
**Integration Points Needed:**
* Connect agent state to context generation in `clients.py`
* Define how personality affects planning and negotiations
* Remove redundant order generation logic
**Integration Points:**
* Connected to context generation in `clients.py`
* Influences planning and negotiations through goals and relationships
* Case-insensitive validation of LLM-provided power names and relationship statuses
* Robust error recovery with fallback defaults when LLM responses fail to parse
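
The implementation of `_extract_json_from_text` is not shown in this document. The sketch below illustrates one way a multi-strategy extractor can work: it tries the raw text, then any markdown-fenced block, then the span between the first `{` and the last `}`. The function name `extract_json` and the order of strategies are assumptions.

```python
import json
import re
from typing import Any, Optional

# Build the fence marker programmatically to keep this example easy to embed.
_TICKS = "`" * 3
_FENCE = re.compile(
    _TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL | re.IGNORECASE
)


def extract_json(text: str) -> Optional[Any]:
    """Try progressively looser strategies; return None if all of them fail."""
    # Strategy 1: the whole response is already valid JSON.
    candidates = [text.strip()]
    # Strategy 2: the contents of any markdown code fence.
    candidates += [match.strip() for match in _FENCE.findall(text)]
    # Strategy 3: everything from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start : end + 1])

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```

Returning `None` rather than raising leaves the decision about fallback defaults to the caller.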
#### 1.5. `negotiations.py` (PARTIAL)
#### 1.5. `negotiations.py` (COMPLETE)
**Goal:** To orchestrate the communication phase among active AI powers.
**Status:** Works but needs integration with DiplomacyAgent state.
**Status:** Fully implemented and integrated with DiplomacyAgent state.
#### 1.6. `planning.py` (PARTIAL)
#### 1.6. `planning.py` (COMPLETE)
**Goal:** To allow each AI power to generate a high-level strategic directive or plan.
**Status:** Works but needs integration with DiplomacyAgent state and map analysis.
**Status:** Fully implemented and integrated with DiplomacyAgent state.
#### 1.7. `utils.py` (COMPLETE)
**Goal:** To provide common utility functions used across other AI diplomacy modules.
**Status:** Fully implemented.
#### 1.8. `clients.py` (COMPLETE BUT NEEDS EXTENSION)
#### 1.8. `clients.py` (COMPLETE)
**Goal:** To abstract and manage interactions with various LLM APIs.
**Status:** Fully implemented with agent state integration.
### PARTIALLY IMPLEMENTED MODULES:
#### 1.9. `utils.py` (COMPLETE)
**Goal:** To provide common utility functions used across other AI diplomacy modules.
**Status:** Fully implemented.
#### 1.10. `clients.py` (COMPLETE BUT NEEDS EXTENSION)
**Goal:** To abstract and manage interactions with various LLM APIs.
**Status:** Works, but needs extension to incorporate agent state into context.
@ -73,41 +85,44 @@ This document provides an analysis of key Python modules within the `ai_diplomac
## 2. Integration Points
The following connections need to be established:
The following connections have been established:
1. **Agent State → Context Building**
* `BaseModelClient.build_context_prompt` needs to incorporate agent's personality, goals, and relationships
* Modify `context_prompt.txt` to include sections for agent state
* `BaseModelClient.build_context_prompt` incorporates agent's personality, goals, and relationships
* Modified prompt templates include sections for agent state
2. **Map Analysis → Planning**
* Use `DiplomacyGraph` and BFS search in `planning_phase` to identify strategic options
* Incorporate territory accessibility analysis into strategic planning
2. **Agent State → Negotiations**
* Agent's personality, goals, and relationships influence message generation
* Relationships are updated based on negotiation context and results
3. **Agent State → Negotiations**
* Have agent's personality, goals, and relationships influence message generation
* Update relationships based on negotiation context and results
3. **Robust LLM Interaction**
* Implemented multi-strategy JSON extraction to handle various LLM response formats
* Added case-insensitive validation for power names and relationship statuses
* Created fallback mechanisms for all LLM interactions
4. **Map Analysis → Order Generation**
* Incorporate BFS search to help identify optimal movements and support actions
* Analyze territory adjacency for attack planning
4. **Error Recovery**
* Added defensive programming throughout agent state updates
* Implemented progressive fallback strategies for parsing LLM outputs
* Used intelligent defaults to maintain consistent agent state
---
## 3. Next Steps (Implementation Plan)
## 3. Future Work
1. **Phase 1: Agent Integration**
* Enhance `BaseModelClient.build_context_prompt` to include agent state
* Update prompt templates to utilize agent information
* Add a regularly updating 'relationships' section to the context prompt
* Add a regularly updating 'goals' section to the context prompt
* Add a regularly updating 'journal' section to the context prompt
2. **Phase 2: Map Analysis Integration**
1. **Map Analysis Integration**
* Create utility functions to leverage BFS search for common strategic questions
* Integrate these into planning phase
* Add territory analysis to order generation context
3. **Phase 3: Test!**
2. **Enhanced Agent Adaptation**
* Develop more sophisticated goal updating strategies based on game events
* Implement memory of betrayals/alliances across multiple phases
* Create feedback loops between relationship states and planning priorities
3. **UI Integration**
* Expose agent states (goals, relationships) in the game visualization
* Show evolving relationships between powers graphically
* Integrate agent journal entries as commentary
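
To illustrate the kind of BFS utility the Map Analysis Integration item has in mind, here is a minimal shortest-path sketch over a plain adjacency dictionary. It does not use `DiplomacyGraph` or `map_utils.py`, whose interfaces are not shown here; `ADJACENCY` is a hand-written partial sample for illustration (it omits many real neighbours), and `shortest_path` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Optional

# Illustrative partial adjacency sample, NOT the full Diplomacy map.
ADJACENCY: Dict[str, List[str]] = {
    "PAR": ["BUR", "PIC"],
    "PIC": ["PAR", "BUR", "BEL"],
    "BUR": ["PAR", "PIC", "BEL", "MUN"],
    "BEL": ["PIC", "BUR"],
    "MUN": ["BUR"],
}


def shortest_path(
    graph: Dict[str, List[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search; returns a fewest-hops path or None."""
    if start not in graph or goal not in graph:
        return None
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path: List[str] = []
            node: Optional[str] = current
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(current, []):
            if neighbour not in parents:
                parents[neighbour] = current
                queue.append(neighbour)
    return None
```

A real integration would also need to distinguish army and fleet movement, which this sketch ignores.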
---
@ -131,6 +146,7 @@ The following connections need to be established:
```
**Current Integration Status:**
* `agent.py` and `map_utils.py` are implemented but not fully integrated with other modules
* `phase_summary_callback` works in `lm_game.py` but is not integrated with agent state
* Base message passing and planning works, but doesn't leverage agent personality or map analysis
* `agent.py` is fully implemented and integrated with other modules
* State updates work reliably between phases
* Robust JSON parsing and case-insensitive validation ensure smooth operation
* `map_utils.py` is implemented but not yet fully leveraged for strategic planning

@ -42,6 +42,7 @@
| 5 | `TypeError` in `add_journal_entry` (wrong args); `JSONDecodeError` (LLM added extra text/markdown fences) | Fix args; Robust JSON parse | Partial Success* | -$100 |
| 6 | `TypeError: wrong number of args` for state update call. | Helper fn; Sync loop; Fix | Failure | -$100 |
| 7 | `AttributeError: 'Game' has no attribute 'get_board_state_str'/'current_year'` and JSON key mismatch | Create board_state_str from board_state; Extract year from phase name; Fix JSON key mismatches | Partial Success** | -$100 |
| 8 | Case-sensitivity issues - power names in relationships not matching ALL_POWERS | Made relationship validation case-insensitive; Reduced log verbosity | Success | +$500 |
*Partial Success: Game ran 1 year, but failed during state update phase.
**Partial Success: Game runs without crashing, but LLM responses still don't match expected JSON format.
@ -59,7 +60,44 @@
**Observation:** Game now runs without crashing through basic state updates, but LLM responses don't use the expected JSON keys (they use "relationships"/"goals" while code expects "updated_relationships"/"updated_goals").
**Next Steps:** Fix the JSON key mismatch by either:
1. Updating the state_update_prompt.txt to use "updated_goals" and "updated_relationships", or
2. Modifying the agent.py code to look for "goals" and "relationships" keys and map them to the expected variables.
## Experiment 8: Case-Insensitivity Fix
**Date:** 2025-04-08
**Goal:** Fix case-sensitivity issues in relationship validation and key name mismatches.
**Changes:**
1. Added case-insensitive validation for power names (e.g., "Austria" → "AUSTRIA")
2. Added case-insensitive validation for relationship statuses (e.g., "enemy" → "Enemy")
3. Made the code look for alternative JSON key names ("goals"/"relationships" vs "updated_goals"/"updated_relationships")
4. Reduced log noise by only showing first few validation warnings and a summary count for the rest
5. Added fallback defaults in all error cases to ensure agent state is never empty
**Observation:** Game now runs successfully through multiple phases. The agent state is properly updated and maintained between phases. Logs are cleaner and more informative.
**Result:** Success (+$500, successfully running through all phases)
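
Changes 1, 2, 3, and 5 can be sketched together as follows. This is an illustration under stated assumptions, not the code in `agent.py`: `parse_state_update` is a hypothetical function, and keeping the previous goals and relationships as the fallback is one possible choice of default.

```python
from typing import Any, Dict, List, Tuple

ALL_POWERS = frozenset(
    {"AUSTRIA", "ENGLAND", "FRANCE", "GERMANY", "ITALY", "RUSSIA", "TURKEY"}
)
ALLOWED_STATUSES = {
    s.lower(): s for s in ("Enemy", "Unfriendly", "Neutral", "Friendly", "Ally")
}


def parse_state_update(
    response: Dict[str, Any],
    current_goals: List[str],
    current_relationships: Dict[str, str],
) -> Tuple[List[str], Dict[str, str]]:
    """Read goals and relationships under either key name.

    Invalid or missing data leaves the previous state in place, so the
    agent state is never empty after an update.
    """
    goals = response.get("updated_goals", response.get("goals"))
    if not isinstance(goals, list) or not goals:
        goals = list(current_goals)  # assumed fallback: keep previous goals

    raw = response.get("updated_relationships", response.get("relationships"))
    relationships = dict(current_relationships)
    if isinstance(raw, dict):
        for power, status in raw.items():
            name = str(power).strip().upper()  # "Austria" -> "AUSTRIA"
            label = ALLOWED_STATUSES.get(str(status).strip().lower())
            if name in ALL_POWERS and label is not None:
                relationships[name] = label  # "enemy" -> "Enemy"
    return goals, relationships
```

Entries with an unknown power name or status are skipped individually, so one bad entry does not discard the rest of the update.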
---
## Key Learnings & Best Practices
1. **Strong Defensive Programming**
- Always implement fallback values when parsing LLM outputs
- Use robust JSON extraction with multiple strategies (regex patterns, string cleaning)
- Never assume consistent casing in LLM outputs; normalize all strings before comparing them
2. **Adaptable Input Parsing**
- Accept multiple key names for the same concept ("goals" vs "updated_goals")
- Adopt flexible parsing approaches that can handle structural variations
- Have clear default behaviors defined when expected data is missing
3. **Effective Logging**
- Use debug logs liberally during development phases
- Keep production logs high-signal and low-noise by limiting repeat warnings
- Include contextual information in logs (power name, phase name) for easier debugging
4. **Robust Error Recovery**
- Implement progressive fallback strategies: try parsing → try alternate formats → use defaults
- Maintain coherent state even when errors occur - never leave agent in partial/invalid state
- When unexpected errors occur, recover gracefully rather than crashing
These learnings have significantly improved the Agent architecture's reliability and are applicable to other LLM-integration contexts.
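
The "limit repeat warnings" practice from the Effective Logging item can be sketched with the standard `logging` module as shown below. The logger name, the function `log_validation_warnings`, and the default of three full warnings are assumptions chosen for illustration.

```python
import logging
from typing import Iterable

logger = logging.getLogger("agent_state")  # hypothetical logger name


def log_validation_warnings(
    power: str, phase: str, problems: Iterable[str], max_shown: int = 3
) -> int:
    """Log the first few problems in full, then a single summary line.

    Returns the number of problems that were summarized instead of shown.
    """
    problems = list(problems)
    for problem in problems[:max_shown]:
        # Include power and phase so each line is useful on its own.
        logger.warning("[%s | %s] %s", power, phase, problem)
    suppressed = max(0, len(problems) - max_shown)
    if suppressed:
        logger.warning(
            "[%s | %s] ... and %d more validation warnings",
            power, phase, suppressed,
        )
    return suppressed
```

Passing the values as arguments instead of pre-formatting the string lets `logging` skip the formatting work when the warning level is disabled.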