AI Diplomacy Enhancement - Experiment Log
Goal: Integrate improvements for game state tracking, order validation, strategic map analysis, agent state, planning, and negotiation into the AI Diplomacy codebase while maintaining high quality and avoiding downtime.
Changes Summary (Tasks Completed):
- Task 1: Enhanced Game History Tracking (Phase/Experience)
- Task 2: Improved Order Validation/Processing (Normalization)
- Task 3: Strategic Map Analysis (Graph/BFS)
- Task 4: Upgraded Agent Architecture (Stateful Agent Class)
- Task 5: Enhanced Negotiation Protocol (Agent State Integration)
- Task 7: Enhanced Prompt Structure (System Prompts)
- Task 9: Implemented Planning Module
- Task 10: Improved Phase Summaries and Display
Key Implementation Details:
- Agent State: `ai_diplomacy/agent.py` (DiplomacyAgent class stores personality, goals, relationships, journal). System prompts loaded from `ai_diplomacy/prompts/system_prompts/`.
- Planning: `ai_diplomacy/planning.py` (planning_phase uses Agent), `ai_diplomacy/agent.py` (generate_plan), `ai_diplomacy/clients.py` (get_plan), `ai_diplomacy/prompts/planning_instructions.txt`.
- Negotiation: `ai_diplomacy/negotiations.py` (conduct_negotiations uses Agent state), `ai_diplomacy/clients.py` (get_conversation_reply accepts Agent state), `ai_diplomacy/prompts/conversation_instructions.txt`, `ai_diplomacy/prompts/context_prompt.txt`.
- Game History: `ai_diplomacy/game_history.py` (stores plans, messages, etc.).
- Utilities: `ai_diplomacy/utils.py` (order normalization), `ai_diplomacy/map_utils.py` (graph analysis).
- Phase Summaries: `lm_game.py` (phase_summary_callback); modified the Game class to properly record and export summaries.
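The stateful agent described above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the field names match the log's description, but the prompt-file naming convention and method signatures are assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path

ALL_POWERS = ("AUSTRIA", "ENGLAND", "FRANCE", "GERMANY", "ITALY", "RUSSIA", "TURKEY")


@dataclass
class DiplomacyAgent:
    """Stateful agent holding personality, goals, relationships, and a journal."""
    power_name: str
    personality: str = ""
    goals: list[str] = field(default_factory=list)
    # Default every relationship to Neutral so agent state is never empty
    # (the fix introduced in Experiment 4).
    relationships: dict[str, str] = field(
        default_factory=lambda: {p: "Neutral" for p in ALL_POWERS}
    )
    journal: list[str] = field(default_factory=list)

    def load_system_prompt(self, prompt_dir: Path) -> str:
        # Hypothetical naming scheme: one prompt file per power.
        return (prompt_dir / f"{self.power_name.lower()}_system_prompt.txt").read_text()

    def add_journal_entry(self, phase: str, note: str) -> None:
        # Tag each entry with its phase for easier debugging later.
        self.journal.append(f"[{phase}] {note}")
```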
Experiment 4: Initial State & Update Loop Debug
Date: 2025-04-07
Goal: Fix initial goal generation failure and ensure the state update loop runs.
Changes:
- Added default neutral relationships in `Agent.__init__`.
- Added `Agent.initialize_agent_state` using an LLM call (invoked from `lm_game`).
- Added error handling/logging to `Agent.analyze_phase_and_update_state`.
Observation: Initial goals still showed "None specified" due to a TypeError in the `build_context_prompt` call within `initialize_agent_state`. Relationships defaulted correctly. The state update loop (`analyze_phase_and_update_state`) was not being called in `lm_game.py`.
Result: Failure (-$0.00, minimal LLM calls due to the error)
Next Steps: Add debug logs to the `initialize_agent_state` call; implement the state update loop call in `lm_game.py` after `game.process()`.
Debugging Table (-$100 on failure, +$500 on success)
| # | Problem | Attempted Solution | Outcome | Balance ($) |
|---|---------|--------------------|---------|-------------|
| 4 | Initial goals TypeError in build_context_prompt; state update loop not called | Debug logs; implement loop | Failure | -$100 |
| 5 | TypeError in add_journal_entry (wrong args); JSONDecodeError (LLM added extra text/markdown fences) | Fix args; robust JSON parse | Partial Success* | -$100 |
| 6 | TypeError: wrong number of args for state update call | Helper fn; sync loop; fix | Failure | -$100 |
| 7 | AttributeError: 'Game' has no attribute 'get_board_state_str'/'current_year'; JSON key mismatch | Create board_state_str from board_state; extract year from phase name; fix JSON key mismatches | Partial Success** | -$100 |
| 8 | Case-sensitivity issues: power names in relationships not matching ALL_POWERS | Made relationship validation case-insensitive; reduced log verbosity | Success | +$500 |

*Partial Success: Game ran one year, but failed during the state update phase.
**Partial Success: Game runs without crashing, but LLM responses still don't match the expected JSON format.
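The fix in row 7, extracting the year from the phase name, can be sketched as below. The helper name is illustrative; it assumes the engine's standard phase-name format such as 'S1901M' (season letter, four-digit year, phase-type letter).

```python
import re


def year_from_phase(phase_name: str) -> int:
    """Extract the year from a phase name such as 'S1901M' or 'F1905R'."""
    match = re.search(r"\d{4}", phase_name)
    if match is None:
        raise ValueError(f"no year found in phase name {phase_name!r}")
    return int(match.group())
```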
Experiment 7: Game State Processing Fixes
Date: 2025-04-08
Goal: Fix the game state processing and JSON format issues.
Changes:
- Fixed parameter mismatch in `analyze_phase_and_update_state`: changed from (game, game_history) to (game, board_state, phase_summary, game_history)
- Made JSON parsing more robust with a dedicated `_extract_json_from_text` helper method
- Added fallback values in case of JSON parsing failures
- Fixed missing game attributes: created board_state_str from the board_state dict; extracted the year from the phase name
- Identified JSON key mismatch between the prompt ("relationships"/"goals") and the code ("updated_relationships"/"updated_goals")
Observation: Game now runs without crashing through basic state updates, but LLM responses don't use the expected JSON keys (they use "relationships"/"goals" while code expects "updated_relationships"/"updated_goals").
Experiment 8: Case-Insensitivity Fix
Date: 2025-04-08
Goal: Fix case-sensitivity issues in relationship validation and key name mismatches.
Changes:
- Added case-insensitive validation for power names (e.g., "Austria" → "AUSTRIA")
- Added case-insensitive validation for relationship statuses (e.g., "enemy" → "Enemy")
- Made the code look for alternative JSON key names ("goals"/"relationships" vs "updated_goals"/"updated_relationships")
- Reduced log noise by only showing first few validation warnings and a summary count for the rest
- Added fallback defaults in all error cases to ensure agent state is never empty
Observation: Game now runs successfully through multiple phases. The agent state is properly updated and maintained between phases. Logs are cleaner and more informative.
Result: Success (+$500, successfully running through all phases)
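The case-insensitive normalization from Experiment 8 could be sketched like this. The function name is illustrative, and the full status vocabulary is an assumption (the log only shows "Enemy" and "Neutral"):

```python
ALL_POWERS = {"AUSTRIA", "ENGLAND", "FRANCE", "GERMANY", "ITALY", "RUSSIA", "TURKEY"}
# Assumed status vocabulary; only "Enemy" and "Neutral" appear in the log.
VALID_STATUSES = {"Enemy", "Unfriendly", "Neutral", "Friendly", "Ally"}


def normalize_relationships(raw: dict) -> dict:
    """Case-insensitively map power names and statuses onto canonical
    forms, e.g. {"Austria": "enemy"} -> {"AUSTRIA": "Enemy"}."""
    canonical_status = {s.lower(): s for s in VALID_STATUSES}
    normalized = {}
    for power, status in raw.items():
        power_upper = power.strip().upper()
        if power_upper not in ALL_POWERS:
            continue  # drop unknown power names instead of crashing
        # Unknown statuses fall back to Neutral so state is never invalid.
        normalized[power_upper] = canonical_status.get(status.strip().lower(), "Neutral")
    return normalized
```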
Key Learnings & Best Practices
- Strong Defensive Programming
  - Always implement fallback values when parsing LLM outputs
  - Use robust JSON extraction with multiple strategies (regex patterns, string cleaning)
  - Never assume consistent casing in LLM outputs; normalize all strings
- Adaptable Input Parsing
  - Accept multiple key names for the same concept ("goals" vs "updated_goals")
  - Adopt flexible parsing approaches that can handle structural variations
  - Define clear default behaviors for when expected data is missing
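Accepting multiple key names reduces to a small lookup helper. A minimal sketch (the helper name and example data are hypothetical):

```python
def first_present(data: dict, keys: list, default=None):
    """Return the value of the first key found in data, else default."""
    for key in keys:
        if key in data:
            return data[key]
    return default


# Tolerate both the prompted and the code's expected key names:
parsed = {"goals": ["Secure Belgium"], "relationships": {"ENGLAND": "Ally"}}
goals = first_present(parsed, ["updated_goals", "goals"], default=[])
relationships = first_present(parsed, ["updated_relationships", "relationships"], default={})
```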
- Effective Logging
  - Use debug logs liberally during development phases
  - Keep production logs high-signal and low-noise by limiting repeated warnings
  - Include contextual information (power name, phase name) in logs for easier debugging
- Robust Error Recovery
  - Implement progressive fallback strategies: try parsing → try alternate formats → use defaults
  - Maintain coherent state even when errors occur; never leave an agent in a partial/invalid state
  - Recover gracefully from unexpected errors rather than crashing
These learnings have significantly improved the Agent architecture's reliability and are applicable to other LLM-integration contexts.