Resolved merge conflicts by combining the best of both sides:
- bot_runner.py: kept our trust-aware candidate building; took upstream's tier-averaged rates and uncapped task count
- task_complete.py: took upstream's additive skill boost (curbs the greedy snowball effect) and kept our configurable cap (wc.skill_rate_max instead of the hardcoded 10)
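A minimal sketch of why an additive, capped boost curbs snowballing. It assumes the previous boost compounded multiplicatively. The function names, the 1.1 factor and the 0.1 increment are hypothetical, and `skill_rate_max` only stands in for wc.skill_rate_max; this is not the actual task_complete.py code.

```python
# Hypothetical illustration; not the project's task_complete.py implementation.

def multiplicative_boost(rate: float, factor: float) -> float:
    """Assumed old-style boost: compounds, so growth is exponential in task count."""
    return rate * factor


def additive_boost(rate: float, increment: float, skill_rate_max: float) -> float:
    """Additive boost clamped to a configurable ceiling (stand-in for wc.skill_rate_max)."""
    return min(rate + increment, skill_rate_max)


def simulate(n_tasks: int, start: float = 1.0) -> tuple[float, float]:
    """Apply each boost once per completed task and return (multiplicative, additive)."""
    mult, add = start, start
    for _ in range(n_tasks):
        mult = multiplicative_boost(mult, 1.1)
        add = additive_boost(add, 0.1, skill_rate_max=10.0)
    return mult, add
```

After 50 tasks the compounding rate exceeds 100 while the additive rate sits near 6, and the additive rate never passes the configured cap.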
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Hide the exact reward_multiplier from the agent; show the client tier (Standard/Premium/Enterprise) and specialty domains instead
- Add client domain specialization, with a 70% bias in task generation toward each client's specialties
- Remove multiplier-based qty_scale (it leaked information about the hidden multiplier and doubly penalized high-multiplier clients)
- Rewrite the agent prompt to describe tiers and specialties without giving exact formulas
- Fix critical loop.py bug: provide full state context after the simulation resumes (prevents idle skips spanning multiple months)
- Add a Streamlit dashboard and watch scripts; update the plotting and extraction scripts
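One way to read the 70% specialty bias, sketched under assumptions: the domain list and `pick_task_domain` are hypothetical, and the fallback branch here samples uniformly from all domains (including the specialties), which the real task generator may or may not do.

```python
import random

# Hypothetical domain list, for illustration only.
ALL_DOMAINS = ["web", "data", "mobile", "infra", "security"]


def pick_task_domain(rng: random.Random, specialties: list[str], bias: float = 0.7) -> str:
    """With probability `bias`, draw from the client's specialties; otherwise draw uniformly from all domains."""
    if specialties and rng.random() < bias:
        return rng.choice(specialties)
    # Fallback: uniform over every domain, so a specialty can still be picked here.
    return rng.choice(ALL_DOMAINS)
```

With this fallback, the overall share of specialty tasks is bias + (1 - bias) * |specialties| / |ALL_DOMAINS|, e.g. about 0.76 for one specialty out of five domains, slightly above the nominal 70%.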
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated plot_comparison.py and plot_prestige_radar.py to show only
the greedy bot baseline, relabeled "Human Devised Rule". Regenerated
both plots.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bug fixes:
- The CLI flag --horizon-years defaulted to 3, silently overriding config presets.
  It now defaults to None, so the config value (1 year for medium/hard/nightmare) is used.
- The runtime passed a single api_key kwarg regardless of provider, which broke
  Gemini. It now lets LiteLLM resolve keys from provider-specific env vars.
- Removed temperature and top_p from LLM calls (Anthropic rejects requests that set both).
- DB and result filenames now include the config name to prevent cross-config collisions.
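The --horizon-years fix follows the usual "None means not given" pattern. A minimal sketch, assuming an integer horizon and a hypothetical PRESETS table standing in for the project's real config presets:

```python
import argparse

# Hypothetical preset table; the real presets live in the project's configs.
PRESETS = {
    "medium": {"horizon_years": 1},
    "hard": {"horizon_years": 1},
    "nightmare": {"horizon_years": 1},
}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="medium", choices=sorted(PRESETS))
    # default=None marks "not given", so the preset is no longer silently overridden.
    parser.add_argument("--horizon-years", type=int, default=None)
    return parser.parse_args(argv)


def resolve_horizon(args: argparse.Namespace) -> int:
    """Use the CLI value only when it was passed explicitly; otherwise use the preset."""
    if args.horizon_years is not None:
        return args.horizon_years
    return PRESETS[args.config]["horizon_years"]
```

A hardcoded default such as 3 is indistinguishable from a user-supplied 3, which is why the preset could never win; None keeps the two cases apart.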
Benchmark results (1-year horizon, 3 seeds each):
Sonnet 4.6: medium 2/3, hard 0/3, nightmare 1/3
Gemini Flash: medium 3/3, hard 1/3, nightmare 1/3
Gemini Flash has higher win rates on medium (93-98% vs. 40-83%).
Sonnet's ceiling is higher when it survives (nightmare: $10.1M vs. $478K).
New scripts: plot_comparison.py, plot_sonnet_results.py, notepad_gif.py
Updated README with detailed comparison tables and failure analysis.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>