yc-bench/scripts
2026-03-23 19:19:38 -07:00
bot_runner.py Merge branch 'main' into results/main 2026-03-23 19:19:38 -07:00
notepad_gif.py Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results 2026-02-26 00:31:00 -08:00
plot_comparison.py Rename Greedy Bot to Human Devised Rule, remove other bot baselines from plots 2026-02-27 14:03:04 -08:00
plot_multi_episode.py Add multi-episode setting with scratchpad carryover between bankruptcies 2026-03-11 19:22:32 -07:00
plot_multi_model.py Fixed task difficulty with base reward & deadline change 2026-03-06 18:08:11 -08:00
plot_prestige_radar.py Updated backend to calculate employee tier with spiky skill distribution; simplified domain count to 4 2026-03-05 18:12:48 -08:00
plot_results.py Add multi-strategy client trust system with tiers, specialties, and idle-turn fix 2026-03-09 17:37:49 -07:00
plot_run.py logging and plotting 2026-03-20 05:19:56 -07:00
plot_single_run.py init 2026-03-08 17:40:10 -07:00
plot_sonnet_results.py Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results 2026-02-26 00:31:00 -08:00
run_benchmark.sh Calibrated domain prestige bump 2026-03-06 14:40:45 -08:00
smart_bot.py Removed browse limit from bot runner 2026-03-19 13:44:31 -07:00
watch_dashboard.py fix seeding 2026-03-20 18:43:19 -07:00
watch_run.py Add multi-strategy client trust system with tiers, specialties, and idle-turn fix 2026-03-09 17:37:49 -07:00