yc-bench

mirror of https://github.com/collinear-ai/yc-bench.git synced 2026-04-19 12:58:03 +00:00

alckasoc d28ccb1bb2	2026-03-09 17:39:58 -07:00
Merge upstream/main: greedy baseline fix + additive skill boost

Resolved conflicts, combining the best of both:
- bot_runner.py: kept our trust-aware candidate building + upstream's tier-avg rates + no task cap
- task_complete.py: upstream's additive skill boost (nerfs greedy snowball) + our configurable cap (wc.skill_rate_max instead of hardcoded 10)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

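The merge commit above contrasts an additive skill boost with one that "snowballs" and replaces a hardcoded cap of 10 with a configurable `wc.skill_rate_max`. The sketch below is hypothetical and is not taken from `task_complete.py`. The function names, the step sizes, and the assumption that the earlier boost was multiplicative (compounding) are illustrative only. It shows how a fixed per-task increment grows a rate linearly, while a proportional boost compounds, with both clamped to the same cap.

```python
# Hypothetical illustration only; not the yc-bench implementation.
# Assumes the earlier boost was multiplicative (compounding) and the newer one
# adds a fixed amount per completed task, both clamped to a configurable cap.


def multiplicative_boost(rate: float, factor: float, cap: float) -> float:
    """Scale the skill rate by `factor` (compounds across tasks), then clamp to `cap`."""
    return min(rate * factor, cap)


def additive_boost(rate: float, increment: float, cap: float) -> float:
    """Add a fixed `increment` to the skill rate (linear growth), then clamp to `cap`."""
    return min(rate + increment, cap)


def simulate(boost, rate: float, step: float, cap: float, tasks: int) -> list[float]:
    """Apply `boost` once per completed task and record the rate after each task."""
    history = []
    for _ in range(tasks):
        rate = boost(rate, step, cap)
        history.append(rate)
    return history


if __name__ == "__main__":
    # Stands in for a configurable cap such as wc.skill_rate_max; 10 was the old hardcoded value.
    skill_rate_max = 10.0
    print(simulate(multiplicative_boost, 1.0, 1.5, skill_rate_max, 8))
    print(simulate(additive_boost, 1.0, 0.5, skill_rate_max, 8))
```

With these illustrative parameters, the compounding version reaches the cap after six tasks, while the additive version is only at 5.0 after eight. This is one plausible reading of why an additive boost would slow a greedy strategy that keeps reinforcing the same skill.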
bot_runner.py	Merge upstream/main: greedy baseline fix + additive skill boost	2026-03-09 17:39:58 -07:00
notepad_gif.py	Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results	2026-02-26 00:31:00 -08:00
plot_comparison.py	Rename Greedy Bot to Human Devised Rule, remove other bot baselines from plots	2026-02-27 14:03:04 -08:00
plot_multi_model.py	Fixed task difficulty with base reward & deadline change	2026-03-06 18:08:11 -08:00
plot_prestige_radar.py	Updated backend to calculate employee tier with spiky skill distribution; simplified domain count to 4	2026-03-05 18:12:48 -08:00
plot_results.py	Add multi-strategy client trust system with tiers, specialties, and idle-turn fix	2026-03-09 17:37:49 -07:00
plot_run.py	Updated backend to calculate employee tier with spiky skill distribution; simplified domain count to 4	2026-03-05 18:12:48 -08:00
plot_single_run.py	init	2026-03-08 17:40:10 -07:00
plot_sonnet_results.py	Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results	2026-02-26 00:31:00 -08:00
run_benchmark.sh	Calibrated domain prestige bump	2026-03-06 14:40:45 -08:00
watch_dashboard.py	Add multi-strategy client trust system with tiers, specialties, and idle-turn fix	2026-03-09 17:37:49 -07:00
watch_run.py	Add multi-strategy client trust system with tiers, specialties, and idle-turn fix	2026-03-09 17:37:49 -07:00