yc-bench/plots
Adit Jain d976b9cbb4
Merge pull request #11 from collinear-ai/feat/multi-episode
Add multi-episode setting with scratchpad carryover
2026-03-13 18:21:37 -07:00
collinear_logo.svg | Add Collinear branding, bot runners, and clean up stale plots | 2026-02-26 21:12:05 -08:00
collinear_wordmark.svg | Add Collinear branding, bot runners, and clean up stale plots | 2026-02-26 21:12:05 -08:00
hard_1_gpt-5.4_funds.png | Updated initial eval on new backend | 2026-03-06 18:49:32 -08:00
hard_1_gpt-5.4_prestige.png | Updated initial eval on new backend | 2026-03-06 18:49:32 -08:00
hard_1_greedy_bot_funds.png | Fixed greedy baseline and lowered min val of employee skills | 2026-03-09 15:18:49 -07:00
hard_blind_greedy_3seeds.png | Fixed greedy baseline and lowered min val of employee skills | 2026-03-09 15:18:49 -07:00
hard_greedy_bot_3seeds.png | Fixed greedy baseline and lowered min val of employee skills | 2026-03-09 15:18:49 -07:00
multi_episode_haiku.png | Add multi-episode setting with scratchpad carryover between bankruptcies | 2026-03-11 19:22:32 -07:00
notepad_hard_1_claude-sonnet-4-6.gif | Added the configs and updated the results. | 2026-02-26 13:37:58 -08:00
notepad_hard_1_gemini-3-flash-preview.gif | Added the configs and updated the results. | 2026-02-26 13:37:58 -08:00
notepad_hard_2_claude-sonnet-4-6.gif | Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results | 2026-02-26 00:31:00 -08:00
notepad_hard_2_gemini-3-flash-preview.gif | Added the configs and updated the results. | 2026-02-26 13:37:58 -08:00
notepad_hard_3_claude-sonnet-4-6.gif | Added the configs and updated the results. | 2026-02-26 13:37:58 -08:00
notepad_medium_3_claude-sonnet-4-6.gif | Added the configs and updated the results. | 2026-02-26 13:37:58 -08:00
notepad_nightmare_1_claude-sonnet-4-6.gif | Added the configs and updated the results. | 2026-02-26 13:37:58 -08:00
notepad_nightmare_3_claude-sonnet-4-6.gif | Added the configs and updated the results. | 2026-02-26 13:37:58 -08:00
prestige_radar.png | Rename Greedy Bot to Human Devised Rule, remove other bot baselines from plots | 2026-02-27 14:03:04 -08:00
sonnet_vs_gemini.png | Rename Greedy Bot to Human Devised Rule, remove other bot baselines from plots | 2026-02-27 14:03:04 -08:00