yc-bench

mirror of https://github.com/collinear-ai/yc-bench.git synced 2026-04-23 16:55:00 +00:00

Author	SHA1	Message	Date
Adit Jain	5ccd14c02f	Merge pull request #2 from collinear-ai/fresh-main: Added a start script and bots!	2026-02-26 21:13:36 -08:00
adit jain	3643806dce	Added the configs and updated the results.	2026-02-26 13:37:58 -08:00
AnandK27	5c39e448de	readme fixes	2026-02-26 01:02:13 -08:00
Adit Jain	07f159830d	Merge pull request #1 from collinear-ai/fresh-main: Fresh main	2026-02-26 00:57:39 -08:00
adit jain	5d2962073d	Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results. Bug fixes: (1) CLI --horizon-years defaulted to 3, silently overriding config presets; now defaults to None so config value (1yr for medium/hard/nightmare) is used. (2) Runtime passed a single api_key kwarg regardless of provider, breaking Gemini; now lets LiteLLM resolve keys from provider-specific env vars. (3) Removed temperature+top_p from LLM calls (Anthropic rejects both together). (4) DB and result filenames now include config name to prevent cross-config collisions. Benchmark results (1yr horizon, 3 seeds each): Sonnet 4.6: medium 2/3, hard 0/3, nightmare 1/3; Gemini Flash: medium 3/3, hard 1/3, nightmare 1/3. Gemini has higher win rates (93-98% vs 40-83% on medium). Sonnet's ceiling is higher when it survives (nightmare $10.1M vs $478K). New scripts: plot_comparison.py, plot_sonnet_results.py, notepad_gif.py. Updated README with detailed comparison tables and failure analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 00:31:00 -08:00
Adit Jain	08f081d322	Update README with citation for YC-Bench: Added citation information for the YC-Bench project.	2026-02-26 00:26:03 +05:30
Adit Jain	78c86b35e0	Fix formatting in README for discrete-event simulation	2026-02-25 23:39:29 +05:30
Adit Jain	7ad96dee8f	Fix formatting issues in README.md	2026-02-25 23:04:38 +05:30
adit jain	3a1c562827	Initial commit	2026-02-25 02:16:35 -08:00

9 commits
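
The horizon fix described in commit 5d2962073d (a CLI default of 3 silently overriding config presets) can be illustrated with a minimal sketch. This is not the repository's actual code: the `--config` flag, the `PRESET_HORIZON_YEARS` table, and the `resolve_horizon_years` helper are hypothetical names introduced for illustration. Only the `--horizon-years` flag, its new `None` default, and the 1-year preset for medium/hard/nightmare come from the commit message.

```python
import argparse

# Hypothetical preset table. The commit message states only that the
# medium, hard, and nightmare configs use a 1-year horizon.
PRESET_HORIZON_YEARS = {"medium": 1, "hard": 1, "nightmare": 1}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumed flag for selecting a config preset.
    parser.add_argument("--config", choices=sorted(PRESET_HORIZON_YEARS), default="medium")
    # default=None lets us tell "flag not given" apart from an explicit value.
    # With default=3, the preset could never take effect.
    parser.add_argument("--horizon-years", type=int, default=None)
    return parser


def resolve_horizon_years(args: argparse.Namespace) -> int:
    """Prefer an explicit CLI value; otherwise fall back to the config preset."""
    if args.horizon_years is not None:
        return args.horizon_years
    return PRESET_HORIZON_YEARS[args.config]


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    print(resolve_horizon_years(cli_args))
```

Using `None` as the sentinel keeps the override explicit: a user who passes `--horizon-years 3` still gets 3, while omitting the flag defers to the preset.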