Commit graph

70 commits

Author SHA1 Message Date
Adit Jain
14b0179f41 Fix formatting issues in README.md 2026-02-26 22:13:32 -08:00
adit jain
86b0741c41 Fix start.sh: re-download and re-exec when piped via curl
When run as `curl ... | bash`, stdin is the pipe, so Rich prompts
abort immediately. Now detects non-tty stdin, re-downloads the script
to a temp file, and execs it, so stdin becomes the terminal again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:24:56 -08:00
adit jain
56ad582226 Fix start.sh: reattach stdin to /dev/tty for curl pipe usage
When run via `curl ... | bash`, stdin is the pipe, not the terminal,
causing interactive prompts to abort immediately. Adding `</dev/tty`
restores terminal input.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:22:48 -08:00
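start.sh is a shell script, so the following is only a rough Python analogue of the `</dev/tty` fix, not code from the repository. It checks whether fd 0 is a terminal and, if not, points fd 0 at the controlling terminal, which is what the shell redirection does. The helper name `reattach_stdin_to_tty` is hypothetical.

```python
import os


def reattach_stdin_to_tty() -> bool:
    """Make fd 0 the controlling terminal when stdin is a pipe.

    Hypothetical helper mirroring the shell redirection `</dev/tty`.
    Returns True if stdin is a terminal afterwards, False otherwise.
    """
    if os.isatty(0):
        return True  # already interactive, nothing to do
    try:
        tty_fd = os.open("/dev/tty", os.O_RDONLY)
    except OSError:
        # No controlling terminal (CI, cron, Windows): leave stdin alone.
        return False
    if tty_fd != 0:  # if fd 0 was closed, open() may have reused it already
        os.dup2(tty_fd, 0)  # fd 0 now refers to the terminal
        os.close(tty_fd)
    return os.isatty(0)
```

Because the swap happens at the file-descriptor level, call it before anything reads from `sys.stdin`, so no piped data is left sitting in Python's input buffer.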
adit jain
649c42207a Add db/ source files that were blocked by overly broad gitignore
The old `db/` pattern in .gitignore matched src/yc_bench/db/ too,
preventing all ORM models and session.py from being committed.
Previous commit fixed .gitignore to `/db/`; this adds the 10 missing files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:20:07 -08:00
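The `db/` vs `/db/` difference follows from gitignore's anchoring rule: a pattern with a slash at the start or in the middle is relative to the repository root, while a pattern whose only slash is the trailing one can match a directory at any depth. The sketch below is a simplified, hypothetical model of just that rule (no negation, no `**`, directory patterns only), not Git's actual matcher.

```python
from fnmatch import fnmatchcase


def dir_is_ignored(pattern: str, dir_path: str) -> bool:
    """Simplified check of a directory pattern (ending in '/') against a repo path.

    Models only the anchoring rule: a '/' at the start or middle of the pattern
    ties it to the repository root; otherwise it may match at any depth.
    A match also covers everything nested under the matched directory.
    """
    body = pattern.rstrip("/")  # a trailing '/' only means "directories only"
    anchored = "/" in body
    pat_parts = body.lstrip("/").split("/")
    parts = dir_path.strip("/").split("/")
    n = len(pat_parts)
    if len(parts) < n:
        return False
    # Unanchored patterns may start matching at any component; anchored ones only at the root.
    starts = [0] if anchored else range(len(parts) - n + 1)
    return any(
        all(fnmatchcase(parts[i + k], pat_parts[k]) for k in range(n))
        for i in starts
    )
```

With this model, `dir_is_ignored("db/", "src/yc_bench/db")` is True (the bug) while `dir_is_ignored("/db/", "src/yc_bench/db")` is False. For the authoritative answer on a real checkout, `git check-ignore -v <path>` reports which pattern, if any, ignores a path.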
adit jain
db7d9f218a Add db/ source files that were blocked by overly broad gitignore
The old `db/` pattern in .gitignore matched src/yc_bench/db/ too,
preventing all ORM models and session.py from being committed.
Previous commit fixed .gitignore to `/db/`; this adds the 10 missing files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:19:45 -08:00
Adit Jain
6df0f79055 Merge pull request #3 from collinear-ai/fresh-main
Fix fresh install: add missing __init__.py and fix .gitignore
2026-02-26 21:17:54 -08:00
adit jain
a11b2828a9 Fix fresh install: add missing __init__.py and fix .gitignore
Fresh clones failed with ModuleNotFoundError because agent/, db/,
runner/, and services/ subpackages had no __init__.py. Also anchor
/db/ and /logs/ in .gitignore so they don't match src/yc_bench/db/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:15:17 -08:00
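One common way a missing `__init__.py` surfaces as `ModuleNotFoundError` is packaging: setuptools' `find_packages()` only collects directories that are regular packages, so such subpackages can be left out of an installed build. The commit does not say which mechanism applied here. A small check like the hypothetical `missing_init_files` below lists directories that contain `.py` files but no `__init__.py`.

```python
from pathlib import Path


def missing_init_files(package_root: str) -> list[Path]:
    """List directories under package_root that contain .py files but no __init__.py."""
    root = Path(package_root)
    missing = []
    # Every directory that holds at least one .py file should be a regular package.
    for directory in sorted({p.parent for p in root.rglob("*.py")}):
        if not (directory / "__init__.py").exists():
            missing.append(directory)
    return missing
```

Run against `src/yc_bench` before this fix, a check like this would be expected to flag `agent/`, `db/`, `runner/`, and `services/`.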
Adit Jain
5ccd14c02f Merge pull request #2 from collinear-ai/fresh-main
Added a start script and bots!
2026-02-26 21:13:36 -08:00
adit jain
5f31969865 Add Collinear branding, bot runners, and clean up stale plots
- Restyle plot_comparison.py with Collinear brand palette and logo
- Add collinear_logo.svg and collinear_wordmark.svg
- Add bot_runner.py (greedy/random/throughput/prestige strategies)
- Add greedy_bot.py shim
- Remove old unused plots (funds_curves, notepad gifs, sonnet_results)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:12:05 -08:00
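The commit only names the four bot strategies (greedy, random, throughput, prestige). The sketch below shows one plausible way to organize them as a name-to-function registry. The `Offer` fields and the scoring rule for each strategy are assumptions made for illustration, not taken from `bot_runner.py`.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Offer:
    # Hypothetical task-offer fields; the real simulation's schema may differ.
    name: str
    reward: float
    duration_days: float
    prestige_gain: float


Strategy = Callable[[Sequence[Offer], random.Random], Offer]

STRATEGIES: dict[str, Strategy] = {
    # Highest payout, ignoring how long the task takes.
    "greedy": lambda offers, rng: max(offers, key=lambda o: o.reward),
    # Highest payout per day of work.
    "throughput": lambda offers, rng: max(offers, key=lambda o: o.reward / o.duration_days),
    # Largest prestige gain, regardless of payout.
    "prestige": lambda offers, rng: max(offers, key=lambda o: o.prestige_gain),
    # Uniformly random baseline.
    "random": lambda offers, rng: rng.choice(list(offers)),
}


def pick_offer(strategy: str, offers: Sequence[Offer], seed: int = 0) -> Offer:
    """Choose one offer using the named strategy (seeded for reproducible runs)."""
    if not offers:
        raise ValueError("no offers to choose from")
    return STRATEGIES[strategy](offers, random.Random(seed))
```

Keeping strategies in a dict means a new baseline is one entry, and a runner can loop over `STRATEGIES` to benchmark them all with the same seeds.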
adit jain
75a53de25c Add interactive quickstart: yc-bench start and one-line start.sh
3-step interactive flow: pick difficulty (with custom preset builder),
choose model from curated list (Claude, GPT, Gemini, DeepSeek, etc.),
enter API key (auto-detected by prefix). Single curl command to get started:
curl -sSL https://raw.githubusercontent.com/.../start.sh | bash

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 21:10:56 -08:00
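"Auto-detected by prefix" can be as simple as an ordered prefix table. The sketch below is an assumption about how such detection might look, not the quickstart's code: `KEY_PREFIXES`, `env_var_for_key`, and `export_key` are hypothetical. It exports the key under the provider-specific environment variable that LiteLLM reads (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`), which fits the multi-provider fix in 5d2962073d. A bare `sk-` prefix is ambiguous across providers, so a real flow would likely confirm against the chosen model.

```python
import os
from typing import Optional

# Assumed prefix -> env-var table. More specific prefixes come first
# so that "sk-ant-" is not swallowed by the generic "sk-".
KEY_PREFIXES = [
    ("sk-ant-", "ANTHROPIC_API_KEY"),
    ("sk-or-", "OPENROUTER_API_KEY"),
    ("AIza", "GEMINI_API_KEY"),
    ("sk-", "OPENAI_API_KEY"),
]


def env_var_for_key(api_key: str) -> Optional[str]:
    """Guess which provider env var an API key belongs to, or None if unknown."""
    key = api_key.strip()
    for prefix, env_var in KEY_PREFIXES:
        if key.startswith(prefix):
            return env_var
    return None


def export_key(api_key: str) -> Optional[str]:
    """Store the key under its provider-specific env var so LiteLLM can find it."""
    env_var = env_var_for_key(api_key)
    if env_var is not None:
        os.environ[env_var] = api_key.strip()
    return env_var
```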
adit jain
3643806dce Added the configs and updated the results. 2026-02-26 13:37:58 -08:00
AnandK27
5c39e448de readme fixes 2026-02-26 01:02:13 -08:00
Adit Jain
07f159830d Merge pull request #1 from collinear-ai/fresh-main
Fresh main
2026-02-26 00:57:39 -08:00
adit jain
5d2962073d Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
Bug fixes:
- CLI --horizon-years defaulted to 3, silently overriding config presets.
  Now defaults to None so config value (1yr for medium/hard/nightmare) is used.
- Runtime passed a single api_key kwarg regardless of provider, breaking
  Gemini. Now lets LiteLLM resolve keys from provider-specific env vars.
- Removed temperature+top_p from LLM calls (Anthropic rejects both together).
- DB and result filenames now include config name to prevent cross-config collisions.

Benchmark results (1yr horizon, 3 seeds each):
  Sonnet 4.6: medium 2/3, hard 0/3, nightmare 1/3
  Gemini Flash: medium 3/3, hard 1/3, nightmare 1/3
  Gemini has higher win rates (93-98% vs 40-83% on medium).
  Sonnet's ceiling is higher when it survives (nightmare $10.1M vs $478K).

New scripts: plot_comparison.py, plot_sonnet_results.py, notepad_gif.py
Updated README with detailed comparison tables and failure analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 00:31:00 -08:00
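The horizon bug is the classic "CLI default shadows config" problem: with a non-`None` default, the code cannot tell an omitted flag from an explicit one. A minimal sketch of the fix, shown with argparse for illustration (the actual CLI framework is not stated); `PRESETS`, `--config`, `--seed`, `resolve_run`, and the DB filename format are hypothetical. It also shows a config-specific filename, the idea behind the collision fix.

```python
import argparse

# Hypothetical preset table; the commit only states that medium/hard/nightmare use 1 year.
PRESETS = {
    "medium": {"horizon_years": 1},
    "hard": {"horizon_years": 1},
    "nightmare": {"horizon_years": 1},
}


def resolve_run(argv: list[str]) -> tuple[int, str]:
    """Return (horizon_years, db_filename) for a run."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="medium", choices=sorted(PRESETS))
    # default=None (not 3) distinguishes "flag omitted" from "flag given".
    parser.add_argument("--horizon-years", type=int, default=None)
    parser.add_argument("--seed", type=int, default=1)
    args = parser.parse_args(argv)

    horizon = args.horizon_years
    if horizon is None:
        horizon = PRESETS[args.config]["horizon_years"]  # fall back to the preset

    # Including the config name keeps runs of different presets from sharing a file.
    db_name = f"{args.config}_seed{args.seed}.db"
    return horizon, db_name
```

For example, `resolve_run(["--config", "hard"])` uses the preset's 1-year horizon, while adding `--horizon-years 3` overrides it explicitly.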
adit jain
d1d7bc97b5 Add 5-level difficulty gradient: tutorial → easy → medium → hard → nightmare
Each config has a 1-year horizon and no turn limit, and tests progressively deeper
understanding of the simulation dynamics:

- tutorial: basic loop (accept→assign→dispatch→resume)
- easy: throughput awareness (rate/N dilution kills parallelism)
- medium: prestige strategy (must specialise 2-3 domains to unlock market)
- hard: ETA computation (one bad accept degrades in-flight tasks)
- nightmare: sustained perfection (5.4-month runway, must reach prestige 5 or die)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 19:33:55 -08:00
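The "rate/N dilution" and "one bad accept degrades in-flight tasks" points can be made concrete with a simplified model. Assume (our reading, not the simulation's documented formula) that an employee's `rate` is split equally across the N unfinished tasks assigned to them, so each progresses at rate/N and the rest speed up whenever one finishes. The hypothetical `finish_times` below computes each task's finish day under that processor-sharing assumption.

```python
def finish_times(remaining: list[float], rate: float) -> list[float]:
    """Days until each task finishes when `rate` is shared equally among unfinished tasks.

    Simplified processor-sharing model (an assumption, not YC-Bench's actual formula).
    """
    order = sorted(range(len(remaining)), key=lambda i: remaining[i])
    result = [0.0] * len(remaining)
    t = 0.0
    done_work = 0.0  # work each still-active task has received so far
    active = len(remaining)
    for i in order:
        # Each active task advances at rate / active until task i completes.
        t += (remaining[i] - done_work) * active / rate
        done_work = remaining[i]
        result[i] = t
        active -= 1
    return result
```

With `rate=10`, tasks needing 10 and 20 units finish on days 2 and 3. Accepting a third 30-unit task pushes them to days 3 and 5, so a deadline of day 2.5 on the first task would now be missed. This is the kind of ETA reasoning the hard tier is described as testing.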
Adit Jain
08f081d322 Update README with citation for YC-Bench
Added citation information for the YC-Bench project.
2026-02-26 00:26:03 +05:30
Adit Jain
78c86b35e0 Fix formatting in README for discrete-event simulation 2026-02-25 23:39:29 +05:30
Adit Jain
7ad96dee8f Fix formatting issues in README.md 2026-02-25 23:04:38 +05:30
adit jain
3f51641bf5 Add CLAUDE.md to gitignore 2026-02-25 02:28:48 -08:00
adit jain
3a1c562827 Initial commit 2026-02-25 02:16:35 -08:00