Commit graph

  • 018ae57749 Merge 3379ca0b7f into bfb0c88062 Dan Austin 2026-04-13 12:47:46 +01:00
  • 3379ca0b7f Fix task cancel: filter pending events by dedupe_key Dan Austin 2026-04-13 12:47:30 +01:00
  • bfb0c88062 Merge pull request #26 from collinear-ai/vincent/gemma4-31b-results main Vincent Tu 2026-04-04 20:21:23 -07:00
  • a4a8208022 gemma 4 31b results; went bankrupt! alckasoc 2026-04-04 20:20:11 -07:00
  • d253c58782 Merge pull request #25 from collinear-ai/vincent/website Vincent Tu 2026-04-04 17:59:47 -07:00
  • bce35279cb minor website update! alckasoc 2026-04-04 17:57:46 -07:00
  • e1cd26e36e Merge pull request #24 from collinear-ai/vincent/website Vincent Tu 2026-04-04 17:50:45 -07:00
  • f54585df5e update website alckasoc 2026-04-04 17:33:48 -07:00
  • ffd77905ae Merge pull request #23 from collinear-ai/nazneenrajani-patch-1 Nazneen Rajani 2026-04-04 16:55:25 -07:00
  • a9e3df8827 Revise citation for YC-Bench in README nazneenrajani-patch-1 Nazneen Rajani 2026-04-04 16:55:13 -07:00
  • a5cee60c77 Merge pull request #21 from collinear-ai/vincent/readme Anand Kumar 2026-04-03 20:11:40 -07:00
  • faacc5886c update webpage arxiv link alckasoc 2026-04-03 15:39:16 -07:00
  • 38eaea7d0c update readme; clean up unused files; black formatting alckasoc 2026-04-01 14:44:39 -07:00
  • 97b1bdb2e0 Merge pull request #20 from collinear-ai/vincent/webpage Vincent Tu 2026-04-01 13:56:52 -07:00
  • 556a35363d update index html alckasoc 2026-04-01 13:56:42 -07:00
  • 6eba7a9854 Add static docs site for GitHub Pages alckasoc 2026-04-01 13:32:58 -07:00
  • 0c53c98f01 Merge pull request #19 from collinear-ai/results/main RiddleHe 2026-03-23 21:22:42 -07:00
  • 5f1a1dd185 Merge branch 'main' into results/main results/main RiddleHe 2026-03-23 19:19:38 -07:00
  • f1d5f63aaa Implemented safe rerun; fixed skill division bug Muyu He 2026-03-23 19:14:47 -07:00
  • 93b4ff92b7 Merge pull request #17 from alckasoc/main RiddleHe 2026-03-20 19:10:38 -07:00
  • 97a7fd69e9 Merge branch 'collinear-ai:main' into main Vincent Tu 2026-03-20 19:09:17 -07:00
  • f95861aeb9 scope creep bot runner alckasoc 2026-03-20 19:08:44 -07:00
  • 2f38babba6 Fixed bot with RAT feature Muyu He 2026-03-20 19:06:04 -07:00
  • 6a34a1d572 Updated prompt / commands Muyu He 2026-03-20 17:34:32 -07:00
  • 35467c050a Merge pull request #16 from alckasoc/main RiddleHe 2026-03-20 18:53:47 -07:00
  • b043b690c3 fix seeding alckasoc 2026-03-20 18:43:19 -07:00
  • e011030e57 Merge pull request #15 from alckasoc/main RiddleHe 2026-03-20 17:33:18 -07:00
  • f76f5be652 calibrating + bug fix tool_choice="auto" for 5.4 mini/nano alckasoc 2026-03-20 16:27:30 -07:00
  • d829b07e60 update prompt alckasoc 2026-03-20 06:01:04 -07:00
  • 3827464380 logging and plotting alckasoc 2026-03-20 05:19:56 -07:00
  • e71aac14c2 Merge pull request #14 from alckasoc/results/main Anand Kumar 2026-03-19 20:28:38 -07:00
  • 64941fdc20 Merge pull request #12 from collinear-ai/results/main Anand Kumar 2026-03-19 20:18:13 -07:00
  • 04d945f5d9 remove scratchpad read alckasoc 2026-03-19 19:24:09 -07:00
  • ef7c64b5cb Updated design mds Muyu He 2026-03-19 18:39:57 -07:00
  • e049140beb Updated client loyalty feature Muyu He 2026-03-19 17:52:49 -07:00
  • aa2544a53e fixed the world seed; remove payment dispute alckasoc 2026-03-19 17:09:31 -07:00
  • b6f664557c Removed browse limit from bot runner Muyu He 2026-03-19 13:44:31 -07:00
  • 250be1406e push alckasoc 2026-03-19 13:23:36 -07:00
  • df82858c88 init alckasoc 2026-03-18 12:17:35 -07:00
  • 4b8641a4c6 Changed default config for reward Muyu He 2026-03-16 18:32:59 -07:00
  • 140bb58653 Capped skill rate at 10 + removed reward mult from clients Muyu He 2026-03-16 16:09:17 -07:00
  • d976b9cbb4 Merge pull request #11 from collinear-ai/feat/multi-episode Adit Jain 2026-03-13 18:21:37 -07:00
  • bc633496fa Merge pull request #10 from alckasoc/vincent/client_trust Adit Jain 2026-03-12 17:07:03 -07:00
  • ebfce99643 fix sim resume alckasoc 2026-03-12 12:21:42 -07:00
  • 70ae316f27 improved system design, more intuitive hparams, updated configs, greedy bot updates alckasoc 2026-03-12 12:12:47 -07:00
  • 01535c2042 Add multi-episode setting with scratchpad carryover between bankruptcies adit jain 2026-03-11 19:22:32 -07:00
  • 3d20bee609 client trust and system design docs alckasoc 2026-03-10 14:24:13 -07:00
  • d28ccb1bb2 Merge upstream/main: greedy baseline fix + additive skill boost alckasoc 2026-03-09 17:38:53 -07:00
  • 11f4b89144 Add multi-strategy client trust system with tiers, specialties, and idle-turn fix alckasoc 2026-03-09 17:37:49 -07:00
  • a38b9f4135 Merge pull request #9 from collinear-ai/feat/fixed_greedy RiddleHe 2026-03-09 17:27:52 -07:00
  • 7daccf003a update toml and uv lock alckasoc 2026-03-09 16:40:51 -07:00
  • ec104d57aa Fixed greedy baseline and lowered min val of employee skills Muyu He 2026-03-09 15:18:49 -07:00
  • 27ca13afbc Merge remote-tracking branch 'upstream/main' into vincent/client_trust alckasoc 2026-03-09 14:54:38 -07:00
  • 98aab68b57 Merge pull request #8 from collinear-ai/system-design-docs RiddleHe 2026-03-09 13:02:25 -07:00
  • 86eabf6697 init alckasoc 2026-03-08 17:40:10 -07:00
  • ecd3d9e415 Add system design documentation for yc-bench AnandK27 2026-03-08 13:42:41 -07:00
  • b1cd7ebfb2 Merge pull request #7 from collinear-ai/feat/employee_tiers Adit Jain 2026-03-07 22:04:45 -08:00
  • 7f24589793 Light update of readme Muyu He 2026-03-06 18:56:46 -08:00
  • a456d9c6ae Updated initial eval on new backend Muyu He 2026-03-06 18:49:32 -08:00
  • 8c949db160 Fixed task difficulty with base reward & deadline change Muyu He 2026-03-06 18:08:11 -08:00
  • 542d3b9836 Merge pull request #6 from collinear-ai/feat/employee_tiers Adit Jain 2026-03-06 14:45:33 -08:00
  • 99e69190ec Calibrated domain prestge bump Muyu He 2026-03-06 14:40:45 -08:00
  • 5671e0102f Calibrated task difficulty based on deadlines Muyu He 2026-03-06 11:18:22 -08:00
  • eb18c5a90c Updated backend to calculate employee tier with spiky skill distribution; simplified domain count to 4 Muyu He 2026-03-05 18:12:48 -08:00
  • 6d6f0a855d Rename Greedy Bot to Human Devised Rule in README adit jain 2026-02-27 16:21:32 -08:00
  • 763ed3d750 Rename Greedy Bot to Human Devised Rule, remove other bot baselines from plots adit jain 2026-02-27 14:02:44 -08:00
  • 89065f3487 Delete Sonnet results section from README Adit Jain 2026-02-28 02:39:53 +05:30
  • 91455bbca2 Merge pull request #5 from collinear-ai/fresh-main Adit Jain 2026-02-27 12:45:46 -08:00
  • e9aa362772 Add prestige radar chart comparing domain specialization across models adit jain 2026-02-27 12:43:28 -08:00
  • 81664f69bb Merge pull request #4 from collinear-ai/fresh-main Adit Jain 2026-02-26 22:20:38 -08:00
  • 5eebd80b2f Merge branch 'main' into fresh-main Adit Jain 2026-02-26 22:20:24 -08:00
  • 95c6583053 Add live dashboard section to README adit jain 2026-02-26 22:16:20 -08:00
  • f25a2be1e4 Add live terminal dashboard with Rich adit jain 2026-02-26 22:11:55 -08:00
  • d4ce0a1e5a Fix start.sh: re-download and re-exec when piped via curl adit jain 2026-02-26 21:24:56 -08:00
  • 040e678a76 Fix start.sh: reattach stdin to /dev/tty for curl pipe usage adit jain 2026-02-26 21:22:48 -08:00
  • a406d2d9f9 readme fixes AnandK27 2026-02-26 00:39:39 -08:00
  • 3281eff755 Update README with citation for YC-Bench Adit Jain 2026-02-26 00:26:03 +05:30
  • 2b528358a7 Fix formatting in README for discrete-event simulation Adit Jain 2026-02-25 23:39:29 +05:30
  • 14b0179f41 Fix formatting issues in README.md Adit Jain 2026-02-25 23:04:38 +05:30
  • 86b0741c41 Fix start.sh: re-download and re-exec when piped via curl adit jain 2026-02-26 21:24:56 -08:00
  • 56ad582226 Fix start.sh: reattach stdin to /dev/tty for curl pipe usage adit jain 2026-02-26 21:22:48 -08:00
  • 649c42207a Add db/ source files that were blocked by overly broad gitignore adit jain 2026-02-26 21:19:45 -08:00
  • db7d9f218a Add db/ source files that were blocked by overly broad gitignore adit jain 2026-02-26 21:19:45 -08:00
  • 6df0f79055 Merge pull request #3 from collinear-ai/fresh-main Adit Jain 2026-02-26 21:17:54 -08:00
  • a11b2828a9 Fix fresh install: add missing __init__.py and fix .gitignore adit jain 2026-02-26 21:15:17 -08:00
  • 5ccd14c02f Merge pull request #2 from collinear-ai/fresh-main Adit Jain 2026-02-26 21:13:36 -08:00
  • 5f31969865 Add Collinear branding, bot runners, and clean up stale plots adit jain 2026-02-26 21:12:05 -08:00
  • 75a53de25c Add interactive quickstart: yc-bench start and one-line start.sh adit jain 2026-02-26 21:10:56 -08:00
  • 3643806dce Added the configs and updated the results. adit jain 2026-02-26 13:37:58 -08:00
  • 5c39e448de readme fixes AnandK27 2026-02-26 00:39:39 -08:00
  • 07f159830d Merge pull request #1 from collinear-ai/fresh-main Adit Jain 2026-02-26 00:57:39 -08:00
  • 5d2962073d Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results adit jain 2026-02-26 00:31:00 -08:00
  • d1d7bc97b5 Add 5-level difficulty gradient: tutorial → easy → medium → hard → nightmare adit jain 2026-02-25 19:33:55 -08:00
  • 08f081d322 Update README with citation for YC-Bench Adit Jain 2026-02-26 00:26:03 +05:30
  • 78c86b35e0 Fix formatting in README for discrete-event simulation Adit Jain 2026-02-25 23:39:29 +05:30
  • 7ad96dee8f Fix formatting issues in README.md Adit Jain 2026-02-25 23:04:38 +05:30
  • 3f51641bf5 Add CLAUDE.md to gitignore adit jain 2026-02-25 02:28:48 -08:00
  • 3a1c562827 Initial commit adit jain 2026-02-25 02:16:35 -08:00