93 commits

Author SHA1 Message Date
Vincent Tu
bfb0c88062
Merge pull request #26 from collinear-ai/vincent/gemma4-31b-results
gemma 4 31b results; went bankrupt!
2026-04-04 20:21:23 -07:00
alckasoc
a4a8208022 gemma 4 31b results; went bankrupt! 2026-04-04 20:20:11 -07:00
Vincent Tu
d253c58782
Merge pull request #25 from collinear-ai/vincent/website
minor website update!
2026-04-04 17:59:47 -07:00
alckasoc
bce35279cb minor website update! 2026-04-04 17:58:15 -07:00
Vincent Tu
e1cd26e36e
Merge pull request #24 from collinear-ai/vincent/website
Update Website
2026-04-04 17:50:45 -07:00
alckasoc
f54585df5e update website 2026-04-04 17:33:48 -07:00
Nazneen Rajani
ffd77905ae
Merge pull request #23 from collinear-ai/nazneenrajani-patch-1
Revise citation for YC-Bench in README
2026-04-04 16:55:25 -07:00
Nazneen Rajani
a9e3df8827
Revise citation for YC-Bench in README
Updated citation details in the README file.
2026-04-04 16:55:13 -07:00
Anand Kumar
a5cee60c77
Merge pull request #21 from collinear-ai/vincent/readme
update readme; clean up unused files; black formatting
2026-04-03 20:11:40 -07:00
alckasoc
faacc5886c update webpage arxiv link 2026-04-03 15:39:16 -07:00
alckasoc
38eaea7d0c update readme; clean up unused files; black formatting 2026-04-01 14:44:39 -07:00
Vincent Tu
97b1bdb2e0
Merge pull request #20 from collinear-ai/vincent/webpage
GitHub Webpage
2026-04-01 13:56:52 -07:00
alckasoc
556a35363d update index html 2026-04-01 13:56:42 -07:00
alckasoc
6eba7a9854 Add static docs site for GitHub Pages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 13:32:58 -07:00
RiddleHe
0c53c98f01
Merge pull request #19 from collinear-ai/results/main
Results/main
2026-03-23 21:22:42 -07:00
RiddleHe
5f1a1dd185
Merge branch 'main' into results/main 2026-03-23 19:19:38 -07:00
Muyu He
f1d5f63aaa Implemented safe rerun; fixed skill division bug 2026-03-23 19:14:47 -07:00
RiddleHe
93b4ff92b7
Merge pull request #17 from alckasoc/main
bot runner scope creep
2026-03-20 19:10:38 -07:00
Vincent Tu
97a7fd69e9
Merge branch 'collinear-ai:main' into main 2026-03-20 19:09:17 -07:00
alckasoc
f95861aeb9 scope creep bot runner 2026-03-20 19:08:44 -07:00
Muyu He
2f38babba6 Fixed bot with RAT feature 2026-03-20 19:06:04 -07:00
Muyu He
6a34a1d572 Updated prompt / commands 2026-03-20 18:57:18 -07:00
RiddleHe
35467c050a
Merge pull request #16 from alckasoc/main
fix seeding
2026-03-20 18:53:47 -07:00
alckasoc
b043b690c3 fix seeding 2026-03-20 18:43:19 -07:00
RiddleHe
e011030e57
Merge pull request #15 from alckasoc/main
logging and plotting code + run sh
2026-03-20 17:33:18 -07:00
alckasoc
f76f5be652 calibrating + bug fix tool_choice="auto" for 5.4 mini/nano 2026-03-20 16:27:30 -07:00
alckasoc
d829b07e60 update prompt 2026-03-20 06:01:04 -07:00
alckasoc
3827464380 logging and plotting 2026-03-20 05:19:56 -07:00
Anand Kumar
e71aac14c2
Merge pull request #14 from alckasoc/results/main
remove scratchpad read
2026-03-19 20:28:38 -07:00
Anand Kumar
64941fdc20
Merge pull request #12 from collinear-ai/results/main
Results/main
2026-03-19 20:18:13 -07:00
alckasoc
04d945f5d9 remove scratchpad read 2026-03-19 19:24:09 -07:00
Muyu He
ef7c64b5cb Updated design mds 2026-03-19 18:39:57 -07:00
Muyu He
e049140beb Updated client loyalty feature 2026-03-19 17:52:49 -07:00
Muyu He
b6f664557c Removed browse limit from bot runner 2026-03-19 13:44:31 -07:00
Muyu He
4b8641a4c6 Changed default config for reward 2026-03-16 18:32:59 -07:00
Muyu He
140bb58653 Capped skill rate at 10 + removed reward mult from clients 2026-03-16 16:09:17 -07:00
Adit Jain
d976b9cbb4
Merge pull request #11 from collinear-ai/feat/multi-episode
Add multi-episode setting with scratchpad carryover
2026-03-13 18:21:37 -07:00
Adit Jain
bc633496fa
Merge pull request #10 from alckasoc/vincent/client_trust
Client Trust
2026-03-12 17:07:03 -07:00
alckasoc
ebfce99643 fix sim resume 2026-03-12 12:21:42 -07:00
alckasoc
70ae316f27 improved system design, more intuitive hparams, updated configs, greedy bot updates 2026-03-12 12:12:47 -07:00
adit jain
01535c2042 Add multi-episode setting with scratchpad carryover between bankruptcies
When an agent goes bankrupt, the simulation can now restart for another
episode while preserving the scratchpad from the previous attempt. This
lets us measure whether LLMs can learn from failure via persistent notes.

Each episode gets its own SQLite DB (*.ep1.db, *.ep2.db, ...) so plotting
scripts and post-hoc analysis work unchanged. The rollout JSON aggregates
per-episode transcripts, turns, and costs.

Key changes:
- --max-episodes CLI flag (default 1, fully backward compatible)
- Per-episode DB files when max_episodes > 1
- Scratchpad read from old DB, written into fresh DB between episodes
- RunState tracks episode results with finish_episode/reset_for_new_episode
- Agent prompt tells it about the episode number and to read its scratchpad
- Plotting script for multi-episode fund curves + scratchpad evolution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 19:22:32 -07:00
alckasoc
3d20bee609 client trust and system design docs 2026-03-10 14:24:13 -07:00
alckasoc
d28ccb1bb2 Merge upstream/main: greedy baseline fix + additive skill boost
Resolved conflicts — combined best of both:
- bot_runner.py: kept our trust-aware candidate building + upstream's tier-avg rates + no task cap
- task_complete.py: upstream's additive skill boost (nerfs greedy snowball) + our configurable cap (wc.skill_rate_max instead of hardcoded 10)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 17:39:58 -07:00
alckasoc
11f4b89144 Add multi-strategy client trust system with tiers, specialties, and idle-turn fix
- Hide exact reward_multiplier from agent; show tier (Standard/Premium/Enterprise) and specialty domains instead
- Add client domain specialization with 70% bias on task generation toward client specialties
- Remove qty_scale by multiplier (leaked info and doubly punished high-mult clients)
- Rewrite agent prompt to describe tiers/specialties without exact formulas
- Fix critical loop.py bug: provide full state context after sim resume (prevents idle multi-month skips)
- Add Streamlit dashboard, watch scripts, and updated plotting/extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 17:37:49 -07:00
RiddleHe
a38b9f4135
Merge pull request #9 from collinear-ai/feat/fixed_greedy
Fixed greedy baseline and lowered min val of employee skills
2026-03-09 17:27:52 -07:00
alckasoc
7daccf003a update toml and uv lock 2026-03-09 16:40:51 -07:00
Muyu He
ec104d57aa Fixed greedy baseline and lowered min val of employee skills 2026-03-09 15:18:49 -07:00
alckasoc
27ca13afbc Merge remote-tracking branch 'upstream/main' into vincent/client_trust 2026-03-09 14:54:38 -07:00
RiddleHe
98aab68b57
Merge pull request #8 from collinear-ai/system-design-docs
Add system design documentation for yc-bench
2026-03-09 13:02:25 -07:00
alckasoc
86eabf6697 init 2026-03-08 17:40:10 -07:00