Mirror of https://github.com/collinear-ai/yc-bench.git, synced 2026-04-19 12:58:03 +00:00
readme fixes
parent 3281eff755
commit a406d2d9f9
3 changed files with 6 additions and 4 deletions
10
README.md
@@ -1,14 +1,16 @@
# YC-Bench
# <img src="imgs/yc_bench.png" alt="YC-Bench logo" width="40" /> YC-Bench

A long-horizon deterministic benchmark for LLM agents. The agent plays CEO of an AI startup over a simulated 1–3 year run, operating exclusively through a CLI tool against a SQLite-backed discrete-event simulation.
The benchmark tests whether agents can manage compounding decisions: prestige specialisation, employee allocation, cash flow, and deadline risk — sustained over hundreds of turns.
The benchmark tests whether agents can manage compounding decisions: prestige specialisation, employee allocation, cash flow, and deadline risk - sustained over hundreds of turns.

---

## Simulation Dynamics
```

<!-- ```
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT (LLM) │
│ │
@@ -44,7 +46,7 @@ The benchmark tests whether agents can manage compounding decisions: prestige sp
│ │ Monthly payroll (1st biz day) Bankruptcy check (funds < 0) │
│ │ Horizon end (1–3 years) Context truncation (last 20 rounds)│
└──┴──────────────────────────────────────────────────────────────────────┘
```
``` -->

### Core loop
BIN
imgs/arch.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 135 KiB |
BIN
imgs/yc_bench.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 589 KiB |