mirror of https://github.com/collinear-ai/yc-bench.git synced 2026-04-19 12:58:03 +00:00

AnandK27 ecd3d9e415 Add system design documentation for yc-bench

Comprehensive documentation covering all major subsystems:
simulation engine, data models, task system, prestige, finances,
employees, agent layer, CLI interface, configuration, and runner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-08 13:42:41 -07:00

Data Models & Database Design

Location: src/yc_bench/db/

Design Choice: SQLAlchemy ORM with SQLite

The benchmark uses SQLAlchemy's declarative ORM over SQLite for several reasons:

Single-file persistence: SQLite stores the entire game state in one file, making runs portable and inspectable
Transactional safety: ACID guarantees prevent partial state updates
Query flexibility: SQL allows complex queries for financial reports, task filtering, etc.
Dual-backend support: The same ORM models work with PostgreSQL when the `DATABASE_URL` environment variable points at a PostgreSQL server, for production or scaling scenarios
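A minimal sketch of how backend selection through `DATABASE_URL` might look. The default file name and both helper functions are hypothetical and are not taken from the yc-bench source; only the environment variable name comes from the text above.

```python
import os

# Hypothetical default; the real project may use a different file name.
DEFAULT_SQLITE_URL = "sqlite:///yc_bench.db"


def resolve_database_url(env=None):
    """Return DATABASE_URL if it is set and non-empty, else the SQLite default."""
    env = os.environ if env is None else env
    return env.get("DATABASE_URL") or DEFAULT_SQLITE_URL


def backend_name(url):
    """Extract the dialect from an SQLAlchemy-style URL such as 'postgresql+psycopg2://...'."""
    scheme = url.split("://", 1)[0]
    return scheme.split("+", 1)[0]
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to test without touching the process environment.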

Schema Overview

┌──────────────┐     ┌───────────────────┐
│   Company    │────<│  CompanyPrestige   │  (1 per domain × company)
└──────┬───────┘     └───────────────────┘
       │
       ├────<┌──────────────┐     ┌──────────────────┐
       │     │   Employee   │────<│ EmployeeSkillRate │  (1 per domain × employee)
       │     └──────┬───────┘     └──────────────────┘
       │            │
       │            │    ┌────────────────┐
       │            └───<│ TaskAssignment  │  (employee ↔ task junction)
       │                 └────────┬───────┘
       │                         │
       ├────<┌──────────┐────────┘
       │     │   Task   │────<┌─────────────────┐
       │     └──────────┘     │ TaskRequirement  │  (1 per domain × task)
       │                      └─────────────────┘
       │
       ├────<┌──────────────┐
       │     │  SimEvent    │  (discrete events queue)
       │     └──────────────┘
       │
       ├────<┌──────────────┐
       │     │ LedgerEntry  │  (financial transactions)
       │     └──────────────┘
       │
       ├────<┌──────────────┐
       │     │  SimState    │  (simulation clock & counters)
       │     └──────────────┘
       │
       └────<┌──────────────┐
             │  Scratchpad  │  (agent persistent memory)
             └──────────────┘

Model Details

Company (`models/company.py`)

| Column | Type | Notes |
|---|---|---|
| `id` | UUID (PK) | Auto-generated |
| `name` | String | Company name |
| `funds_cents` | BigInteger | Financial balance in cents |

Design choice: Funds are stored as integer cents to avoid floating-point rounding errors in financial calculations. BigInteger accommodates very large balances, and because the column is signed, negative ones as well.
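The rounding problem that integer cents avoid can be shown in a few lines. This is an illustrative sketch only: `format_cents` and the sample amounts are hypothetical and do not come from the yc-bench code.

```python
def format_cents(amount_cents: int) -> str:
    """Render signed integer cents as a decimal string without using floats."""
    sign = "-" if amount_cents < 0 else ""
    units, cents = divmod(abs(amount_cents), 100)
    return f"{sign}{units:,}.{cents:02d}"


# Binary floats cannot represent most decimal fractions exactly,
# so sums of fractional amounts drift.
float_sum_exact = (0.1 + 0.2 == 0.3)

# The same amounts expressed in integer cents add exactly.
cents_sum_exact = (10 + 20 == 30)

funds_cents = 25_000_000        # hypothetical starting balance
funds_cents -= 3 * 1_234_567    # three hypothetical salary payments
```

Conversion to a decimal string happens only at display time, so no intermediate value ever passes through a float.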

CompanyPrestige (`models/company.py`)

| Column | Type | Notes |
|---|---|---|
| `company_id` | UUID (FK) | References Company |
| `domain` | String | research / inference / data_environment / training |
| `prestige_level` | Float | Range [1.0, 10.0] |

Design choice: Prestige is tracked per-domain rather than as a single score. This forces specialization trade-offs and creates a 4-dimensional progression space.
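Per-domain prestige interacts with the `required_prestige` column of Task, which is a minimum that must be met in all of a task's domains. A hedged sketch of that gate follows; the function name, the fallback to 1.0 for a missing domain, and the sample values are assumptions.

```python
def meets_prestige(prestige_by_domain, task_domains, required_prestige):
    """True only if the company reaches the threshold in every domain the task touches.

    A missing domain falls back to 1.0, the lower bound of the documented range
    (this fallback is an assumption, not confirmed by the source).
    """
    return all(
        prestige_by_domain.get(domain, 1.0) >= required_prestige
        for domain in task_domains
    )


# Hypothetical company that has specialized in research and training.
company = {
    "research": 4.2,
    "inference": 2.0,
    "data_environment": 1.0,
    "training": 3.1,
}
```

With these values, a task spanning research and training at threshold 3.0 is eligible, while one spanning research and inference is blocked by the weaker domain.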

Employee (`models/employee.py`)

| Column | Type | Notes |
|---|---|---|
| `id` | UUID (PK) | Auto-generated |
| `company_id` | UUID (FK) | References Company |
| `name` | String | Employee name |
| `tier` | String | junior / mid / senior |
| `work_hours_per_day` | Float | Hours available per business day |
| `salary_cents` | BigInteger | Monthly salary in cents |

EmployeeSkillRate (`models/employee.py`)

| Column | Type | Notes |
|---|---|---|
| `employee_id` | UUID (FK) | References Employee |
| `domain` | String | One of 4 domains |
| `rate_domain_per_hour` | Float | Work units produced per hour |

Design choice: Skill rates are hidden from the agent. The agent sees tier and salary but not per-domain effectiveness. This creates an information-asymmetry puzzle: the agent must infer employee strengths from task outcomes.
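One way to picture the hidden-rate design is a projection that drops skill rates before anything reaches the agent. The sketch below is hypothetical: `EmployeeRecord`, `agent_view`, and the choice to expose the name alongside tier and salary are assumptions, not the project's actual serialization.

```python
from dataclasses import dataclass, field


@dataclass
class EmployeeRecord:
    name: str
    tier: str
    salary_cents: int
    # domain -> work units per hour; never shown to the agent
    skill_rates: dict = field(default_factory=dict)


def agent_view(emp: EmployeeRecord) -> dict:
    """Return only the fields the agent is allowed to see."""
    return {"name": emp.name, "tier": emp.tier, "salary_cents": emp.salary_cents}


# Hypothetical employee with uneven per-domain strengths.
alice = EmployeeRecord(
    name="Alice",
    tier="senior",
    salary_cents=1_500_000,
    skill_rates={"research": 6.5, "training": 2.0},
)
```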

Task (`models/task.py`)

| Column | Type | Notes |
|---|---|---|
| `id` | UUID (PK) | Auto-generated |
| `company_id` | UUID (FK, nullable) | NULL = market task, set on acceptance |
| `status` | Enum | market → planned → active → completed_success / completed_fail / cancelled |
| `title` | String | Task description |
| `required_prestige` | Float | Minimum prestige needed in ALL task domains |
| `reward_funds_cents` | BigInteger | Payment on successful completion |
| `reward_prestige_delta` | Float | Prestige gained per domain on success |
| `skill_boost_pct` | Float | Employee skill rate increase on success |
| `accepted_at` | DateTime (nullable) | When task was accepted from market |
| `deadline` | DateTime (nullable) | Calculated at acceptance |
| `completed_at` | DateTime (nullable) | When task finished |
| `success` | Boolean (nullable) | True = on-time, False = late |
| `progress_milestone_pct` | Float | Tracks progress milestones (e.g., 50%) |

Design choice: A nullable company_id distinguishes market tasks (NULL, available for browsing) from accepted tasks (set to the owning company) without a separate table or flag.
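The nullable-owner pattern can be sketched with the standard-library `sqlite3` module standing in for the ORM. The reduced table layout, the sample rows, and the `accept_task` helper are hypothetical; moving the status to `planned` on acceptance follows the status flow listed in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task ("
    "id INTEGER PRIMARY KEY, company_id TEXT, status TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO task (company_id, status, title) VALUES (?, ?, ?)",
    [
        (None, "market", "Tune inference server"),
        (None, "market", "Build eval dataset"),
        ("company-1", "active", "Train reward model"),
    ],
)


def market_titles(conn):
    """Market tasks are exactly the rows whose company_id is NULL."""
    rows = conn.execute(
        "SELECT title FROM task WHERE company_id IS NULL ORDER BY title"
    )
    return [row[0] for row in rows]


def accept_task(conn, title, company_id):
    """Claim a market task: set the owner and move it to 'planned'."""
    conn.execute(
        "UPDATE task SET company_id = ?, status = 'planned' "
        "WHERE title = ? AND company_id IS NULL",
        (company_id, title),
    )
```

The `company_id IS NULL` guard in the update makes acceptance a no-op for a task that already has an owner.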

TaskRequirement (`models/task.py`)

| Column | Type | Notes |
|---|---|---|
| `task_id` | UUID (FK) | References Task |
| `domain` | String | Which domain this requirement covers |
| `required_qty` | Float | Total work units needed |
| `completed_qty` | Float | Work units completed so far |

Design choice: Multi-domain requirements make tasks a multi-dimensional optimization problem. A task might need work in 2-4 domains simultaneously.
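A small sketch of how per-domain requirements could be evaluated. It assumes a task is finished only when every domain's `completed_qty` has reached its `required_qty`; that rule, the helper names, and the sample quantities are assumptions for illustration.

```python
def remaining_work(requirements):
    """requirements maps domain -> (required_qty, completed_qty).

    Returns the work units still outstanding in each domain.
    """
    return {
        domain: max(required - completed, 0.0)
        for domain, (required, completed) in requirements.items()
    }


def is_finished(requirements):
    """Assumed rule: every domain must reach its required quantity."""
    return all(completed >= required for required, completed in requirements.values())


# Hypothetical two-domain task: research is done, training is not.
task = {"research": (120.0, 120.0), "training": (80.0, 35.5)}
```

Here the task is not finished even though research is complete, because training still has 44.5 units outstanding.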

TaskAssignment (`models/task.py`)

| Column | Type | Notes |
|---|---|---|
| `task_id` | UUID (FK) | References Task |
| `employee_id` | UUID (FK) | References Employee |
| `assigned_at` | DateTime | When assigned |

Design choice: Many-to-many junction table. An employee can work on multiple tasks (throughput splits), and a task can have multiple employees (parallel progress).
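To illustrate the throughput split, the sketch below divides each employee's hourly rate evenly across their assigned tasks and sums the contributions per task. The even split and the single-domain simplification are assumptions; the source does not state the exact formula.

```python
from collections import Counter


def task_throughput(assignments, base_rate):
    """Compute work units per hour flowing into each task.

    assignments: list of (task_id, employee_id) pairs from the junction table.
    base_rate:   employee_id -> units per hour in one domain (simplified).
    Assumes an employee's rate is split evenly across their tasks.
    """
    load = Counter(employee for _, employee in assignments)
    per_task = {}
    for task_id, employee in assignments:
        share = base_rate[employee] / load[employee]
        per_task[task_id] = per_task.get(task_id, 0.0) + share
    return per_task


# Hypothetical data: employee "a" is on two tasks, "b" on one.
assignments = [("t1", "a"), ("t2", "a"), ("t1", "b")]
rates = {"a": 4.0, "b": 3.0}
```

Under these assumptions, task `t1` receives half of `a`'s rate plus all of `b`'s, while `t2` receives only the other half of `a`'s rate.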

SimEvent (`models/event.py`)

| Column | Type | Notes |
|---|---|---|
| `id` | UUID (PK) | Deterministic (uuid5) |
| `company_id` | UUID (FK) | References Company |
| `event_type` | String | task_completed / bankruptcy / task_half / horizon_end |
| `scheduled_at` | DateTime | When event triggers |
| `payload` | JSON | Event-specific data |
| `dedupe_key` | String | Prevents duplicate events |
| `consumed` | Boolean | True after processing |
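Deterministic event IDs can be derived with `uuid.uuid5`, which hashes a namespace and a name so that the same inputs always yield the same UUID. The namespace string and the way the name is composed below are hypothetical; the source states only that IDs are uuid5-based and that a `dedupe_key` exists.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "yc-bench/sim-event")


def event_id(company_id, event_type, dedupe_key):
    """Derive a stable event ID from its identifying fields.

    Re-creating the same logical event yields the same ID, so an insert
    with a primary-key constraint cannot produce a duplicate row.
    """
    name = f"{company_id}:{event_type}:{dedupe_key}"
    return uuid.uuid5(EVENT_NAMESPACE, name)


first = event_id("company-1", "task_completed", "task-42")
again = event_id("company-1", "task_completed", "task-42")
other = event_id("company-1", "task_half", "task-42")
```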

LedgerEntry (`models/ledger.py`)

| Column | Type | Notes |
|---|---|---|
| `id` | UUID (PK) | Auto-generated |
| `company_id` | UUID (FK) | References Company |
| `occurred_at` | DateTime | Transaction timestamp |
| `category` | Enum | MONTHLY_PAYROLL / TASK_REWARD / TASK_FAIL_PENALTY / TASK_CANCEL_PENALTY |
| `amount_cents` | BigInteger | Signed amount (negative = cost) |
| `ref_type` | String (nullable) | Reference entity type |
| `ref_id` | UUID (nullable) | Reference entity ID |

Design choice: Immutable append-only ledger provides a complete financial audit trail. No entries are ever deleted or modified.
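Because entries are signed and never modified, financial summaries reduce to sums over the ledger. A minimal in-memory sketch follows; the helper names and amounts are hypothetical, and the category names are those listed in the table above.

```python
from collections import defaultdict

ledger = []  # append-only: entries are added, never edited or removed


def append_entry(category, amount_cents):
    """Record a signed transaction (negative = cost)."""
    ledger.append((category, amount_cents))


def net_by_category(entries):
    """Sum signed amounts per category."""
    totals = defaultdict(int)
    for category, amount_cents in entries:
        totals[category] += amount_cents
    return dict(totals)


append_entry("TASK_REWARD", 500_000)
append_entry("MONTHLY_PAYROLL", -320_000)
append_entry("TASK_FAIL_PENALTY", -50_000)
append_entry("MONTHLY_PAYROLL", -320_000)

net_change_cents = sum(amount for _, amount in ledger)
```

A correction would be expressed as a new offsetting entry rather than an edit, which keeps the audit trail complete.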

SimState (`models/sim_state.py`)

| Column | Type | Notes |
|---|---|---|
| `company_id` | UUID (FK, PK) | References Company |
| `sim_time` | DateTime | Current simulation clock |
| `run_seed` | Integer | RNG seed for reproducibility |
| `horizon_end` | DateTime | When simulation ends |
| `replenish_counter` | Integer | Tracks market task replenishment |

Scratchpad (`models/scratchpad.py`)

| Column | Type | Notes |
|---|---|---|
| `company_id` | UUID (FK) | References Company |
| `content` | Text | Free-form agent notes |

Design choice: Scratchpad survives LLM context truncation, giving the agent persistent memory across the full simulation.
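Since the notes live in the database rather than in the prompt, they can be re-read after earlier turns have been dropped from context. The sketch below uses `sqlite3` as a stand-in and assumes one scratchpad row per company, which the source does not confirm; the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scratchpad (company_id TEXT PRIMARY KEY, content TEXT NOT NULL)"
)


def write_scratchpad(company_id, content):
    """Overwrite the company's notes (assumes a single row per company)."""
    conn.execute(
        "INSERT OR REPLACE INTO scratchpad (company_id, content) VALUES (?, ?)",
        (company_id, content),
    )
    conn.commit()


def read_scratchpad(company_id):
    """Return the stored notes, or an empty string if none exist yet."""
    row = conn.execute(
        "SELECT content FROM scratchpad WHERE company_id = ?", (company_id,)
    ).fetchone()
    return row[0] if row else ""
```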

Session Management (`session.py`)

`session_scope(factory)` → context manager

Creates a scoped session that commits automatically on success and rolls back on error
Supports both SQLite (default) and PostgreSQL (via `DATABASE_URL`)
`init_db()` creates all tables from ORM metadata

Design choice: The context manager pattern ensures every database operation runs inside a transaction that is either committed in full or rolled back, preventing partial state updates that would corrupt the simulation.
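The commit-or-rollback pattern can be sketched as follows. This is not the project's `session.py`: it uses standard-library `sqlite3` connections in place of SQLAlchemy sessions so that it runs without dependencies, and the table, company name, and file path are hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def session_scope(factory):
    """Open a session, commit if the block succeeds, roll back if it raises."""
    session = factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
factory = lambda: sqlite3.connect(db_path)

with session_scope(factory) as s:
    s.execute("CREATE TABLE company (name TEXT, funds_cents INTEGER)")
    s.execute("INSERT INTO company VALUES ('Acme', 100)")

# A failure inside the block discards the pending update.
try:
    with session_scope(factory) as s:
        s.execute("UPDATE company SET funds_cents = 0")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

with session_scope(factory) as s:
    funds = s.execute("SELECT funds_cents FROM company").fetchone()[0]
```

After the simulated failure the stored balance is unchanged, which is the partial-update protection the design choice describes.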
