collinear-ai/yc-bench

mirror of https://github.com/collinear-ai/yc-bench.git synced 2026-04-19 12:58:03 +00:00

Muyu He ef7c64b5cb Updated design mds

2026-03-19 18:39:57 -07:00

Data Models & Database Design

Location: src/yc_bench/db/

Design Choice: SQLAlchemy ORM with SQLite

The benchmark uses SQLAlchemy's declarative ORM over SQLite for several reasons:

Single-file persistence: SQLite stores the entire game state in one file, making runs portable and inspectable
Transactional safety: ACID guarantees prevent partial state updates
Query flexibility: SQL allows complex queries for financial reports, task filtering, etc.
Dual-backend support: The same ORM models work with PostgreSQL when the DATABASE_URL environment variable is set, for production or scaling scenarios
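
A minimal sketch of how the backend choice described above could be resolved. The default SQLite URL and both helper names are assumptions made for illustration; the actual lookup in `src/yc_bench/db/` may differ.

```python
import os

# Hypothetical default; the real database file name is not stated in this document.
DEFAULT_SQLITE_URL = "sqlite:///yc_bench.db"


def resolve_database_url(env=None):
    """Return DATABASE_URL if it is set and non-empty, else the SQLite default."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL", "").strip()
    return url or DEFAULT_SQLITE_URL


def backend_name(url):
    """Extract the dialect from a SQLAlchemy-style URL, e.g. 'sqlite' or 'postgresql'."""
    scheme = url.split("://", 1)[0]
    return scheme.split("+", 1)[0]  # drop an optional driver suffix such as '+psycopg2'
```

The resolved URL is the kind of string that SQLAlchemy's `create_engine()` accepts, so the ORM models themselves stay unchanged across backends.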

Schema Overview

┌──────────────┐     ┌───────────────────┐
│   Company    │────<│  CompanyPrestige   │  (1 per domain × company)
└──────┬───────┘     └───────────────────┘
       │
       ├────<┌──────────────┐     ┌──────────────────┐
       │     │   Employee   │────<│ EmployeeSkillRate │  (1 per domain × employee)
       │     └──────┬───────┘     └──────────────────┘
       │            │
       │            │    ┌────────────────┐
       │            └───<│ TaskAssignment  │  (employee ↔ task junction)
       │                 └────────┬───────┘
       │                         │
       ├────<┌──────────┐────────┘
       │     │   Task   │────<┌─────────────────┐
       │     └────┬─────┘     │ TaskRequirement  │  (1 per domain × task)
       │          │           └─────────────────┘
       │          │
       │          └────>┌──────────┐
       │                │  Client  │  (task issuer with hidden multiplier)
       │                └────┬─────┘
       │                     │
       ├────<┌───────────────┘
       │     │  ClientTrust  │  (company ↔ client trust level)
       │     └──────────────┘
       │
       ├────<┌──────────────┐
       │     │  SimEvent    │  (discrete events queue)
       │     └──────────────┘
       │
       ├────<┌──────────────┐
       │     │ LedgerEntry  │  (financial transactions)
       │     └──────────────┘
       │
       ├────<┌──────────────┐
       │     │  SimState    │  (simulation clock & counters)
       │     └──────────────┘
       │
       └────<┌──────────────┐
             │  Scratchpad  │  (agent persistent memory)
             └──────────────┘

Model Details

Company (`models/company.py`)

Column	Type	Notes
`id`	UUID (PK)	Auto-generated
`name`	String	Company name
`funds_cents`	BigInteger	Financial balance in cents

Design choice: Funds are stored as integer cents to avoid floating-point rounding errors in financial calculations. BigInteger is a signed 64-bit integer, so it holds very large balances as well as negative ones.
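
The rounding problem that integer cents avoid can be seen in a few lines. This is an illustrative sketch only; `format_cents` is a hypothetical helper, not part of the codebase.

```python
# Float dollars accumulate binary rounding error; integer cents do not.
float_total = sum([0.10] * 3)  # three 10-cent charges as float dollars
cents_total = sum([10] * 3)    # the same charges as integer cents


def format_cents(amount_cents):
    """Render signed integer cents as a dollar string for display."""
    sign = "-" if amount_cents < 0 else ""
    dollars, cents = divmod(abs(amount_cents), 100)
    return f"{sign}${dollars:,}.{cents:02d}"


print(float_total == 0.30)    # False: the float sum is not exactly 0.30
print(cents_total == 30)      # True: the integer sum is exact
print(format_cents(-1250))    # -$12.50
```

Conversion to a decimal string happens only at display time, so stored values never carry fractional error.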

CompanyPrestige (`models/company.py`)

Column	Type	Notes
`company_id`	UUID (FK)	References Company
`domain`	String	research / inference / data_environment / training
`prestige_level`	Float	Range [1.0, 10.0]

Design choice: Prestige is tracked per-domain rather than as a single score. This forces specialization trade-offs and creates a 4-dimensional progression space.
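
As a sketch of what per-domain tracking implies, the check below gates a task on every domain it touches, matching the `required_prestige` note in the Task table. Clamping updates to [1.0, 10.0] is an assumption based on the stated range, and both function names are hypothetical.

```python
DOMAINS = ("research", "inference", "data_environment", "training")
PRESTIGE_MIN, PRESTIGE_MAX = 1.0, 10.0


def meets_prestige(prestige_by_domain, task_domains, required_prestige):
    """True only if every domain the task touches is at or above the requirement."""
    return all(prestige_by_domain[d] >= required_prestige for d in task_domains)


def apply_prestige_delta(prestige_by_domain, task_domains, delta):
    """Return a new mapping with delta added per task domain, clamped to the range."""
    updated = dict(prestige_by_domain)
    for d in task_domains:
        updated[d] = min(PRESTIGE_MAX, max(PRESTIGE_MIN, updated[d] + delta))
    return updated
```

A company strong in one domain still fails the gate for a two-domain task if the second domain lags, which is where the specialization trade-off comes from.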

Employee (`models/employee.py`)

Column	Type	Notes
`id`	UUID (PK)	Auto-generated
`company_id`	UUID (FK)	References Company
`name`	String	Employee name
`tier`	String	junior / mid / senior
`work_hours_per_day`	Float	Hours available per business day
`salary_cents`	BigInteger	Monthly salary in cents

EmployeeSkillRate (`models/employee.py`)

Column	Type	Notes
`employee_id`	UUID (FK)	References Employee
`domain`	String	One of 4 domains
`rate_domain_per_hour`	Float	Work units produced per hour

Design choice: Skill rates are hidden from the agent. The agent sees tier and salary but not per-domain effectiveness. This creates an information-asymmetry puzzle: the agent must infer employee strengths from task outcomes.
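
One way to enforce this kind of hiding is an explicit allow-list when serializing employees for the agent. The sketch below is hypothetical: `EmployeeRecord`, `AGENT_VISIBLE_FIELDS`, and `agent_view` are illustrative names, and including `name` among the visible fields is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class EmployeeRecord:
    name: str
    tier: str
    salary_cents: int
    work_hours_per_day: float
    # Hidden from the agent: domain -> work units per hour.
    skill_rates: dict = field(default_factory=dict)


# Allow-list rather than deny-list, so new hidden columns stay hidden by default.
AGENT_VISIBLE_FIELDS = ("name", "tier", "salary_cents")


def agent_view(emp):
    """Return only the fields the agent is allowed to observe."""
    return {f: getattr(emp, f) for f in AGENT_VISIBLE_FIELDS}
```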

Task (`models/task.py`)

Column	Type	Notes
`id`	UUID (PK)	Auto-generated
`company_id`	UUID (FK, nullable)	NULL = market task, set on acceptance
`status`	Enum	market → planned → active → completed_success / completed_fail / cancelled
`title`	String	Task description
`required_prestige`	Float	Minimum prestige needed in ALL task domains
`reward_funds_cents`	BigInteger	Payment on successful completion
`reward_prestige_delta`	Float	Prestige gained per domain on success
`skill_boost_pct`	Float	Employee skill rate increase on success
`accepted_at`	DateTime (nullable)	When task was accepted from market
`deadline`	DateTime (nullable)	Calculated at acceptance
`completed_at`	DateTime (nullable)	When task finished
`success`	Boolean (nullable)	True = on-time, False = late
`progress_milestone_pct`	Float	Tracks progress milestones (e.g., 50%)

Design choice: Making company_id nullable distinguishes market tasks (NULL, available for browsing) from accepted tasks (owned by a company).
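
The nullable-column convention translates into a simple `IS NULL` filter. The sketch below uses the standard-library `sqlite3` module and a cut-down, hypothetical `tasks` table rather than the real ORM models; moving the status to `planned` on acceptance is an assumption based on the status order listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id TEXT PRIMARY KEY, company_id TEXT, status TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [
        ("t1", None, "market", "Eval harness"),
        ("t2", "c1", "active", "Fine-tune run"),
        ("t3", None, "market", "Data pipeline"),
    ],
)


def market_tasks(conn):
    """IDs of tasks nobody has accepted yet (company_id IS NULL)."""
    rows = conn.execute("SELECT id FROM tasks WHERE company_id IS NULL ORDER BY id")
    return [r[0] for r in rows]


def accept_task(conn, task_id, company_id):
    """Claim a market task; the IS NULL guard prevents accepting it twice."""
    cur = conn.execute(
        "UPDATE tasks SET company_id = ?, status = 'planned' "
        "WHERE id = ? AND company_id IS NULL",
        (company_id, task_id),
    )
    return cur.rowcount == 1
```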

TaskRequirement (`models/task.py`)

Column	Type	Notes
`task_id`	UUID (FK)	References Task
`domain`	String	Which domain this requirement covers
`required_qty`	Float	Total work units needed
`completed_qty`	Float	Work units completed so far

Design choice: Multi-domain requirements make tasks a multi-dimensional optimization problem. A task might need work in 2-4 domains simultaneously.
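
To illustrate how per-domain requirements might roll up into a single progress figure, the sketch below sums completed over required work across domains. This aggregation is an assumption; the document does not state how `progress_milestone_pct` is actually derived.

```python
def task_progress(requirements):
    """Overall completion fraction from (required_qty, completed_qty) pairs.

    Completed work is capped per domain so overshoot in one domain
    cannot mask a shortfall in another.
    """
    total_required = sum(req for req, _ in requirements)
    total_done = sum(min(done, req) for req, done in requirements)
    return total_done / total_required if total_required else 1.0


def is_complete(requirements):
    """A task finishes only when every domain requirement is met."""
    return all(done >= req for req, done in requirements)
```

With requirements of 100 and 300 units and 100 units done in each domain, the first domain is finished but the task as a whole sits at 50%.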

TaskAssignment (`models/task.py`)

Column	Type	Notes
`task_id`	UUID (FK)	References Task
`employee_id`	UUID (FK)	References Employee
`assigned_at`	DateTime	When assigned

Design choice: Many-to-many junction table. An employee can work on multiple tasks (throughput splits), and a task can have multiple employees (parallel progress).
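
A rough sketch of the two effects named above. It assumes an employee's hourly rate is divided evenly across their active tasks and that employees on the same task add their rates; the actual split rule is not specified in this document.

```python
def effective_rate(rate_per_hour, active_task_count):
    """Per-task rate for one employee, assuming an even split across tasks."""
    if active_task_count <= 0:
        return 0.0
    return rate_per_hour / active_task_count


def hours_to_finish(remaining_qty, per_task_rates):
    """Work hours needed when several employees progress one domain in parallel."""
    total = sum(per_task_rates)
    if total <= 0:
        return float("inf")
    return remaining_qty / total
```

For example, an employee rated at 6 units per hour who is assigned to two tasks contributes 3 units per hour to each under this assumption.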

Client (`models/client.py`)

Column	Type	Notes
`id`	UUID (PK)	Auto-generated
`name`	String(255)	Client company name (e.g. "Nexus AI")
`reward_multiplier`	Float	Per-client reward factor [0.7, 2.5] (currently unused in reward calculation)
`tier`	String(32)	Agent-visible label: Standard / Premium / Enterprise
`specialty_domains`	JSON	List of 1-2 domain strings (e.g. ["research", "training"])
`loyalty`	Float	Hidden loyalty score [-1.0, 1.0]. RAT clients (< -0.3) cause scope creep

Design choice: loyalty is hidden from the agent. RAT clients secretly inflate task work after acceptance (scope creep), causing deadline failures. The agent must detect RATs by observing per-client failure patterns via client history.
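
The detection problem described above can be approached by tallying failures per client. The sketch below shows one possible heuristic an agent might apply, not the benchmark's own logic; the thresholds and the client name "Acme" are hypothetical.

```python
from collections import defaultdict


def failure_rates(history, min_tasks=3):
    """Per-client failure rate from (client_name, success) pairs.

    Clients with fewer than min_tasks finished tasks are skipped,
    since a single failure says little.
    """
    counts = defaultdict(lambda: [0, 0])  # client -> [failures, total]
    for client, success in history:
        counts[client][1] += 1
        if not success:
            counts[client][0] += 1
    return {c: f / n for c, (f, n) in counts.items() if n >= min_tasks}


def suspected_rats(history, threshold=0.5, min_tasks=3):
    """Clients whose observed failure rate meets or exceeds the threshold."""
    rates = failure_rates(history, min_tasks)
    return sorted(c for c, r in rates.items() if r >= threshold)
```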

ClientTrust (`models/client.py`)

Column	Type	Notes
`company_id`	UUID (FK, PK)	References Company
`client_id`	UUID (FK, PK)	References Client
`trust_level`	Numeric(6,3)	Range [0.0, 5.0], default 0.000

Design choice: Composite primary key (company_id, client_id): one trust level per company-client pair. Trust affects both reward scaling and work reduction. See 11_client_trust.md for full mechanics.
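
The effect of the composite key can be sketched with the standard-library `sqlite3` module. The table below is a simplified stand-in for the ORM model, `adjust_trust` is a hypothetical helper, and clamping to [0.0, 5.0] on update is an assumption based on the stated range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE client_trust (
        company_id  TEXT NOT NULL,
        client_id   TEXT NOT NULL,
        trust_level NUMERIC NOT NULL DEFAULT 0.000,
        PRIMARY KEY (company_id, client_id)
    )
    """
)


def adjust_trust(conn, company_id, client_id, delta):
    """Create the pair's row if missing, then apply a clamped trust change."""
    # The composite key makes this a no-op when the pair already exists.
    conn.execute(
        "INSERT OR IGNORE INTO client_trust (company_id, client_id) VALUES (?, ?)",
        (company_id, client_id),
    )
    conn.execute(
        "UPDATE client_trust SET trust_level = MIN(5.0, MAX(0.0, trust_level + ?)) "
        "WHERE company_id = ? AND client_id = ?",
        (delta, company_id, client_id),
    )
    row = conn.execute(
        "SELECT trust_level FROM client_trust WHERE company_id = ? AND client_id = ?",
        (company_id, client_id),
    ).fetchone()
    return row[0]
```

Repeated calls for the same pair update the single existing row instead of adding new ones.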

SimEvent (`models/event.py`)

Column	Type	Notes
`id`	UUID (PK)	Deterministic (uuid5)
`company_id`	UUID (FK)	References Company
`event_type`	String	task_completed / bankruptcy / task_half / horizon_end
`scheduled_at`	DateTime	When event triggers
`payload`	JSON	Event-specific data
`dedupe_key`	String	Prevents duplicate events
`consumed`	Boolean	True after processing
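
A sketch of how a deterministic `uuid5` ID and a dedupe key can work together. The namespace string, the key format, and the in-memory queue are all assumptions for illustration; only the use of `uuid5` and the existence of a `dedupe_key` column come from the table above.

```python
import uuid

# Hypothetical namespace; the one used by yc-bench is not given here.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "yc-bench/sim-event")


def make_dedupe_key(event_type, task_id=None):
    """One key per logical event, e.g. 'task_half:<task_id>'."""
    return f"{event_type}:{task_id}" if task_id else event_type


def event_id(company_id, dedupe_key):
    """Same inputs always give the same UUID, so replays are reproducible."""
    return uuid.uuid5(EVENT_NAMESPACE, f"{company_id}:{dedupe_key}")


def schedule(queue, company_id, event_type, scheduled_at, task_id=None):
    """Add an event unless an identical one is already queued."""
    key = make_dedupe_key(event_type, task_id)
    eid = event_id(company_id, key)
    if eid in queue:
        return False
    queue[eid] = {
        "event_type": event_type,
        "scheduled_at": scheduled_at,
        "dedupe_key": key,
        "consumed": False,
    }
    return True
```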

LedgerEntry (`models/ledger.py`)

Column	Type	Notes
`id`	UUID (PK)	Auto-generated
`company_id`	UUID (FK)	References Company
`occurred_at`	DateTime	Transaction timestamp
`category`	Enum	MONTHLY_PAYROLL / TASK_REWARD / TASK_FAIL_PENALTY / TASK_CANCEL_PENALTY
`amount_cents`	BigInteger	Signed amount (negative = cost)
`ref_type`	String (nullable)	Reference entity type
`ref_id`	UUID (nullable)	Reference entity ID

Design choice: An immutable, append-only ledger provides a complete financial audit trail. No entries are ever deleted or modified.
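
The append-only idea can be sketched in plain Python: rows are frozen once written, and totals are always derived by summing. `LedgerRow` and `Ledger` are hypothetical stand-ins for the ORM model, and the amounts are made up.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class LedgerRow:
    category: str
    amount_cents: int  # signed: negative = cost


class Ledger:
    """Exposes append and read operations only; no update or delete."""

    def __init__(self):
        self._rows = []

    def append(self, category, amount_cents):
        row = LedgerRow(category, amount_cents)
        self._rows.append(row)
        return row

    def rows(self):
        return tuple(self._rows)  # read-only snapshot

    def net_cents(self):
        return sum(r.amount_cents for r in self._rows)


ledger = Ledger()
ledger.append("TASK_REWARD", 500_000)
ledger.append("MONTHLY_PAYROLL", -320_000)
ledger.append("TASK_FAIL_PENALTY", -50_000)
print(ledger.net_cents())  # 130000
```

A correction under this scheme is a new compensating row, so the history of what happened stays intact.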

SimState (`models/sim_state.py`)

Column	Type	Notes
`company_id`	UUID (FK, PK)	References Company
`sim_time`	DateTime	Current simulation clock
`run_seed`	Integer	RNG seed for reproducibility
`horizon_end`	DateTime	When simulation ends
`replenish_counter`	Integer	Tracks market task replenishment

Scratchpad (`models/scratchpad.py`)

Column	Type	Notes
`company_id`	UUID (FK)	References Company
`content`	Text	Free-form agent notes

Design choice: The scratchpad survives LLM context truncation, giving the agent persistent memory across the full simulation.

Session Management (`session.py`)

session_scope(factory) → context manager

Creates a scoped session with automatic commit/rollback
Supports both SQLite (default) and PostgreSQL (via DATABASE_URL)
init_db() creates all tables from ORM metadata

Design choice: The context-manager pattern ensures every database operation runs inside a transaction that either commits fully or rolls back, preventing partial state updates that would corrupt the simulation.
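
A minimal sketch of the commit-or-rollback pattern, not the project's actual `session.py`. It works with any factory whose result has `commit()`, `rollback()`, and `close()`; in the real code that would presumably be a SQLAlchemy `sessionmaker`, while this demo substitutes a standard-library `sqlite3` connection and a hypothetical `company` table.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def session_scope(factory):
    """Commit if the block succeeds, roll back if it raises, always close."""
    session = factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


# Demo with a throwaway SQLite file standing in for the game database.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")


def factory():
    return sqlite3.connect(db_path)


with session_scope(factory) as s:
    s.execute("CREATE TABLE company (name TEXT, funds_cents INTEGER)")
    s.execute("INSERT INTO company VALUES ('Demo', 1000)")

try:
    with session_scope(factory) as s:
        s.execute("UPDATE company SET funds_cents = funds_cents - 5000")
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass  # the UPDATE above was rolled back
```

After the failed block, the stored balance is still the value committed by the first block.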
