10 KiB
Data Models & Database Design
Location: src/yc_bench/db/
Design Choice: SQLAlchemy ORM with SQLite
The benchmark uses SQLAlchemy's declarative ORM over SQLite for several reasons:
- Single-file persistence: SQLite stores the entire game state in one file, making runs portable and inspectable
- Transactional safety: ACID guarantees prevent partial state updates
- Query flexibility: SQL allows complex queries for financial reports, task filtering, etc.
- Dual-backend support: The same ORM works with PostgreSQL via
DATABASE_URLenvironment variable for production/scaling scenarios
Schema Overview
┌──────────────┐ ┌───────────────────┐
│ Company │────<│ CompanyPrestige │ (1 per domain × company)
└──────┬───────┘ └───────────────────┘
│
├────<┌──────────────┐ ┌──────────────────┐
│ │ Employee │────<│ EmployeeSkillRate │ (1 per domain × employee)
│ └──────┬───────┘ └──────────────────┘
│ │
│ │ ┌────────────────┐
│ └───<│ TaskAssignment │ (employee ↔ task junction)
│ └────────┬───────┘
│ │
├────<┌──────────┐────────┘
│ │ Task │────<┌─────────────────┐
│ └────┬─────┘ │ TaskRequirement │ (1 per domain × task)
│ │ └─────────────────┘
│ │
│ └────>┌──────────┐
│ │ Client │ (task issuer with hidden multiplier)
│ └────┬─────┘
│ │
├────<┌───────────────┘
│ │ ClientTrust │ (company ↔ client trust level)
│ └──────────────┘
│
├────<┌──────────────┐
│ │ SimEvent │ (discrete events queue)
│ └──────────────┘
│
├────<┌──────────────┐
│ │ LedgerEntry │ (financial transactions)
│ └──────────────┘
│
├────<┌──────────────┐
│ │ SimState │ (simulation clock & counters)
│ └──────────────┘
│
└────<┌──────────────┐
│ Scratchpad │ (agent persistent memory)
└──────────────┘
Model Details
Company (models/company.py)
| Column | Type | Notes |
|---|---|---|
id |
UUID (PK) | Auto-generated |
name |
String | Company name |
funds_cents |
BigInteger | Financial balance in cents |
Design choice: Funds stored in cents (integer) to avoid floating-point rounding errors in financial calculations. BigInteger supports very large/negative values.
CompanyPrestige (models/company.py)
| Column | Type | Notes |
|---|---|---|
company_id |
UUID (FK) | References Company |
domain |
String | research / inference / data_environment / training |
prestige_level |
Float | Range [1.0, 10.0] |
Design choice: Prestige is tracked per-domain rather than as a single score. This forces specialization trade-offs and creates a 4-dimensional progression space.
Employee (models/employee.py)
| Column | Type | Notes |
|---|---|---|
id |
UUID (PK) | Auto-generated |
company_id |
UUID (FK) | References Company |
name |
String | Employee name |
tier |
String | junior / mid / senior |
work_hours_per_day |
Float | Hours available per business day |
salary_cents |
BigInteger | Monthly salary in cents |
EmployeeSkillRate (models/employee.py)
| Column | Type | Notes |
|---|---|---|
employee_id |
UUID (FK) | References Employee |
domain |
String | One of 4 domains |
rate_domain_per_hour |
Float | Work units produced per hour |
Design choice: Skill rates are hidden from the agent. The agent sees tier and salary but not per-domain effectiveness. This creates an information asymmetry puzzle -- the agent must infer employee strengths from task outcomes.
Task (models/task.py)
| Column | Type | Notes |
|---|---|---|
id |
UUID (PK) | Auto-generated |
company_id |
UUID (FK, nullable) | NULL = market task, set on acceptance |
status |
Enum | market → planned → active → completed_success / completed_fail / cancelled |
title |
String | Task description |
required_prestige |
Float | Minimum prestige needed in ALL task domains |
reward_funds_cents |
BigInteger | Payment on successful completion |
reward_prestige_delta |
Float | Prestige gained per domain on success |
skill_boost_pct |
Float | Employee skill rate increase on success |
accepted_at |
DateTime (nullable) | When task was accepted from market |
deadline |
DateTime (nullable) | Calculated at acceptance |
completed_at |
DateTime (nullable) | When task finished |
success |
Boolean (nullable) | True = on-time, False = late |
progress_milestone_pct |
Float | Tracks progress milestones (e.g., 50%) |
Design choice: company_id being nullable elegantly distinguishes market tasks (available for browsing) from accepted tasks (owned by the company).
TaskRequirement (models/task.py)
| Column | Type | Notes |
|---|---|---|
task_id |
UUID (FK) | References Task |
domain |
String | Which domain this requirement covers |
required_qty |
Float | Total work units needed |
completed_qty |
Float | Work units completed so far |
Design choice: Multi-domain requirements make tasks a multi-dimensional optimization problem. A task might need work in 2-4 domains simultaneously.
TaskAssignment (models/task.py)
| Column | Type | Notes |
|---|---|---|
task_id |
UUID (FK) | References Task |
employee_id |
UUID (FK) | References Employee |
assigned_at |
DateTime | When assigned |
Design choice: Many-to-many junction table. An employee can work on multiple tasks (throughput splits), and a task can have multiple employees (parallel progress).
Client (models/client.py)
| Column | Type | Notes |
|---|---|---|
id |
UUID (PK) | Auto-generated |
name |
String(255) | Client company name (e.g. "Nexus AI") |
reward_multiplier |
Float | Per-client reward factor [0.7, 2.5] (currently unused in reward calculation) |
tier |
String(32) | Agent-visible label: Standard / Premium / Enterprise |
specialty_domains |
JSON | List of 1-2 domain strings (e.g. ["research", "training"]) |
loyalty |
Float | Hidden loyalty score [-1.0, 1.0]. RAT clients (< -0.3) cause scope creep |
Design choice: loyalty is hidden from the agent. RAT clients secretly inflate task work after acceptance (scope creep), causing deadline failures. The agent must detect RATs by observing per-client failure patterns via client history.
ClientTrust (models/client.py)
| Column | Type | Notes |
|---|---|---|
company_id |
UUID (FK, PK) | References Company |
client_id |
UUID (FK, PK) | References Client |
trust_level |
Numeric(6,3) | Range [0.0, 5.0], default 0.000 |
Design choice: Composite primary key (company_id, client_id) — one trust level per company-client pair. Trust affects both reward scaling and work reduction. See 11_client_trust.md for full mechanics.
SimEvent (models/event.py)
| Column | Type | Notes |
|---|---|---|
id |
UUID (PK) | Deterministic (uuid5) |
company_id |
UUID (FK) | References Company |
event_type |
String | task_completed / bankruptcy / task_half / horizon_end |
scheduled_at |
DateTime | When event triggers |
payload |
JSON | Event-specific data |
dedupe_key |
String | Prevents duplicate events |
consumed |
Boolean | True after processing |
LedgerEntry (models/ledger.py)
| Column | Type | Notes |
|---|---|---|
id |
UUID (PK) | Auto-generated |
company_id |
UUID (FK) | References Company |
occurred_at |
DateTime | Transaction timestamp |
category |
Enum | MONTHLY_PAYROLL / TASK_REWARD / TASK_FAIL_PENALTY / TASK_CANCEL_PENALTY |
amount_cents |
BigInteger | Signed amount (negative = cost) |
ref_type |
String (nullable) | Reference entity type |
ref_id |
UUID (nullable) | Reference entity ID |
Design choice: Immutable append-only ledger provides a complete financial audit trail. No entries are ever deleted or modified.
SimState (models/sim_state.py)
| Column | Type | Notes |
|---|---|---|
company_id |
UUID (FK, PK) | References Company |
sim_time |
DateTime | Current simulation clock |
run_seed |
Integer | RNG seed for reproducibility |
horizon_end |
DateTime | When simulation ends |
replenish_counter |
Integer | Tracks market task replenishment |
Scratchpad (models/scratchpad.py)
| Column | Type | Notes |
|---|---|---|
company_id |
UUID (FK) | References Company |
content |
Text | Free-form agent notes |
Design choice: Scratchpad survives LLM context truncation, giving the agent persistent memory across the full simulation.
Session Management (session.py)
session_scope(factory) → context manager
- Creates a scoped session with automatic commit/rollback
- Supports both SQLite (default) and PostgreSQL (via
DATABASE_URL) init_db()creates all tables from ORM metadata
Design choice: Context manager pattern ensures every database operation is properly transacted, preventing partial state updates that would corrupt the simulation.