Add system design documentation for yc-bench

Comprehensive documentation covering all major subsystems: simulation engine, data models, task system, prestige, finances, employees, agent layer, CLI interface, configuration, and runner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:35:12 +00:00 · 2026-03-08 13:42:41 -07:00 · 2026-03-08 13:42:41 -07:00 · ecd3d9e415
commit ecd3d9e415
parent b1cd7ebfb2
11 changed files with 1858 additions and 0 deletions
--- a/system_design/02_data_models.md
+++ b/system_design/02_data_models.md
@ -0,0 +1,190 @@
+# Data Models & Database Design
+
+**Location**: `src/yc_bench/db/`
+
+## Design Choice: SQLAlchemy ORM with SQLite
+
+The benchmark uses SQLAlchemy's declarative ORM over SQLite for several reasons:
+
+1. **Single-file persistence**: SQLite stores the entire game state in one file, making runs portable and inspectable
+2. **Transactional safety**: ACID guarantees prevent partial state updates
+3. **Query flexibility**: SQL allows complex queries for financial reports, task filtering, etc.
+4. **Dual-backend support**: The same ORM works with PostgreSQL via `DATABASE_URL` environment variable for production/scaling scenarios
+
+## Schema Overview
+
+```
+┌──────────────┐     ┌───────────────────┐
+│   Company    │────<│  CompanyPrestige   │  (1 per domain × company)
+└──────┬───────┘     └───────────────────┘
+       │
+       ├────<┌──────────────┐     ┌──────────────────┐
+       │     │   Employee   │────<│ EmployeeSkillRate │  (1 per domain × employee)
+       │     └──────┬───────┘     └──────────────────┘
+       │            │
+       │            │    ┌────────────────┐
+       │            └───<│ TaskAssignment  │  (employee ↔ task junction)
+       │                 └────────┬───────┘
+       │                         │
+       ├────<┌──────────┐────────┘
+       │     │   Task   │────<┌─────────────────┐
+       │     └──────────┘     │ TaskRequirement  │  (1 per domain × task)
+       │                      └─────────────────┘
+       │
+       ├────<┌──────────────┐
+       │     │  SimEvent    │  (discrete events queue)
+       │     └──────────────┘
+       │
+       ├────<┌──────────────┐
+       │     │ LedgerEntry  │  (financial transactions)
+       │     └──────────────┘
+       │
+       ├────<┌──────────────┐
+       │     │  SimState    │  (simulation clock & counters)
+       │     └──────────────┘
+       │
+       └────<┌──────────────┐
+             │  Scratchpad  │  (agent persistent memory)
+             └──────────────┘
+```
+
+## Model Details
+
+### Company (`models/company.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | UUID (PK) | Auto-generated |
+| `name` | String | Company name |
+| `funds_cents` | BigInteger | Financial balance in cents |
+
+**Design choice**: Funds stored in cents (integer) to avoid floating-point rounding errors in financial calculations. BigInteger supports very large/negative values.
+
+### CompanyPrestige (`models/company.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `company_id` | UUID (FK) | References Company |
+| `domain` | String | research / inference / data_environment / training |
+| `prestige_level` | Float | Range [1.0, 10.0] |
+
+**Design choice**: Prestige is tracked per-domain rather than as a single score. This forces specialization trade-offs and creates a 4-dimensional progression space.
+
+### Employee (`models/employee.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | UUID (PK) | Auto-generated |
+| `company_id` | UUID (FK) | References Company |
+| `name` | String | Employee name |
+| `tier` | String | junior / mid / senior |
+| `work_hours_per_day` | Float | Hours available per business day |
+| `salary_cents` | BigInteger | Monthly salary in cents |
+
+### EmployeeSkillRate (`models/employee.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `employee_id` | UUID (FK) | References Employee |
+| `domain` | String | One of 4 domains |
+| `rate_domain_per_hour` | Float | Work units produced per hour |
+
+**Design choice**: Skill rates are **hidden from the agent**. The agent sees tier and salary but not per-domain effectiveness. This creates an information asymmetry puzzle -- the agent must infer employee strengths from task outcomes.
+
+### Task (`models/task.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | UUID (PK) | Auto-generated |
+| `company_id` | UUID (FK, nullable) | NULL = market task, set on acceptance |
+| `status` | Enum | market → planned → active → completed_success / completed_fail / cancelled |
+| `title` | String | Task description |
+| `required_prestige` | Float | Minimum prestige needed in ALL task domains |
+| `reward_funds_cents` | BigInteger | Payment on successful completion |
+| `reward_prestige_delta` | Float | Prestige gained per domain on success |
+| `skill_boost_pct` | Float | Employee skill rate increase on success |
+| `accepted_at` | DateTime (nullable) | When task was accepted from market |
+| `deadline` | DateTime (nullable) | Calculated at acceptance |
+| `completed_at` | DateTime (nullable) | When task finished |
+| `success` | Boolean (nullable) | True = on-time, False = late |
+| `progress_milestone_pct` | Float | Tracks progress milestones (e.g., 50%) |
+
+**Design choice**: `company_id` being nullable elegantly distinguishes market tasks (available for browsing) from accepted tasks (owned by the company).
+
+### TaskRequirement (`models/task.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `task_id` | UUID (FK) | References Task |
+| `domain` | String | Which domain this requirement covers |
+| `required_qty` | Float | Total work units needed |
+| `completed_qty` | Float | Work units completed so far |
+
+**Design choice**: Multi-domain requirements make tasks a multi-dimensional optimization problem. A task might need work in 2-4 domains simultaneously.
+
+### TaskAssignment (`models/task.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `task_id` | UUID (FK) | References Task |
+| `employee_id` | UUID (FK) | References Employee |
+| `assigned_at` | DateTime | When assigned |
+
+**Design choice**: Many-to-many junction table. An employee can work on multiple tasks (throughput splits), and a task can have multiple employees (parallel progress).
+
+### SimEvent (`models/event.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | UUID (PK) | Deterministic (uuid5) |
+| `company_id` | UUID (FK) | References Company |
+| `event_type` | String | task_completed / bankruptcy / task_half / horizon_end |
+| `scheduled_at` | DateTime | When event triggers |
+| `payload` | JSON | Event-specific data |
+| `dedupe_key` | String | Prevents duplicate events |
+| `consumed` | Boolean | True after processing |
+
+### LedgerEntry (`models/ledger.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `id` | UUID (PK) | Auto-generated |
+| `company_id` | UUID (FK) | References Company |
+| `occurred_at` | DateTime | Transaction timestamp |
+| `category` | Enum | MONTHLY_PAYROLL / TASK_REWARD / TASK_FAIL_PENALTY / TASK_CANCEL_PENALTY |
+| `amount_cents` | BigInteger | Signed amount (negative = cost) |
+| `ref_type` | String (nullable) | Reference entity type |
+| `ref_id` | UUID (nullable) | Reference entity ID |
+
+**Design choice**: Immutable append-only ledger provides a complete financial audit trail. No entries are ever deleted or modified.
+
+### SimState (`models/sim_state.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `company_id` | UUID (FK, PK) | References Company |
+| `sim_time` | DateTime | Current simulation clock |
+| `run_seed` | Integer | RNG seed for reproducibility |
+| `horizon_end` | DateTime | When simulation ends |
+| `replenish_counter` | Integer | Tracks market task replenishment |
+
+### Scratchpad (`models/scratchpad.py`)
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `company_id` | UUID (FK) | References Company |
+| `content` | Text | Free-form agent notes |
+
+**Design choice**: Scratchpad survives LLM context truncation, giving the agent persistent memory across the full simulation.
+
+## Session Management (`session.py`)
+
+```python
+session_scope(factory) → context manager
+```
+
+- Creates a scoped session with automatic commit/rollback
+- Supports both SQLite (default) and PostgreSQL (via `DATABASE_URL`)
+- `init_db()` creates all tables from ORM metadata
+
+**Design choice**: Context manager pattern ensures every database operation is properly transacted, preventing partial state updates that would corrupt the simulation.