mirror of https://github.com/collinear-ai/yc-bench.git synced 2026-04-19 12:58:03 +00:00

AnandK27 ecd3d9e415 Add system design documentation for yc-bench

Comprehensive documentation covering all major subsystems:
simulation engine, data models, task system, prestige, finances,
employees, agent layer, CLI interface, configuration, and runner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-08 13:42:41 -07:00

5 KiB

Raw Permalink Blame History

Simulation Engine

Location: src/yc_bench/core/

Design Choice: Discrete-Event Simulation

YC-Bench uses a discrete-event simulation (DES) model rather than a tick-based approach. This was chosen because:

Determinism: Events are processed in a fixed, reproducible order given the same seed
Efficiency: Time jumps between events rather than iterating every hour/day
Clarity: Each state change corresponds to a meaningful event, making the simulation auditable

Core Loop (`engine.py`)

The advance_time() function is the heart of the simulation:

advance_time(session, company_id, cfg) → AdvanceResult

Algorithm

Flush progress on all active tasks (convert elapsed business hours into completed work)
Apply prestige decay for elapsed days
Process payroll if crossing a month boundary (first business day)
Fetch next unconsumed event ordered by (scheduled_at, priority)
Dispatch to handler based on event type
Recalculate ETAs for affected tasks
Update sim_time to the event's timestamp
Return wake events to the agent

Why "Resume" Rather Than Auto-Advance?

The agent explicitly calls yc-bench sim resume to advance time. This design:

Gives the agent control over pacing (plan before advancing)
Creates a natural decision checkpoint between simulation steps
Allows multiple CLI queries before committing to advancing
If the agent stalls (N turns without resume), the loop forces one automatically

Event System (`events.py`)

Event Types (Priority Order)

Priority	Event Type	Trigger
1	`task_completed`	Task reaches 100% in all domain requirements
2	`bankruptcy`	Funds drop below zero after payroll
3	`task_half`	Task reaches 50% progress milestone
4	`horizon_end`	Simulation time limit reached

Design Choice: Fixed Priority Ordering

Events at the same timestamp are processed in strict priority order. This ensures:

Task completions (and their rewards) are processed before bankruptcy checks
A task finishing on the same day as payroll can save the company from bankruptcy
Deterministic behavior regardless of insertion order

Event Identity (Deterministic UUIDs)

Event IDs use uuid5 based on payload + timestamp + dedupe_key. This means:

Same world state produces identical event IDs
Deduplication is automatic (re-inserting same event is a no-op)
Full reproducibility across runs with same seed

Event Handlers (`handlers/`)

`task_complete.py`

Finalizes all domain progress to 100%
Success check: sim_time <= deadline
On success: add reward funds, add prestige per domain, boost employee skill rates, apply 1% salary bump
On failure (late): apply prestige penalty per domain (configurable multiplier)

`task_half.py`

Marks progress milestone reached
Informational event for agent awareness (no state changes beyond flag)

`bankruptcy.py`

Triggered when funds_cents < 0 after payroll
Terminates the simulation with bankruptcy outcome

`horizon_end.py`

Triggered at configured simulation end date
Terminates the simulation with final scoring

Progress Tracking (`progress.py`)

Effective Rate Calculation

effective_rate = base_rate_per_hour / num_active_tasks_for_this_employee

Design choice: Throughput splitting creates a resource allocation puzzle. An employee assigned to 3 tasks works at 1/3 speed on each. The agent must balance parallelism vs. focus.

Progress Flush

When advance_time() runs, it calculates work done since the last flush:

work = effective_rate × business_hours_elapsed
completed_qty += work  (capped at required_qty)

Business Time (`business_time.py`)

Design Choice: Business Hours Only

Work only happens during business hours (weekdays, configurable hours per day). This adds:

Realistic scheduling constraints
Weekend gaps that affect deadline calculations
A reason for the agent to think about calendar timing

ETA Solver (`eta.py`)

Completion Time

solve_task_completion_time():
  For each domain d:
    remaining[d] = required_qty[d] - completed_qty[d]
    rate[d] = sum(effective_rate for assigned employees with skill in d)
    time[d] = remaining[d] / rate[d]
  completion_time = max(time[d]) across all domains

Design Choice: Multi-Domain Bottleneck

A task completes when ALL domains finish. The slowest domain determines completion time. This creates interesting assignment puzzles where the agent must identify and address bottlenecks.

Halfway Time

Used for progress milestone events. Calculated as the weighted midpoint across domains.

Prestige Decay

apply_prestige_decay(session, company_id, days_elapsed, cfg):
  for each domain:
    prestige -= decay_per_day × days_elapsed
    prestige = max(prestige, prestige_min)  # floor at 1.0

Design choice: Decay prevents "set and forget" strategies. The agent must continuously work in domains to maintain access to high-tier tasks. Neglected domains revert to baseline.

5 KiB Raw Permalink Blame History Unescape Escape