yc-bench/system_design/03_task_system.md
2026-03-10 14:24:13 -07:00

Task System

Location: src/yc_bench/cli/task_commands.py, src/yc_bench/core/eta.py, src/yc_bench/core/progress.py

Task Lifecycle

market ──accept──> planned ──dispatch──> active ──complete──> completed_success
                      │                    │                  completed_fail
                      │                    │
                      └──cancel──> cancelled <──cancel──┘

States

| Status | Meaning |
|---|---|
| market | Available for browsing, not yet accepted |
| planned | Accepted by company, employees can be assigned |
| active | Dispatched, work is progressing |
| completed_success | Finished on time |
| completed_fail | Finished late (past deadline) |
| cancelled | Abandoned by agent |
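
The lifecycle can be written down as a small transition table. This is a minimal sketch transcribed from the diagram above; the names `TRANSITIONS`, `next_statuses`, and `is_terminal` are illustrative and not taken from the codebase.

```python
# Allowed transitions, transcribed from the lifecycle diagram.
# Keys are (current_status, action); values are the statuses that can result.
TRANSITIONS = {
    ("market", "accept"): {"planned"},
    ("planned", "dispatch"): {"active"},
    ("planned", "cancel"): {"cancelled"},
    ("active", "complete"): {"completed_success", "completed_fail"},
    ("active", "cancel"): {"cancelled"},
}


def next_statuses(status: str, action: str) -> set:
    """Return the statuses reachable from `status` via `action` (empty if illegal)."""
    return TRANSITIONS.get((status, action), set())


def is_terminal(status: str) -> bool:
    """A status is terminal when no listed action leads out of it."""
    return all(src != status for src, _ in TRANSITIONS)
```

Under this table, `completed_success`, `completed_fail`, and `cancelled` are terminal, and a `market` task cannot be dispatched without first being accepted.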

Design Choices

Two-Phase Activation (Accept → Dispatch)

Tasks go through planned before active. This separation:

  1. Allows pre-assignment: The agent can assign employees before work begins
  2. Deadline starts at accept: Creates urgency -- planning time counts against the deadline
  3. Forces commitment: Accepting a task reserves it but the agent must still dispatch

Deadline Calculation

deadline = accepted_at + max(required_qty[d] for all domains d) / deadline_qty_per_day

Design choice: Deadline is proportional to the largest single-domain requirement, not the sum. This means multi-domain tasks don't get proportionally more time -- they require parallel work.
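
A worked example of the formula, as a sketch. It assumes the quotient is measured in plain calendar days; the real implementation may count business days instead. `compute_deadline` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def compute_deadline(accepted_at, required_qty, deadline_qty_per_day):
    """Accept time plus (largest single-domain requirement / daily quota) days.

    Assumption: calendar days, not business days.
    """
    days = max(required_qty.values()) / deadline_qty_per_day
    return accepted_at + timedelta(days=days)


accepted = datetime(2026, 1, 5, 9, 0)
single = compute_deadline(accepted, {"research": 400}, deadline_qty_per_day=200)
multi = compute_deadline(
    accepted, {"research": 400, "training": 300}, deadline_qty_per_day=200
)
```

Both tasks get the same two-day window, even though the second one carries 300 extra units of training work.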

Prestige Gating at Accept Time

def task_accept(task_id):
    # `task` is the market task identified by task_id
    for domain in task.requirements:
        if company_prestige[domain] < task.required_prestige:
            reject(f"Insufficient prestige in {domain}")

Design choice: Prestige check is per-domain. A task requiring prestige 3.0 with requirements in research and inference needs prestige >= 3.0 in BOTH domains. This prevents gaming by maxing one domain.
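
The per-domain rule can be illustrated with a self-contained sketch. `prestige_shortfalls` is a hypothetical helper, and the prestige values are made up for the example.

```python
def prestige_shortfalls(company_prestige, task_domains, required_prestige):
    """Return the task domains in which the company is below the threshold."""
    return [d for d in task_domains if company_prestige[d] < required_prestige]


domains = ["research", "inference"]

# Very high prestige in one domain does not compensate for a gap in another.
lopsided = {"research": 9.0, "inference": 2.5}
blocked_domains = prestige_shortfalls(lopsided, domains, required_prestige=3.0)

# Meeting the threshold in every required domain clears the check.
balanced = {"research": 3.0, "inference": 3.5}
ok_domains = prestige_shortfalls(balanced, domains, required_prestige=3.0)
```

An empty result means the task can be accepted as far as prestige is concerned.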

Trust Gating at Accept Time

~20% of tasks have a required_trust field. At acceptance, the agent's trust with the task's client must meet the threshold:

if task.required_trust > 0 and task.client_id:
    client_trust = get_trust(company_id, task.client_id)
    if client_trust < task.required_trust:
        reject("Insufficient trust with client")

Design choice: Trust gating is per-client, not global. High-trust tasks are the most valuable opportunities, gated behind relationship-building with specific clients. See 11_client_trust.md for full trust mechanics.

Client Assignment and Reward Scaling

Each task belongs to a specific client. At acceptance:

  1. Reward scaling: actual_reward = listed_reward × trust_multiplier (50% at trust 0, scaling up with trust and client tier)
  2. Work reduction: required_qty *= (1 - trust_work_reduction_max × trust/trust_max) (up to 40% less work at max trust)
  3. Replacement generation: A new market task replaces the accepted one, biased toward the same client's specialty domains
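
The work-reduction step (item 2) can be sketched as follows. The default of 0.40 comes from the "up to 40%" figure above; the `trust_max` value of 5.0 in the example and the helper name `reduced_qty` are assumptions made for illustration.

```python
def reduced_qty(required_qty, trust, trust_max, trust_work_reduction_max=0.40):
    """Scale each domain's required quantity down linearly with trust."""
    factor = 1 - trust_work_reduction_max * trust / trust_max
    return {domain: qty * factor for domain, qty in required_qty.items()}


task_qty = {"research": 100, "training": 50}

no_trust = reduced_qty(task_qty, trust=0.0, trust_max=5.0)   # factor 1.0
half_trust = reduced_qty(task_qty, trust=2.5, trust_max=5.0)  # factor 0.8
max_trust = reduced_qty(task_qty, trust=5.0, trust_max=5.0)   # factor 0.6
```

The reduction applies to every domain of the task by the same factor, so the ratio between domains is unchanged.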

Cancel Penalties

Cancelling an active task incurs:

  • Prestige penalty: reward_prestige_delta × cancel_multiplier (configurable per difficulty)
  • No financial penalty (just lost opportunity)

Design choice: Cancel penalties prevent the strategy of accepting everything and dropping what's inconvenient. Higher difficulties increase the cancel multiplier.
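
As a small illustration of the penalty formula, the sketch below uses placeholder multipliers; they are not the benchmark's configured values, and `cancel_prestige_penalty` is a hypothetical helper.

```python
def cancel_prestige_penalty(reward_prestige_delta, cancel_multiplier):
    """Prestige lost when an active task is cancelled."""
    return reward_prestige_delta * cancel_multiplier


# Placeholder multipliers standing in for a lower and a higher difficulty.
lenient = cancel_prestige_penalty(reward_prestige_delta=0.5, cancel_multiplier=1.0)
strict = cancel_prestige_penalty(reward_prestige_delta=0.5, cancel_multiplier=2.0)
```

With a multiplier above 1, cancelling costs more prestige than completing the task would have earned.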

Employee Assignment

Assignment Rules

  • Employees can only be assigned to planned or active tasks
  • An employee can work on multiple tasks simultaneously (throughput splits)
  • Multiple employees can work on the same task (parallel progress)

Throughput Splitting

effective_rate = base_rate_per_hour / num_active_tasks

Design choice: Throughput splits linearly across the active tasks an employee is assigned to (num_active_tasks), which creates a fundamental trade-off:

  • Focus: 1 employee on 1 task = full speed
  • Parallel: 1 employee on 3 tasks = 1/3 speed each
  • The agent must decide between fast completion of few tasks vs. slow progress on many
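
The trade-off can be made concrete with a short sketch. The base rate of 6 units per hour and the 12-unit task size are made-up numbers.

```python
def effective_rate(base_rate_per_hour, num_active_tasks):
    """Per-task rate for one employee whose throughput is split evenly."""
    if num_active_tasks < 1:
        raise ValueError("employee must be on at least one active task")
    return base_rate_per_hour / num_active_tasks


# One employee with a base rate of 6 units/hour on 1, 2, or 3 active tasks.
rates = {n: effective_rate(6.0, n) for n in (1, 2, 3)}

# Hours for that employee to finish a 12-unit task at each split.
hours_for_12_units = {n: 12 / rate for n, rate in rates.items()}
```

Splitting does not change the employee's total output; it changes when each task finishes. Three 12-unit tasks worked in parallel all finish at hour 6, while working them one at a time finishes the first at hour 2.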

Progress Tracking (progress.py)

How Work Gets Done

Progress is calculated lazily during advance_time():

for each active task:
    for each assigned employee:
        for each domain in task requirements:
            # num_active_tasks: active tasks this employee is assigned to
            work = employee.skill_rate[domain] / num_active_tasks × business_hours
            requirement.completed_qty += work
            # cap progress at the required quantity
            requirement.completed_qty = min(requirement.completed_qty, requirement.required_qty)

Multi-Domain Completion

A task is complete when ALL domain requirements reach completed_qty >= required_qty. The slowest domain is the bottleneck.

Design choice: This creates interesting optimization puzzles. If a task needs 100 units of research and 50 units of training, the agent should allocate more research-skilled employees to balance completion times.
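
The bottleneck effect can be seen in a runnable sketch of the progress step for a single task. The `Requirement` class, `apply_work`, and `is_complete` are simplified stand-ins rather than the actual code in progress.py, and the skill rates are made up.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    required_qty: float
    completed_qty: float = 0.0


def apply_work(requirements, employees, business_hours):
    """Advance each domain by every employee's split rate, capped at required_qty.

    `employees` is a list of (skill_rate_by_domain, num_active_tasks) pairs.
    """
    for domain, req in requirements.items():
        for skill_rate, num_active_tasks in employees:
            work = skill_rate.get(domain, 0.0) / num_active_tasks * business_hours
            req.completed_qty = min(req.completed_qty + work, req.required_qty)


def is_complete(requirements):
    """A task is complete only when every domain has reached its target."""
    return all(r.completed_qty >= r.required_qty for r in requirements.values())


reqs = {"research": Requirement(100), "training": Requirement(50)}
staff = [({"research": 5.0, "training": 5.0}, 1)]  # one employee, one active task

apply_work(reqs, staff, business_hours=10)
done_after_10h = is_complete(reqs)  # training is finished, research is at 50/100

apply_work(reqs, staff, business_hours=10)
done_after_20h = is_complete(reqs)  # research catches up; training stays capped
```

Training finishes after 10 business hours, but the task only completes after 20 because research is the slower domain.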

ETA Solver (eta.py)

Completion Time Calculation

def solve_task_completion_time(task, assignments, sim_time):
    hours_needed = {}
    for d in task.requirements:  # each required domain
        remaining = required_qty[d] - completed_qty[d]
        rate = sum(effective_rate[emp][d] for emp in assignments)
        if rate == 0:
            return float("inf")  # no one can work on this domain
        hours_needed[d] = remaining / rate

    max_hours = max(hours_needed.values())
    return sim_time + max_hours  # max_hours is in business hours
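
A self-contained version of the same calculation, as a sketch. It returns the number of business hours rather than a timestamp, since converting business hours to simulation time depends on calendar logic not described here. `hours_to_complete` and its argument shapes are assumptions.

```python
import math


def hours_to_complete(remaining_by_domain, rates_by_employee):
    """Business hours until the slowest domain finishes.

    `rates_by_employee` is a list of {domain: effective_rate} dicts, one per
    assigned employee. Returns math.inf if some domain has no one working on it.
    """
    hours_needed = {}
    for domain, remaining in remaining_by_domain.items():
        rate = sum(rates.get(domain, 0.0) for rates in rates_by_employee)
        if rate == 0:
            return math.inf
        hours_needed[domain] = remaining / rate
    return max(hours_needed.values(), default=0.0)


remaining = {"research": 100, "training": 50}
team = [{"research": 4.0, "training": 5.0}, {"research": 6.0}]

eta_hours = hours_to_complete(remaining, team)                  # both domains take 10 h
stalled = hours_to_complete(remaining, [{"research": 10.0}])    # nobody covers training
```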

Halfway Time Calculation

Used for milestone events. The solver finds the time at which the weighted average of completion across domains reaches 50%.
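
One way to compute such a halfway point is sketched below. It assumes that each domain is weighted by its required quantity and that rates stay constant; the weighting used in eta.py is not specified here, so treat both as assumptions. The function name `halfway_hours` and the bisection approach are illustrative.

```python
import math


def halfway_hours(required, completed, rate, iterations=60):
    """Business hours until quantity-weighted completion reaches 50%.

    Assumption: overall completion = sum of completed / sum of required,
    with each domain capped at its own required quantity.
    """
    total = sum(required.values())

    def fraction(t):
        done = sum(min(completed[d] + rate[d] * t, required[d]) for d in required)
        return done / total

    if fraction(0.0) >= 0.5:
        return 0.0
    # Upper bound: the moment every domain that has workers is finished.
    finish = [(required[d] - completed[d]) / rate[d] for d in required if rate[d] > 0]
    hi = max(finish, default=0.0)
    if fraction(hi) < 0.5:
        return math.inf  # halfway is unreachable with the current assignments
    lo = 0.0
    for _ in range(iterations):  # bisection on a non-decreasing function
        mid = (lo + hi) / 2
        if fraction(mid) >= 0.5:
            hi = mid
        else:
            lo = mid
    return hi


required = {"research": 100, "training": 50}
start = {"research": 0, "training": 0}
both = halfway_hours(required, start, {"research": 10, "training": 5})
research_only = halfway_hours(required, start, {"research": 10, "training": 0})
```

Note that under this weighting a task can pass its halfway milestone even when one domain has no workers, although it can never complete in that state.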

When ETAs Are Recalculated

  • Task dispatched (new active task)
  • Employee assigned/unassigned
  • Task completed (frees employee throughput for other tasks)
  • Task cancelled (same)

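Design choice: Dynamic ETA recalculation keeps scheduled completion and milestone events in step with current assignments. When an employee is reassigned, every task that employee works on gets a new completion projection, because the split of their throughput changes.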

Market Task Generation

See 09_configuration.md for details on how market tasks are generated with stratified prestige distribution and randomized requirements.

Browsing and Filtering

The market browse command supports:

  • Domain filter
  • Prestige range filter
  • Reward range filter
  • Pagination (offset/limit)

All output is JSON for agent consumption.
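
The filters compose as sketched below. This is not the actual CLI implementation: the function name `browse_market`, its parameters, the task field names, and the shape of the JSON envelope are all assumptions made for illustration.

```python
import json


def browse_market(tasks, domain=None, prestige_range=(None, None),
                  reward_range=(None, None), offset=0, limit=10):
    """Filter market tasks, paginate, and return a JSON string."""

    def in_range(value, bounds):
        low, high = bounds
        return (low is None or value >= low) and (high is None or value <= high)

    matches = [
        t for t in tasks
        if (domain is None or domain in t["domains"])
        and in_range(t["required_prestige"], prestige_range)
        and in_range(t["reward"], reward_range)
    ]
    page = matches[offset:offset + limit]
    return json.dumps(
        {"total": len(matches), "offset": offset, "limit": limit, "tasks": page}
    )


market = [
    {"id": 1, "domains": ["research"], "required_prestige": 1.0, "reward": 1000},
    {"id": 2, "domains": ["research", "training"], "required_prestige": 3.0, "reward": 5000},
    {"id": 3, "domains": ["inference"], "required_prestige": 2.0, "reward": 2000},
]

filtered = json.loads(browse_market(market, domain="research", prestige_range=(2.0, None)))
second_page = json.loads(browse_market(market, offset=1, limit=1))
```

Returning the total match count alongside the page lets an agent decide whether to request further pages.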

Sim Resume Blocking

yc-bench sim resume is blocked when there are zero active tasks, returning {"ok": false} instead of advancing time. This prevents catastrophic payroll drain when the agent has no work in progress. The agent loop filters blocked responses and treats them as no-ops.

The auto-advance mechanism (which forces sim resume after N consecutive turns without one) also checks for active tasks before advancing.
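
The guard can be sketched as follows. The `{"ok": false}` response is from the description above; the function signature, the `{"ok": True}` success shape, and the way the agent loop filters responses are assumptions.

```python
def sim_resume(num_active_tasks, advance_time):
    """Advance the simulation only if some task is in progress."""
    if num_active_tasks == 0:
        return {"ok": False}  # blocked: no time passes, no payroll is charged
    advance_time()
    return {"ok": True}


calls = []
blocked = sim_resume(0, lambda: calls.append("advanced"))
allowed = sim_resume(2, lambda: calls.append("advanced"))

# Agent-loop side: blocked responses are dropped and treated as no-ops.
effective = [r for r in (blocked, allowed) if r["ok"]]
```

Only the second call advances time; the first returns immediately without side effects.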