yc-bench/system_design/03_task_system.md
2026-03-10 14:24:13 -07:00


# Task System
**Location**: `src/yc_bench/cli/task_commands.py`, `src/yc_bench/core/eta.py`, `src/yc_bench/core/progress.py`
## Task Lifecycle
```
market ──accept──> planned ──dispatch──> active ──complete──> completed_success
                      │                    │                  completed_fail
                      │ cancel             │ cancel
                      └────> cancelled <───┘
```
### States
| Status | Meaning |
|--------|---------|
| `market` | Available for browsing, not yet accepted |
| `planned` | Accepted by company, employees can be assigned |
| `active` | Dispatched, work is progressing |
| `completed_success` | Finished on time |
| `completed_fail` | Finished late (past deadline) |
| `cancelled` | Abandoned by agent |
## Design Choices
### Two-Phase Activation (Accept → Dispatch)
Tasks go through `planned` before `active`. This separation:
1. **Allows pre-assignment**: The agent can assign employees before work begins
2. **Deadline starts at accept**: Creates urgency, because planning time counts against the deadline
3. **Forces commitment**: Accepting a task reserves it, but no progress is made until the agent dispatches it
### Deadline Calculation
```
deadline = accepted_at + max(required_qty[d] for all domains d) / deadline_qty_per_day
```
**Design choice**: The deadline is proportional to the largest single-domain requirement, not the sum. Multi-domain tasks therefore get no extra time for their additional domains; those domains must be worked in parallel.
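A minimal sketch of the deadline rule, assuming times are plain floats in sim days and that `deadline_qty_per_day` is a single config value. The helper name `compute_deadline` and the sample numbers are hypothetical. It shows that adding a second, smaller domain does not move the deadline.

```python
def compute_deadline(accepted_at: float, required_qty: dict[str, float],
                     deadline_qty_per_day: float) -> float:
    # Hypothetical helper; times are plain floats in sim days for illustration.
    # Only the largest single-domain requirement sets the time budget.
    return accepted_at + max(required_qty.values()) / deadline_qty_per_day


# deadline_qty_per_day = 200.0 is a made-up value for this example.
print(compute_deadline(10.0, {"research": 600.0}, 200.0))                     # 13.0
print(compute_deadline(10.0, {"research": 600.0, "training": 400.0}, 200.0))  # 13.0
```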
### Prestige Gating at Accept Time
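```python
def task_accept(task_id):
    # `task` and `company_prestige` come from the surrounding accept handler.
    for domain in task.requirements:
        # Every domain the task touches must meet the bar on its own.
        if company_prestige[domain] < task.required_prestige:
            return reject(f"Insufficient prestige in {domain}")
```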
**Design choice**: Prestige check is per-domain. A task requiring prestige 3.0 with requirements in `research` and `inference` needs prestige >= 3.0 in BOTH domains. This prevents gaming by maxing one domain.
### Trust Gating at Accept Time
~20% of tasks have a `required_trust` field. At acceptance, the agent's trust with the task's client must meet the threshold:
```python
if task.required_trust > 0 and task.client_id:
    client_trust = get_trust(company_id, task.client_id)
    if client_trust < task.required_trust:
        reject("Insufficient trust with client")
```
**Design choice**: Trust gating is per-client, not global. High-trust tasks are the most valuable opportunities, gated behind relationship-building with specific clients. See [11_client_trust.md](11_client_trust.md) for full trust mechanics.
### Client Assignment and Reward Scaling
Each task belongs to a specific client. At acceptance:
1. **Reward scaling**: `actual_reward = listed_reward × trust_multiplier` (50% at trust 0, scaling up with trust and client tier)
2. **Work reduction**: `required_qty *= (1 - trust_work_reduction_max × trust/trust_max)` (up to 40% less work at max trust)
3. **Replacement generation**: A new market task replaces the accepted one, biased toward the same client's specialty domains
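The two acceptance-time adjustments can be sketched as below. This assumes `trust_multiplier` is computed elsewhere and passed in, since the text only pins it at 0.5 for trust 0; `trust_max` is likewise a parameter because its value is not given here. `trust_work_reduction_max` defaults to 0.4 to match the stated 40% ceiling. The function name `apply_trust_at_accept` is hypothetical.

```python
def apply_trust_at_accept(listed_reward: float, required_qty: dict[str, float],
                          trust: float, trust_max: float, trust_multiplier: float,
                          trust_work_reduction_max: float = 0.4):
    """Return (actual_reward, reduced required_qty) for a task being accepted."""
    # trust_multiplier is an input here: its curve (0.5 at trust 0, rising
    # with trust and client tier) is defined elsewhere and not modeled.
    actual_reward = listed_reward * trust_multiplier
    # Work shrinks linearly with trust, by at most trust_work_reduction_max.
    reduction = trust_work_reduction_max * trust / trust_max
    reduced_qty = {d: q * (1 - reduction) for d, q in required_qty.items()}
    return actual_reward, reduced_qty


# trust_max=5.0 is a made-up value for illustration.
print(apply_trust_at_accept(1000.0, {"research": 100.0}, trust=0.0,
                            trust_max=5.0, trust_multiplier=0.5))
# (500.0, {'research': 100.0})
```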
### Cancel Penalties
Cancelling an active task incurs:
- Prestige penalty: `reward_prestige_delta × cancel_multiplier` (configurable per difficulty)
- No financial penalty (just lost opportunity)
**Design choice**: Cancel penalties prevent the strategy of accepting everything and dropping what's inconvenient. Higher difficulties increase the cancel multiplier.
## Employee Assignment
### Assignment Rules
- Employees can only be assigned to `planned` or `active` tasks
- An employee can work on multiple tasks simultaneously (throughput splits)
- Multiple employees can work on the same task (parallel progress)
### Throughput Splitting
```
effective_rate = base_rate_per_hour / num_active_tasks
```
**Design choice**: Linear throughput splitting creates a fundamental trade-off:
- **Focus**: 1 employee on 1 task = full speed
- **Parallel**: 1 employee on 3 tasks = 1/3 speed each
- The agent must decide between fast completion of few tasks vs. slow progress on many
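To make the trade-off concrete, here is a sketch that assumes equal-sized tasks and a single employee whose rate is the same for each of them. The function names and numbers are hypothetical. Total throughput is identical in both schedules; what changes is when results arrive.

```python
def effective_rate(base_rate_per_hour: float, num_active_tasks: int) -> float:
    # Linear split: each of the employee's active tasks gets an equal share.
    return base_rate_per_hour / num_active_tasks


def finish_times_focused(qty_per_task: float, n_tasks: int, base_rate: float) -> list[float]:
    # One task at a time at full speed: tasks finish one after another.
    per_task = qty_per_task / effective_rate(base_rate, 1)
    return [per_task * (i + 1) for i in range(n_tasks)]


def finish_times_parallel(qty_per_task: float, n_tasks: int, base_rate: float) -> list[float]:
    # All tasks at once, each at 1/n of the rate: they all finish together.
    per_task = qty_per_task / effective_rate(base_rate, n_tasks)
    return [per_task] * n_tasks


# Hypothetical numbers: 3 tasks of 30 work units, employee rate 3 units/hour.
print(finish_times_focused(30.0, 3, 3.0))   # [10.0, 20.0, 30.0]
print(finish_times_parallel(30.0, 3, 3.0))  # [30.0, 30.0, 30.0]
```

Focusing delivers the first task after 10 hours instead of 30, while parallel work finishes everything at once. That difference matters because each task's deadline has been running since acceptance.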
## Progress Tracking (`progress.py`)
### How Work Gets Done
Progress is calculated lazily during `advance_time()`:
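```python
for task in active_tasks:
    for employee in task.assigned_employees:
        # An employee's throughput is split evenly across their active tasks.
        share = 1 / employee.num_active_tasks
        for req in task.requirements:
            work = employee.skill_rate[req.domain] * share * business_hours
            # Cap progress at the requirement so it never overshoots.
            req.completed_qty = min(req.completed_qty + work, req.required_qty)
```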
### Multi-Domain Completion
A task is complete when ALL domain requirements reach `completed_qty >= required_qty`. The slowest domain is the bottleneck.
**Design choice**: This creates interesting optimization puzzles. If a task needs 100 units of research and 50 units of training, the agent should allocate more research-skilled employees to balance completion times.
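The progress loop and the all-domains completion rule can be exercised end to end with the sketch below. The dataclasses are hypothetical stand-ins for the real models, and `num_active_tasks` is assumed to be tracked per employee. It replays the example above: 100 units of research and 50 of training.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    domain: str
    required_qty: float
    completed_qty: float = 0.0


@dataclass
class Employee:
    skill_rate: dict[str, float]  # work units per business hour, by domain
    num_active_tasks: int = 1     # active tasks this employee is split across


@dataclass
class Task:
    requirements: list[Requirement]
    assigned: list[Employee] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Every domain must be finished; the slowest one gates completion.
        return all(r.completed_qty >= r.required_qty for r in self.requirements)


def advance(tasks: list[Task], business_hours: float) -> None:
    for task in tasks:
        for emp in task.assigned:
            for req in task.requirements:
                # An employee with no skill in a domain contributes nothing there.
                rate = emp.skill_rate.get(req.domain, 0.0) / emp.num_active_tasks
                req.completed_qty = min(req.completed_qty + rate * business_hours,
                                        req.required_qty)


def research_heavy_task(staff: list[Employee]) -> Task:
    # Hypothetical task: 100 research units, 50 training units.
    return Task([Requirement("research", 100.0), Requirement("training", 50.0)], staff)


# Made-up skill rates; each employee works on exactly one task.
solo = research_heavy_task([Employee({"research": 10.0, "training": 10.0})])
pair = research_heavy_task([Employee({"research": 10.0, "training": 10.0}),
                            Employee({"research": 10.0})])
advance([solo, pair], business_hours=5.0)
print(solo.is_complete(), pair.is_complete())  # False True
```

With one generalist, training finishes at hour 5 while research sits at 50%; adding a second research-skilled employee balances the domains so both finish together.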
## ETA Solver (`eta.py`)
### Completion Time Calculation
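```python
import math

def solve_task_completion_time(task, assignments, sim_time):
    # effective_rate[emp][d]: emp's rate in domain d after throughput splitting
    hours_needed = {}
    for d in task.required_qty:
        remaining = task.required_qty[d] - task.completed_qty[d]
        if remaining <= 0:
            continue  # domain already finished; it cannot be the bottleneck
        rate = sum(effective_rate[emp][d] for emp in assignments)
        if rate == 0:
            return math.inf  # no one can work on this domain
        hours_needed[d] = remaining / rate
    # The slowest remaining domain determines completion.
    max_hours = max(hours_needed.values(), default=0.0)
    return sim_time + max_hours  # offset measured in business hours
```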
### Halfway Time Calculation
Used for milestone events: finds the time at which the weighted average completion across domains reaches 50%.
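The text does not spell out the weighting, so the sketch below assumes each domain is weighted by its `required_qty` (overall progress is then total completed work over total required work) and that per-domain rates stay constant until the next recalculation. Under those assumptions it finds the 50% point by bisection; `halfway_hours` is a hypothetical name.

```python
import math


def halfway_hours(reqs, rates, iterations=100):
    """Business hours from now until weighted progress reaches 50%.

    reqs:  {domain: (required_qty, completed_qty)}
    rates: {domain: combined work per business hour}
    Weighting by required_qty is an assumption of this sketch.
    """
    target = 0.5 * sum(required for required, _ in reqs.values())

    def done_at(t):
        # Each domain grows linearly until it hits its requirement.
        return sum(min(done + rates.get(d, 0.0) * t, required)
                   for d, (required, done) in reqs.items())

    if done_at(0.0) >= target:
        return 0.0
    # Progress stops growing once every staffed domain is finished.
    hi = max(((required - done) / rates[d]
              for d, (required, done) in reqs.items() if rates.get(d, 0.0) > 0),
             default=0.0)
    if done_at(hi) < target:
        return math.inf  # 50% is unreachable with current assignments
    lo = 0.0
    for _ in range(iterations):  # bisection on a nondecreasing function
        mid = (lo + hi) / 2
        if done_at(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


print(round(halfway_hours({"research": (100.0, 0.0), "training": (50.0, 0.0)},
                          {"research": 10.0, "training": 10.0}), 6))  # 3.75
```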
### When ETAs Are Recalculated
- Task dispatched (new active task)
- Employee assigned/unassigned
- Task completed (frees employee throughput for other tasks)
- Task cancelled (same)
**Design choice**: Recalculating ETAs dynamically keeps scheduled completion events in sync with current assignments. When an employee is reassigned, every affected task gets a new completion projection.
## Market Task Generation
See [09_configuration.md](09_configuration.md) for details on how market tasks are generated with stratified prestige distribution and randomized requirements.
### Browsing and Filtering
The `market browse` command supports:
- Domain filter
- Prestige range filter
- Reward range filter
- Pagination (offset/limit)
All output is JSON for agent consumption.
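A hypothetical sketch of the filtering and pagination the command performs, over plain dicts. The field names (`domains`, `required_prestige`, `reward`), the inclusive range semantics, the default `limit`, and the JSON envelope are assumptions, not the CLI's actual flags or output schema.

```python
import json


def browse(tasks, domain=None, prestige_range=None, reward_range=None,
           offset=0, limit=10):
    """Hypothetical filter + paginate over task dicts, returning JSON text."""
    def in_range(value, bounds):
        # None means "no filter"; bounds are treated as inclusive.
        return bounds is None or bounds[0] <= value <= bounds[1]

    matches = [
        t for t in tasks
        if (domain is None or domain in t["domains"])
        and in_range(t["required_prestige"], prestige_range)
        and in_range(t["reward"], reward_range)
    ]
    # Pagination: slice after filtering so offsets refer to the filtered list.
    page = matches[offset:offset + limit]
    return json.dumps({"total": len(matches), "offset": offset, "tasks": page})


# Made-up market listings for illustration.
SAMPLE_TASKS = [
    {"id": 1, "domains": ["research"], "required_prestige": 1.0, "reward": 5000},
    {"id": 2, "domains": ["research", "inference"], "required_prestige": 3.0, "reward": 20000},
    {"id": 3, "domains": ["training"], "required_prestige": 2.0, "reward": 8000},
]
print(browse(SAMPLE_TASKS, domain="research", limit=1))
```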
### Sim Resume Blocking
`yc-bench sim resume` is **blocked** when there are zero active tasks, returning `{"ok": false}` instead of advancing time. This prevents catastrophic payroll drain when the agent has no work in progress. The agent loop filters blocked responses and treats them as no-ops.
The auto-advance mechanism (which forces `sim resume` after N consecutive turns without one) also checks for active tasks before advancing.
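The blocking rule and the auto-advance guard can be sketched as follows. `has_active_tasks`, `maybe_auto_advance`, and the `sim_hours`/`step_hours` fields are hypothetical; the time increment stands in for the real `advance_time()`.

```python
def has_active_tasks(state: dict) -> bool:
    return any(t["status"] == "active" for t in state["tasks"])


def sim_resume(state: dict) -> dict:
    # With nothing in progress, advancing time would only burn payroll.
    if not has_active_tasks(state):
        return {"ok": False}
    state["sim_hours"] += state["step_hours"]  # hypothetical stand-in for advance_time()
    return {"ok": True}


def maybe_auto_advance(state: dict, turns_since_resume: int, max_idle_turns: int):
    # Force a resume after too many turns without one, but never when idle.
    if turns_since_resume >= max_idle_turns and has_active_tasks(state):
        return sim_resume(state)
    return None  # nothing forced; the agent keeps acting


idle = {"tasks": [{"status": "planned"}], "sim_hours": 0.0, "step_hours": 8.0}
busy = {"tasks": [{"status": "active"}], "sim_hours": 0.0, "step_hours": 8.0}
print(sim_resume(idle), idle["sim_hours"])                 # {'ok': False} 0.0
print(maybe_auto_advance(busy, 5, 5), busy["sim_hours"])  # {'ok': True} 8.0
```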