mirror of
https://github.com/collinear-ai/yc-bench.git
synced 2026-04-25 17:10:47 +00:00
171 lines
6.5 KiB
Markdown
171 lines
6.5 KiB
Markdown
# Task System
|
||
|
||
**Location**: `src/yc_bench/cli/task_commands.py`, `src/yc_bench/core/eta.py`, `src/yc_bench/core/progress.py`
|
||
|
||
## Task Lifecycle
|
||
|
||
```
|
||
market ──accept──> planned ──dispatch──> active ──complete──> completed_success
|
||
│ │ completed_fail
|
||
│ │
|
||
└──cancel──> cancelled <──cancel──┘
|
||
```
|
||
|
||
### States
|
||
|
||
| Status | Meaning |
|
||
|--------|---------|
|
||
| `market` | Available for browsing, not yet accepted |
|
||
| `planned` | Accepted by company, employees can be assigned |
|
||
| `active` | Dispatched, work is progressing |
|
||
| `completed_success` | Finished on time |
|
||
| `completed_fail` | Finished late (past deadline) |
|
||
| `cancelled` | Abandoned by agent |
|
||
|
||
## Design Choices
|
||
|
||
### Two-Phase Activation (Accept → Dispatch)
|
||
|
||
Tasks go through `planned` before `active`. This separation:
|
||
|
||
1. **Allows pre-assignment**: Agent can assign employees before starting the clock
|
||
2. **Deadline starts at accept**: Creates urgency -- planning time counts against the deadline
|
||
3. **Forces commitment**: Accepting a task reserves it but the agent must still dispatch
|
||
|
||
### Deadline Calculation
|
||
|
||
```
|
||
deadline = accepted_at + max(required_qty[d] for all domains d) / deadline_qty_per_day
|
||
```
|
||
|
||
**Design choice**: Deadline is proportional to the largest single-domain requirement, not the sum. This means multi-domain tasks don't get proportionally more time -- they require parallel work.
|
||
|
||
### Prestige Gating at Accept Time
|
||
|
||
```python
|
||
def task_accept(task_id):
|
||
for domain in task.requirements:
|
||
if company_prestige[domain] < task.required_prestige:
|
||
reject("Insufficient prestige in {domain}")
|
||
```
|
||
|
||
**Design choice**: Prestige check is per-domain. A task requiring prestige 3.0 with requirements in `research` and `inference` needs prestige >= 3.0 in BOTH domains. This prevents gaming by maxing one domain.
|
||
|
||
### Trust Gating at Accept Time
|
||
|
||
~20% of tasks have a `required_trust` field. At acceptance, the agent's trust with the task's client must meet the threshold:
|
||
|
||
```python
|
||
if task.required_trust > 0 and task.client_id:
|
||
client_trust = get_trust(company_id, task.client_id)
|
||
if client_trust < task.required_trust:
|
||
reject("Insufficient trust with client")
|
||
```
|
||
|
||
**Design choice**: Trust gating is per-client, not global. High-trust tasks are the most valuable opportunities, gated behind relationship-building with specific clients. See [11_client_trust.md](11_client_trust.md) for full trust mechanics.
|
||
|
||
### Client Assignment and Reward Scaling
|
||
|
||
Each task belongs to a specific client. At acceptance:
|
||
|
||
1. **Reward scaling**: `actual_reward = listed_reward × trust_multiplier` (50% at trust 0, scaling up with trust and client tier)
|
||
2. **Work reduction**: `required_qty *= (1 - trust_work_reduction_max × trust/trust_max)` (up to 40% less work at max trust)
|
||
3. **Replacement generation**: A new market task replaces the accepted one, biased toward the same client's specialty domains
|
||
|
||
### Cancel Penalties
|
||
|
||
Cancelling an active task incurs:
|
||
- Prestige penalty: `reward_prestige_delta × cancel_multiplier` (configurable per difficulty)
|
||
- No financial penalty (just lost opportunity)
|
||
|
||
**Design choice**: Cancel penalties prevent the strategy of accepting everything and dropping what's inconvenient. Higher difficulties increase the cancel multiplier.
|
||
|
||
## Employee Assignment
|
||
|
||
### Assignment Rules
|
||
|
||
- Employees can only be assigned to `planned` or `active` tasks
|
||
- An employee can work on multiple tasks simultaneously (throughput splits)
|
||
- Multiple employees can work on the same task (parallel progress)
|
||
|
||
### Throughput Splitting
|
||
|
||
```
|
||
effective_rate = base_rate_per_hour / num_active_tasks
|
||
```
|
||
|
||
**Design choice**: Linear throughput splitting creates a fundamental trade-off:
|
||
- **Focus**: 1 employee on 1 task = full speed
|
||
- **Parallel**: 1 employee on 3 tasks = 1/3 speed each
|
||
- The agent must decide between fast completion of few tasks vs. slow progress on many
|
||
|
||
## Progress Tracking (`progress.py`)
|
||
|
||
### How Work Gets Done
|
||
|
||
Progress is calculated lazily during `advance_time()`:
|
||
|
||
```python
|
||
for each active task:
|
||
for each assigned employee:
|
||
for each domain in task requirements:
|
||
work = employee.skill_rate[domain] / num_active_tasks × business_hours
|
||
requirement.completed_qty += work
|
||
requirement.completed_qty = min(completed_qty, required_qty)
|
||
```
|
||
|
||
### Multi-Domain Completion
|
||
|
||
A task is complete when ALL domain requirements reach `completed_qty >= required_qty`. The slowest domain is the bottleneck.
|
||
|
||
**Design choice**: This creates interesting optimization puzzles. If a task needs 100 units of research and 50 units of training, the agent should allocate more research-skilled employees to balance completion times.
|
||
|
||
## ETA Solver (`eta.py`)
|
||
|
||
### Completion Time Calculation
|
||
|
||
```python
|
||
def solve_task_completion_time(task, assignments, sim_time):
|
||
for each domain d:
|
||
remaining = required_qty[d] - completed_qty[d]
|
||
rate = sum(effective_rate[emp][d] for emp in assignments)
|
||
if rate == 0:
|
||
return infinity # no one can work on this domain
|
||
hours_needed[d] = remaining / rate
|
||
|
||
max_hours = max(hours_needed.values())
|
||
return sim_time + max_hours (in business hours)
|
||
```
|
||
|
||
### Halfway Time Calculation
|
||
|
||
Used for milestone events. Finds the time when weighted average across domains reaches 50%.
|
||
|
||
### When ETAs Are Recalculated
|
||
|
||
- Task dispatched (new active task)
|
||
- Employee assigned/unassigned
|
||
- Task completed (frees employee throughput for other tasks)
|
||
- Task cancelled (same)
|
||
|
||
**Design choice**: Dynamic ETA recalculation ensures events are always accurate. When an employee is reassigned, all affected tasks get new completion projections.
|
||
|
||
## Market Task Generation
|
||
|
||
See [09_configuration.md](09_configuration.md) for details on how market tasks are generated with stratified prestige distribution and randomized requirements.
|
||
|
||
### Browsing and Filtering
|
||
|
||
The `market browse` command supports:
|
||
- Domain filter
|
||
- Prestige range filter
|
||
- Reward range filter
|
||
- Pagination (offset/limit)
|
||
|
||
All output is JSON for agent consumption.
|
||
|
||
### Sim Resume Blocking
|
||
|
||
`yc-bench sim resume` is **blocked** when there are zero active tasks, returning `{"ok": false}` instead of advancing time. This prevents catastrophic payroll drain when the agent has no work in progress. The agent loop filters blocked responses and treats them as no-ops.
|
||
|
||
The auto-advance mechanism (which forces `sim resume` after N consecutive turns without one) also checks for active tasks before advancing.
|