mirror of https://github.com/collinear-ai/yc-bench.git synced 2026-04-19 12:58:03 +00:00

Muyu He ef7c64b5cb Updated design mds

2026-03-19 18:39:57 -07:00

5.9 KiB


Employee Model

Location: src/yc_bench/db/models/employee.py, src/yc_bench/services/generate_employees.py, src/yc_bench/core/progress.py

Overview

Employees are the company's productive resources. Each has a tier, salary, and hidden per-domain skill rates. The agent must figure out who is good at what through observation and assign them optimally.

Design Choices

Hidden Skill Rates (Information Asymmetry)

The agent sees:

Employee name, tier (junior/mid/senior), salary
Which tasks they're currently assigned to

The agent does NOT see:

Per-domain skill rates (rate_domain_per_hour)
Actual work output per hour

Why hidden? This is a core benchmark design decision:

Tests inference ability: The agent must infer strengths from task completion patterns
Mirrors reality: Real managers don't have exact productivity metrics for every skill dimension
Creates learning opportunity: Early task assignments serve as "probes" to discover team capabilities
Rewards memory: Agents that remember past performance can make better future assignments

Tier System

Tier	Typical Rate Range	Salary Range
junior	Low	Low
mid	Medium	Medium
senior	High	High

Design choice: Tiers provide a rough signal. Seniors are generally better but not always in every domain. A junior might excel in one domain while a senior is mediocre there. The tier-salary correlation creates a cost-benefit trade-off.

Per-Domain Skill Rates

Each employee has 4 skill rates (one per domain):

class EmployeeSkillRate:
    domain: str          # research, inference, data_environment, training
    rate_domain_per_hour: float  # work units produced per business hour

Rates are generated by uniform sampling within tier-specific bounds:

# Each domain rate is independently sampled: uniform(rate_min, rate_max)
# Junior: [1.0, 4.0], Mid: [4.0, 7.0], Senior: [7.0, 10.0]
# Domains are drawn independently, so within their tier's range an employee can sit near the bottom in some domains and near the top in others.
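The sampling rule can be sketched in Python as follows. This is a minimal illustration that assumes the tier bounds listed above and the four domain names from `EmployeeSkillRate`. `TIER_BOUNDS`, `DOMAINS`, and `sample_skill_rates` are hypothetical names, not the code in `generate_employees.py`.

```python
import random

# Tier bounds from the comments above: each domain rate ~ uniform(rate_min, rate_max).
TIER_BOUNDS = {
    "junior": (1.0, 4.0),
    "mid": (4.0, 7.0),
    "senior": (7.0, 10.0),
}
# Domain names taken from the EmployeeSkillRate comment.
DOMAINS = ("research", "inference", "data_environment", "training")


def sample_skill_rates(tier: str, rng: random.Random) -> dict[str, float]:
    """Draw one independent uniform rate per domain within the tier's bounds."""
    rate_min, rate_max = TIER_BOUNDS[tier]
    return {domain: rng.uniform(rate_min, rate_max) for domain in DOMAINS}


rng = random.Random(1)
mid_rates = sample_skill_rates("mid", rng)
```

Because each domain gets its own draw, two employees of the same tier can end up with different strongest domains, which is where specialization comes from.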

The team composition follows a fixed ratio: 5 junior, 3 mid, 2 senior (for a 10-person team), shuffled randomly. Employees and clients use a fixed world seed (seed=1) so the same team appears across all run seeds — only task generation varies.
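A sketch of the fixed-ratio, fixed-seed team composition. It assumes the world RNG is a dedicated `random.Random` instance kept separate from whatever generates tasks; the text implies that separation but does not show how it is done. `TEAM_RATIO`, `WORLD_SEED`, and `build_team_tiers` are illustrative names.

```python
import random

# Fixed ratio for a 10-person team, per the text: 5 junior, 3 mid, 2 senior.
TEAM_RATIO = {"junior": 5, "mid": 3, "senior": 2}
WORLD_SEED = 1  # fixed world seed shared by every run seed


def build_team_tiers(seed: int = WORLD_SEED) -> list[str]:
    """Return the shuffled tier order for the team; identical for a given seed."""
    tiers = [tier for tier, count in TEAM_RATIO.items() for _ in range(count)]
    # A private RNG, so task generation (seeded by the run seed) cannot perturb it.
    rng = random.Random(seed)
    rng.shuffle(tiers)
    return tiers
```

Calling `build_team_tiers()` from any run returns the same tier order, which is the property that makes cross-seed comparisons fair.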

Skill rates are capped at skill_rate_max (default 10.0) even after task-completion boosts.

Design choice: Uniform per-domain sampling creates natural specialization without complex distribution mechanics. The fixed world seed ensures consistent employee/client composition for fair cross-seed comparison.

Throughput Splitting

When an employee works on multiple active tasks simultaneously:

effective_rate = base_rate / num_active_tasks

Design choice: Linear splitting (not diminishing returns or context-switching penalties) was chosen for simplicity and predictability. The agent can reason about it without hidden costs.

Example

Employee Alice has research_rate = 2.0/hr:

Assigned to 1 task: contributes 2.0 research units/hour
Assigned to 3 tasks: contributes 2.0 / 3 ≈ 0.67 research units/hour to each
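The splitting rule and the Alice example, written as a small Python function. `effective_rate` is a hypothetical helper, and the guard against zero active tasks is an added assumption.

```python
def effective_rate(base_rate: float, num_active_tasks: int) -> float:
    """Per-task contribution under linear throughput splitting."""
    if num_active_tasks < 1:
        # Assumption: the rule is only meaningful for an employee with work assigned.
        raise ValueError("employee must have at least one active task")
    return base_rate / num_active_tasks


# Alice, research_rate = 2.0/hr
print(effective_rate(2.0, 1))            # 2.0
print(round(effective_rate(2.0, 3), 2))  # 0.67
```

Summed over her three tasks, Alice still produces 2.0 units/hour: linear splitting redistributes output without losing any, which is what keeps it easy to reason about.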

Implication for Strategy

The agent faces a fundamental trade-off:

Focused assignment: 1 employee → 1 task = fastest completion but no parallelism
Spread assignment: 1 employee → N tasks = slower per task but progress on multiple fronts
Optimal: Match the strategy to deadline pressure and task urgency
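A worked comparison of the two extremes, under simplifying assumptions that are not in the original: one employee with base rate `rate` in a single domain, `n_tasks` tasks that each need the same amount of `work`, no other contributors, and no skill growth in between. `completion_times` is a hypothetical helper.

```python
def completion_times(work: float, rate: float, n_tasks: int, focused: bool) -> list[float]:
    """Hours until each of n equal tasks finishes for a single employee."""
    if focused:
        # One task at a time at the full rate: task i finishes after (i + 1) * work / rate hours.
        return [(i + 1) * work / rate for i in range(n_tasks)]
    # Spread: every task progresses at rate / n_tasks, so each needs n_tasks * work / rate hours.
    return [n_tasks * work / rate] * n_tasks


print(completion_times(work=10.0, rate=2.0, n_tasks=3, focused=True))   # [5.0, 10.0, 15.0]
print(completion_times(work=10.0, rate=2.0, n_tasks=3, focused=False))  # [15.0, 15.0, 15.0]
```

Under linear splitting the last task finishes at hour 15 either way. Focusing brings the first two completions forward, while spreading keeps every task moving at once; deadlines decide which of those matters more.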

Skill Growth

On successful task completion, assigned employees get a skill boost:

for each assigned employee:
    for each domain in task.requirements:
        skill_rate[domain] = min(skill_rate[domain] * (1 + task.skill_boost_pct / 100), skill_rate_max)

Design choice: Skill growth compounds over time. Early investments in employee development pay off later through faster task completion. This creates a "training vs. exploiting" tension.
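A runnable version of the growth rule. It assumes the `skill_rate_max` cap is applied after every boost and that only the task's required domains are boosted, as in the pseudocode. `apply_skill_boost` and `SKILL_RATE_MAX` are illustrative names; `skill_boost_pct` comes from the text.

```python
SKILL_RATE_MAX = 10.0  # default skill_rate_max from the text


def apply_skill_boost(skill_rates: dict[str, float], task_domains: list[str],
                      skill_boost_pct: float, cap: float = SKILL_RATE_MAX) -> dict[str, float]:
    """Return new rates after one successful completion; other domains are untouched."""
    boosted = dict(skill_rates)
    factor = 1 + skill_boost_pct / 100
    for domain in task_domains:
        # Compound the boost, then clamp to the cap.
        boosted[domain] = min(boosted[domain] * factor, cap)
    return boosted


rates = {"research": 5.0, "training": 9.5}
for _ in range(3):
    rates = apply_skill_boost(rates, ["research", "training"], skill_boost_pct=10)
print(rates["training"])  # 10.0
```

Repeated completions multiply a rate by `(1 + skill_boost_pct / 100) ** k` until it reaches the cap, after which further boosts in that domain have no effect.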

Salary Bumps (Hidden Cost of Growth)

Each task completion also increases salaries:

for each assigned employee:
    salary_cents *= 1.01  # 1% increase

Design choice: Salary bumps mean that experienced employees cost more. The agent can't infinitely scale employee productivity without also scaling costs. After many completions, payroll may become a significant burden.
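Because each bump compounds, an employee's payroll cost grows geometrically with their completion count. This sketch assumes no rounding to whole cents between bumps (the real code may round) and computes both the salary multiplier and how many completions it takes to double a salary. `salary_after` and `completions_to_double` are hypothetical helpers.

```python
import math


def salary_after(base_salary_cents: int, completions: int, bump: float = 0.01) -> float:
    """Salary after `completions` compounding bumps (no intermediate rounding)."""
    return base_salary_cents * (1 + bump) ** completions


def completions_to_double(bump: float = 0.01) -> int:
    """Smallest number of bumps after which the salary has at least doubled."""
    return math.ceil(math.log(2) / math.log(1 + bump))


print(completions_to_double())  # 70
```

At 1% per completion, roughly 70 completions double an employee's salary.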

Employee Generation (`generate_employees.py`)

Process

Generate 10 employees per company (configurable)
Assign tiers based on the configured distribution (e.g., 5 junior, 3 mid, 2 senior for the default 10-person team)
For each employee, generate 4 skill rates from per-tier distributions
Set salary based on tier bracket

Distribution Types

Skill rates are drawn from configurable distributions:

Triangular: min/mode/max (default; peaks at the mode, giving a rough bell-curve shape)
Beta: alpha/beta parameters (useful for skewed distributions)
Normal: mean/std (truncated to positive values)
Uniform: low/high
Constant: fixed value
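A sketch of a sampler covering the five distribution types, using only Python's `random` module. The config keys (`type`, `min`, `mode`, `max`, and so on), the rescaling of the beta draw onto `[min, max]`, and rejection sampling as the truncation method for the normal case are all assumptions for illustration, not the project's schema.

```python
import random


def sample(dist: dict, rng: random.Random) -> float:
    """Draw one value from a hypothetical spec such as {"type": "uniform", "low": 1, "high": 4}."""
    kind = dist["type"]
    if kind == "triangular":
        # Note the stdlib argument order: low, high, mode.
        return rng.triangular(dist["min"], dist["max"], dist["mode"])
    if kind == "beta":
        # betavariate returns a value in [0, 1]; rescale onto [min, max] (assumed).
        unit = rng.betavariate(dist["alpha"], dist["beta"])
        return dist["min"] + unit * (dist["max"] - dist["min"])
    if kind == "normal":
        # Truncate to positive values by rejecting non-positive draws.
        while True:
            x = rng.gauss(dist["mean"], dist["std"])
            if x > 0:
                return x
    if kind == "uniform":
        return rng.uniform(dist["low"], dist["high"])
    if kind == "constant":
        return dist["value"]
    raise ValueError(f"unknown distribution type: {kind}")
```

Rejection sampling is simple but can loop for a long time when `mean` sits far below zero relative to `std`; a production sampler might use an inverse-CDF method instead.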

Design choice: Configurable distributions allow difficulty presets to create different workforce profiles. Tutorial mode might use tight distributions (predictable employees), while nightmare mode uses wide distributions (unpredictable).

Employee Visibility to Agent

The employee list CLI command returns:

{
  "employees": [
    {
      "id": "uuid",
      "name": "Alice Chen",
      "tier": "senior",
      "salary": "$8,000/mo",
      "active_tasks": 2
    }
  ]
}

Note: no skill rates, no per-domain breakdown, no historical performance. The agent must build this knowledge through experience.
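The inference the agent faces can be made concrete by inverting the splitting rule. This sketch assumes the agent can see how many work units a task needed in one domain and how many business hours it took, and that a single employee worked on it with a constant number of active tasks throughout; the CLI output above does not guarantee any of that. `estimate_base_rate` is a hypothetical helper.

```python
def estimate_base_rate(work_units: float, hours: float, num_active_tasks: int) -> float:
    """Invert effective_rate = base_rate / num_active_tasks for a solo assignment."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    per_task_rate = work_units / hours
    # Each active task received an equal share, so scale back up by the task count.
    return per_task_rate * num_active_tasks


# 30 research units finished in 20 business hours while the employee held 2 tasks.
print(estimate_base_rate(30.0, 20.0, 2))  # 3.0
```

Since every successful completion boosts the rates involved, older estimates understate current skill; weighting recent observations more heavily is one way to account for that.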

Strategic Considerations

Discovery phase: Early on, assign different employees to different domain tasks to learn strengths
Specialization: Once strengths are known, match employees to their best domains
Load balancing: Avoid overloading one employee (throughput splitting divides their output across every task they hold)
Growth investment: Assign employees to tasks in domains where they need improvement
Cost awareness: Track which employees have had many salary bumps
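One possible greedy rule that combines specialization with load balancing: pick the employee whose per-task contribution would be highest after taking on one more task, i.e. `estimated_rate / (active_tasks + 1)`. This is a hypothetical heuristic built on the agent's own estimates, not something the benchmark provides, and it ignores salary and growth. `pick_employee` is an illustrative name.

```python
def pick_employee(estimates: dict[str, dict[str, float]],
                  active_tasks: dict[str, int], domain: str) -> str:
    """Choose the employee with the highest marginal per-task rate in `domain`.

    Giving a new task to someone with k active tasks yields base_rate / (k + 1) for it.
    """
    return max(
        estimates,
        key=lambda emp: estimates[emp].get(domain, 0.0) / (active_tasks.get(emp, 0) + 1),
    )


estimates = {"alice": {"research": 9.0}, "bob": {"research": 4.0}}
active = {"alice": 2, "bob": 0}
print(pick_employee(estimates, active, "research"))  # bob
```

Bob wins despite a lower estimated rate because Alice's two existing tasks would leave only 9.0 / 3 = 3.0 units/hour for a third. The rule also ignores that the new task would slow Alice's other work, which a fuller version would weigh.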
