mirror of https://github.com/collinear-ai/yc-bench.git synced 2026-04-19 12:58:03 +00:00

AnandK27 ecd3d9e415 Add system design documentation for yc-bench

Comprehensive documentation covering all major subsystems:
simulation engine, data models, task system, prestige, finances,
employees, agent layer, CLI interface, configuration, and runner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-08 13:42:41 -07:00

4.4 KiB

Raw Blame History

Prestige System

Location: src/yc_bench/db/models/company.py (CompanyPrestige), src/yc_bench/core/engine.py (decay), src/yc_bench/core/handlers/task_complete.py (rewards/penalties)

Overview

Prestige is YC-Bench's core progression mechanic. It controls access to higher-tier tasks (which offer better rewards) and decays over time, forcing continuous engagement.

Design Choices

Per-Domain Prestige (4 Independent Tracks)

research:          ████████░░  (8.0)
inference:         ██████░░░░  (6.0)
data_environment:  ███░░░░░░░  (3.0)
training:          █████░░░░░  (5.0)

Why 4 domains? This creates a 4-dimensional strategic space:

The agent can't max all domains simultaneously (decay + limited employees)
Specialization unlocks high-tier tasks in 1-2 domains
Diversification provides resilience but slower progression
Multi-domain tasks require balanced prestige across their domains

Prestige Range: [1.0, 10.0]

Level	Meaning
1.0	Minimum (starting/decayed)
3.0-4.0	Mid-tier tasks accessible
7.0-8.0	High-tier tasks accessible
10.0	Maximum (hard cap)

Design choice: The 1-10 range is intuitive and provides enough granularity for meaningful gating tiers without over-complicating the system.

Prestige Gain

On successful task completion (on-time):

for each domain in task.requirements:
    company_prestige[domain] += task.reward_prestige_delta
    company_prestige[domain] = min(prestige, 10.0)  # cap

Design choice: Prestige gain is per-domain and tied to the task's requirements. Completing a research+inference task only boosts those two domains, not training or data_environment.

Prestige Scaling of Rewards

actual_reward = base_reward × (1 + reward_prestige_scale × (prestige - 1))

Higher prestige in a domain means better financial returns from tasks in that domain. This creates a virtuous cycle: more prestige → more money → more capacity → more prestige.

Prestige Loss

Decay (Daily)

prestige -= decay_per_day × days_elapsed
prestige = max(prestige, 1.0)  # floor

Default decay rate: -0.005/day. This is slow enough to not punish short gaps but fast enough that inactive domains eventually return to baseline.

Design choice: Continuous decay prevents "build once, exploit forever" strategies. The agent must continuously complete tasks in a domain to maintain access.

Failure Penalty

On late task completion:

for each domain in task.requirements:
    company_prestige[domain] -= task.reward_prestige_delta × fail_multiplier
    company_prestige[domain] = max(prestige, 1.0)

Default fail_multiplier: 0.8. Late completion costs almost as much prestige as success would have gained.

Cancel Penalty

On task cancellation:

for each domain in task.requirements:
    company_prestige[domain] -= task.reward_prestige_delta × cancel_multiplier
    company_prestige[domain] = max(prestige, 1.0)

Cancel multipliers vary by difficulty (higher on hard/nightmare).

Prestige Gating

Tasks have a required_prestige field. At task acceptance:

for domain in task.requirements:
    if company_prestige[domain] < task.required_prestige:
        reject()  # must meet prestige in ALL task domains

Design choice: Per-domain gating means a task with required_prestige=5.0 and requirements in research + training needs prestige >= 5.0 in BOTH research AND training. This prevents gaming.

Stratified Market Tasks

The first 10 market tasks are always prestige-1 (accessible immediately). Higher prestige tasks are introduced with stratified distribution. This ensures:

The agent always has something to work on initially
Progression is visible (new tasks unlock as prestige grows)
No dead-end states where the agent can't accept any task

Strategic Implications

The prestige system creates several key strategic tensions:

Specialize vs. Diversify: Focus on 1-2 domains for deep access, or spread across all 4?
Risk vs. Reward: High-prestige tasks pay more but failure costs more prestige
Maintenance vs. Growth: Should the agent keep working in mastered domains (maintenance) or push new ones (growth)?
Accept vs. Defer: Taking a task you might fail risks prestige loss; waiting risks decay

These tensions make the benchmark more than just "do tasks fast" -- it tests genuine strategic reasoning.

4.4 KiB Raw Blame History Unescape Escape