yc-bench/system_design/04_prestige_system.md
2026-03-10 14:24:13 -07:00

# Prestige System
**Location**: `src/yc_bench/db/models/company.py` (CompanyPrestige), `src/yc_bench/core/engine.py` (decay), `src/yc_bench/core/handlers/task_complete.py` (rewards/penalties)
## Overview
Prestige is YC-Bench's core progression mechanic. It controls access to higher-tier tasks (which offer better rewards) and decays over time, forcing continuous engagement.
## Design Choices
### Per-Domain Prestige (4 Independent Tracks)
```
research: ████████░░ (8.0)
inference: ██████░░░░ (6.0)
data_environment: ███░░░░░░░ (3.0)
training: █████░░░░░ (5.0)
```
**Why 4 domains?** This creates a 4-dimensional strategic space:
- The agent can't max all domains simultaneously (decay + limited employees)
- Specialization unlocks high-tier tasks in 1-2 domains
- Diversification provides resilience but slower progression
- Multi-domain tasks require balanced prestige across their domains
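As a rough illustration of the four-track state shown above, prestige can be pictured as a mapping from domain name to a float. This is only a sketch: `new_prestige_state` and `render_bar` are hypothetical helpers, not the actual `CompanyPrestige` model, and the starting value of 1.0 follows the "Minimum (starting/decayed)" row of the range table.

```python
# Hypothetical in-memory view of per-domain prestige (not the real ORM model).
DOMAINS = ("research", "inference", "data_environment", "training")
PRESTIGE_MIN, PRESTIGE_MAX = 1.0, 10.0


def new_prestige_state() -> dict[str, float]:
    """Every domain starts at the minimum prestige."""
    return {domain: PRESTIGE_MIN for domain in DOMAINS}


def render_bar(value: float, width: int = 10) -> str:
    """Draw a prestige value as a bar, one filled cell per whole point."""
    filled = max(0, min(width, round(value)))
    return "█" * filled + "░" * (width - filled)


state = new_prestige_state()
state["research"] = 8.0
for domain, value in state.items():
    print(f"{domain}: {render_bar(value)} ({value:.1f})")
```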
### Prestige Range: [1.0, 10.0]
| Level | Meaning |
|-------|---------|
| 1.0 | Minimum (starting/decayed) |
| 3.0-4.0 | Mid-tier tasks accessible |
| 7.0-8.0 | High-tier tasks accessible |
| 10.0 | Maximum (hard cap) |
**Design choice**: The 1-10 range is intuitive and provides enough granularity for meaningful gating tiers without over-complicating the system.
## Prestige Gain
On successful task completion (on-time):
```
for each domain in task.requirements:
    company_prestige[domain] += task.reward_prestige_delta
    company_prestige[domain] = min(company_prestige[domain], 10.0)  # cap
```
**Design choice**: Prestige gain is per-domain and tied to the task's requirements. Completing a research+inference task only boosts those two domains, not training or data_environment.
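A minimal Python sketch of the gain rule, assuming prestige is held in a plain dict and the task's requirements are passed as a list of domain names. `apply_success` and the delta value of 2.5 are illustrative, not taken from the handler in `task_complete.py`.

```python
PRESTIGE_MAX = 10.0  # hard cap from the prestige range


def apply_success(prestige: dict[str, float], task_domains: list[str], delta: float) -> None:
    """Add the task's prestige delta to each required domain, capped at the maximum."""
    for domain in task_domains:
        prestige[domain] = min(prestige[domain] + delta, PRESTIGE_MAX)


company = {"research": 8.0, "inference": 6.0, "data_environment": 3.0, "training": 5.0}

# A research+inference task with an illustrative delta of 2.5.
apply_success(company, ["research", "inference"], delta=2.5)
print(company)
```

In this example research hits the cap while inference rises by the full delta; training and data_environment are untouched because the task does not require them.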
### Prestige Scaling of Rewards
```
actual_reward = base_reward × (1 + reward_prestige_scale × (prestige - 1))
```
Higher prestige in a domain means better financial returns from tasks in that domain. This creates a virtuous cycle: more prestige → more money → more capacity → more prestige.
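The scaling formula can be checked with a small worked example. The sketch below assumes a single prestige value is passed in (how a multi-domain task picks that value is not specified here), and the `reward_prestige_scale` of 0.1 is a placeholder rather than a documented default.

```python
def scaled_reward(base_reward: float, prestige: float, reward_prestige_scale: float) -> float:
    """actual_reward = base_reward * (1 + reward_prestige_scale * (prestige - 1))."""
    return base_reward * (1 + reward_prestige_scale * (prestige - 1))


BASE = 1000.0
SCALE = 0.1  # placeholder value, not a documented default

for level in (1.0, 6.0, 10.0):
    print(level, round(scaled_reward(BASE, level, SCALE), 2))
```

At prestige 1.0 the multiplier is exactly 1, so the base reward is paid unchanged; every additional prestige point adds `reward_prestige_scale × base_reward` to the payout.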
## Prestige Loss
### Decay (Daily)
```
prestige -= decay_per_day × days_elapsed
prestige = max(prestige, 1.0) # floor
```
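Default decay: prestige falls by 0.005 per day, so in the formula above `decay_per_day` is the positive magnitude 0.005. A 30-day gap costs only 0.15 prestige, which is slow enough not to punish short gaps but fast enough that inactive domains eventually return to baseline.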
**Design choice**: Continuous decay prevents "build once, exploit forever" strategies. The agent must continuously complete tasks in a domain to maintain access.
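To make the decay timescale concrete, here is a small sketch that treats the default rate as a positive magnitude of 0.005 per day. `apply_decay` and `days_until_floor` are hypothetical helpers, not the functions in `engine.py`.

```python
PRESTIGE_MIN = 1.0  # floor from the prestige range


def apply_decay(prestige: float, days_elapsed: float, decay_per_day: float = 0.005) -> float:
    """Linear decay, clamped at the floor."""
    return max(prestige - decay_per_day * days_elapsed, PRESTIGE_MIN)


def days_until_floor(prestige: float, decay_per_day: float = 0.005) -> float:
    """Days of total inactivity before a domain decays back to the floor."""
    return (prestige - PRESTIGE_MIN) / decay_per_day


print(round(apply_decay(5.0, days_elapsed=30), 3))   # short gap: small loss
print(round(days_until_floor(5.0)))                  # long neglect: back to baseline
print(apply_decay(1.2, days_elapsed=365))            # clamped at the floor
```

Under these assumptions a month of neglect costs 0.15 prestige, while a domain at 5.0 needs roughly 800 idle days to fall all the way back to 1.0.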
### Failure Penalty
On late task completion:
```
for each domain in task.requirements:
    company_prestige[domain] -= task.reward_prestige_delta × fail_multiplier
    company_prestige[domain] = max(company_prestige[domain], 1.0)  # floor
```
Default `fail_multiplier`: 0.8. Late completion therefore costs 80% of the prestige that on-time completion would have gained.
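The late-completion rule can be sketched in Python as follows, again assuming a plain dict of prestige values and a list of required domains. `apply_late_penalty` is an illustrative name and the delta of 0.5 is made up for the example.

```python
PRESTIGE_MIN = 1.0  # floor from the prestige range


def apply_late_penalty(
    prestige: dict[str, float],
    task_domains: list[str],
    delta: float,
    fail_multiplier: float = 0.8,
) -> None:
    """Subtract delta * fail_multiplier from each required domain, clamped at the floor."""
    loss = delta * fail_multiplier
    for domain in task_domains:
        prestige[domain] = max(prestige[domain] - loss, PRESTIGE_MIN)


company = {"research": 5.0, "training": 1.2}
apply_late_penalty(company, ["research", "training"], delta=0.5)
print(company)
```

Here each domain loses 0.4 prestige; training would drop below the floor and is clamped to 1.0 instead.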
### Cancel Penalty
On task cancellation:
```
for each domain in task.requirements:
    company_prestige[domain] -= task.reward_prestige_delta × cancel_multiplier
    company_prestige[domain] = max(company_prestige[domain], 1.0)  # floor
```
Cancel multipliers vary by difficulty (higher on hard/nightmare).
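A sketch of how a difficulty-dependent cancel multiplier might be looked up and applied. The multiplier values and the `medium` difficulty name below are placeholders invented for illustration; only the existence of hard and nightmare difficulties, and the fact that they carry higher multipliers, comes from the text above.

```python
PRESTIGE_MIN = 1.0  # floor from the prestige range

# Placeholder values only: the real per-difficulty multipliers are not given here.
CANCEL_MULTIPLIERS = {"medium": 1.0, "hard": 1.5, "nightmare": 2.0}


def apply_cancel_penalty(
    prestige: dict[str, float],
    task_domains: list[str],
    delta: float,
    difficulty: str,
) -> None:
    """Subtract delta * cancel_multiplier from each required domain, clamped at the floor."""
    loss = delta * CANCEL_MULTIPLIERS[difficulty]
    for domain in task_domains:
        prestige[domain] = max(prestige[domain] - loss, PRESTIGE_MIN)


company = {"research": 5.0, "inference": 1.3}
apply_cancel_penalty(company, ["research", "inference"], delta=0.4, difficulty="hard")
print(company)
```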
## Prestige Gating
Tasks have a `required_prestige` field. At task acceptance:
```python
for domain in task.requirements:
    if company_prestige[domain] < task.required_prestige:
        reject()  # must meet prestige in ALL task domains
```
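**Design choice**: Per-domain gating means a task with `required_prestige=5.0` and requirements in research + training needs prestige >= 5.0 in BOTH research AND training. This prevents the agent from using high prestige in one domain to unlock multi-domain tasks while its other required domains lag behind.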
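The all-domains check can be written as a self-contained predicate. This is a sketch under the same assumptions as the snippet above (a dict of prestige values and a list of required domains); `can_accept` is a hypothetical name.

```python
def can_accept(prestige: dict[str, float], task_domains: list[str], required_prestige: float) -> bool:
    """A task is acceptable only if every required domain meets the threshold."""
    return all(prestige[domain] >= required_prestige for domain in task_domains)


company = {"research": 8.0, "inference": 6.0, "data_environment": 3.0, "training": 5.0}

print(can_accept(company, ["research", "training"], 5.0))  # both domains are at or above 5.0
print(can_accept(company, ["research", "training"], 6.0))  # training (5.0) falls short
print(can_accept(company, ["research"], 6.0))              # single-domain task, research alone suffices
```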
### Stratified Market Tasks
The first 10 market tasks are always prestige-1 (accessible immediately). Higher-prestige tasks are introduced with a stratified distribution. This ensures:
- The agent always has something to work on initially
- Progression is visible (new tasks unlock as prestige grows)
- No dead-end states where the agent can't accept any task
## Strategic Implications
The prestige system creates several key strategic tensions:
1. **Specialize vs. Diversify**: Focus on 1-2 domains for deep access, or spread across all 4?
2. **Risk vs. Reward**: High-prestige tasks pay more but failure costs more prestige
3. **Maintenance vs. Growth**: Should the agent keep working in mastered domains (maintenance) or push new ones (growth)?
4. **Accept vs. Defer**: Taking a task you might fail risks prestige loss; waiting risks decay
These tensions make the benchmark more than just "do tasks fast": it tests whether the agent can reason strategically about which work to take on and when.
## Interaction with Client Trust
Prestige and trust are complementary progression axes:
- **Prestige** gates which tasks you *can access* (required_prestige per domain)
- **Trust** determines how *profitable* those tasks are (reward scaling + work reduction)
- **Client specialties** bridge the two: clients with specialties in your high-prestige domains offer tasks you can complete quickly, building trust faster
- **Domain alignment** creates a strategic lever: picking clients whose specialties match your prestige strengths compounds both progression axes
See [11_client_trust.md](11_client_trust.md) for full trust mechanics.