yc-bench/system_design/09_configuration.md

# Configuration System

**Location**: `src/yc_bench/config/`

## Overview

The configuration system uses Pydantic models validated from TOML preset files. It controls every aspect of the simulation: world generation parameters, difficulty tuning, agent behavior, and distribution specifications.

## Design Choices

### Pydantic Schema (`schema.py`)

The configuration hierarchy:

```
ExperimentConfig
├── AgentConfig          # LLM model, tools, retry settings
├── LoopConfig           # Turn budget, auto-resume threshold
├── SimConfig            # Simulation parameters
└── WorldConfig          # World generation parameters
    ├── CompanyConfig     # Initial funds, starting prestige
    ├── EmployeeConfig    # Team size, tier distribution, salary ranges
    ├── TaskConfig        # Task count, domain requirements, deadlines
    └── PrestigeConfig    # Decay rate, penalty multipliers, scaling
```

**Why Pydantic?**
- Type validation at load time (catch config errors early)
- Default values with optional overrides
- Discriminated unions for distribution specs
- Clear documentation through type annotations
- Serialization to/from TOML/JSON

### TOML Preset Files (`presets/`)

```toml
# medium.toml
[world]
initial_funds_cents = 500_000_00

[world.prestige]
decay_per_day = 0.005
penalty_fail_multiplier = 0.8
penalty_cancel_multiplier = 1.0

[world.tasks]
count = 200
deadline_qty_per_day = 11.0

[world.tasks.reward_funds]
type = "triangular"
min = 5000_00
mode = 15000_00
max = 50000_00
```

**Why TOML?** Human-readable, supports comments, natural hierarchy via sections, widely supported in Python. Better than JSON for config files (comments), simpler than YAML (fewer gotchas).

### Preset Hierarchy

| Preset | Focus | Key Characteristics |
|--------|-------|-------------------|
| `default.toml` | Base | All defaults; other presets override selectively |
| `tutorial.toml` | Learning | Relaxed deadlines, prestige-1 tasks only, high funds |
| `easy.toml` | Casual | Relaxed deadlines, flat prestige requirements |
| `medium.toml` | Standard | Prestige climbing, 2-domain tasks, 9-day deadlines |
| `hard.toml` | Challenge | Prestige gating active, 7-day deadlines, 1.5x cancel penalty |
| `nightmare.toml` | Extreme | Razor-thin margins, 6-day deadlines, 2x penalties |

**Design choice**: Preset-based difficulty rather than a single "difficulty slider" allows fine-grained control. Each preset can tune dozens of independent parameters.

### Config Loading (`loader.py`)

```python
def load_config(preset_name: str) -> ExperimentConfig:
    base = load_toml("default.toml")
    overlay = load_toml(f"{preset_name}.toml")
    merged = deep_merge(base, overlay)
    return ExperimentConfig(**merged)
```

**Design choice**: Config inheritance via deep merge. Presets only specify what differs from default, keeping preset files concise and maintainable.

## Distribution Specifications (`sampling.py`)

### The DistSpec System

Many world generation parameters use statistical distributions rather than fixed values:

```python
class DistSpec(BaseModel):
    """Discriminated union of distribution types."""
    type: Literal["triangular", "beta", "normal", "uniform", "constant"]
    # Parameters vary by type
```

**Supported distributions:**

| Type | Parameters | Use Case |
|------|-----------|----------|
| `triangular` | min, mode, max | Task rewards, skill rates (natural asymmetric bell curve) |
| `beta` | alpha, beta, scale | Prestige requirements (skewed toward low values) |
| `normal` | mean, std | Symmetric variation around a target |
| `uniform` | low, high | Equal probability across range |
| `constant` | value | Fixed value (no randomness) |

**Why discriminated unions?** Pydantic validates the correct parameters for each distribution type at load time. Invalid combinations (e.g., triangular with alpha parameter) are caught before the simulation runs.

### Usage Example

```toml
[world.tasks.reward_funds]
type = "triangular"
min = 5000_00
mode = 15000_00
max = 50000_00

[world.employees.junior_rate]
type = "beta"
alpha = 2.0
beta = 5.0
scale = 3.0
```

## World Generation

### Seeding (`services/seed_world.py`)

```python
def seed_world_transactional(session, cfg, seed):
    rng = create_rng(seed)
    company = create_company(session, cfg.world.company)
    employees = generate_employees(session, company, cfg.world.employees, rng)
    tasks = generate_tasks(session, cfg.world.tasks, rng)
    sim_state = create_sim_state(session, company, cfg.sim, seed)
```

**Design choice**: Single-transaction world seeding ensures atomic creation. Either the entire world is created or nothing is -- no partial states.

### Employee Generation (`services/generate_employees.py`)

1. Generate N employees (default 10)
2. Assign tiers from configured distribution (e.g., 30/40/30 junior/mid/senior)
3. For each employee, sample 4 skill rates from per-tier distributions
4. Set salary based on tier range

### Task Generation (`services/generate_tasks.py`)

1. Generate M tasks (default 200+)
2. First 10 tasks are always prestige-1 (guaranteed accessible)
3. Remaining tasks have stratified prestige requirements
4. Each task gets 2-4 domain requirements sampled from distributions
5. Rewards scale with prestige and task size

**Design choice**: Stratified generation ensures:
- The agent always has starting tasks (prestige-1 guaranteed)
- Tasks span the full prestige range (progression is possible)
- No prestige "dead zones" where no tasks exist

### RNG Management (`services/rng.py`)

```python
def create_rng(seed: int) -> numpy.random.Generator:
    return numpy.random.default_rng(seed)
```

**Design choice**: Centralized RNG with explicit seed ensures full reproducibility. Same seed → same world → same event sequence (given same agent actions).

## Key Configuration Parameters

### Financial Tuning

| Parameter | Default | Effect |
|-----------|---------|--------|
| `initial_funds_cents` | 500,000 | Starting capital |
| `reward_prestige_scale` | 0.15 | How much prestige amplifies rewards |
| `salary_bump_pct` | 1.0 | Per-completion salary increase |

### Prestige Tuning

| Parameter | Default | Effect |
|-----------|---------|--------|
| `prestige_decay_per_day` | 0.005 | Daily prestige loss |
| `penalty_fail_multiplier` | 0.8 | Prestige cost of late completion |
| `penalty_cancel_multiplier` | 1.0 | Prestige cost of cancellation |
| `prestige_min` | 1.0 | Floor value |
| `prestige_max` | 10.0 | Ceiling value |

### Task Tuning

| Parameter | Default | Effect |
|-----------|---------|--------|
| `deadline_qty_per_day` | 11.0 | Deadline generosity |
| `num_domains_per_task` | 2-4 | Multi-domain complexity |
| `progress_milestone_pct` | 50 | When to fire halfway event |

### Agent Tuning

| Parameter | Default | Effect |
|-----------|---------|--------|
| `max_turns` | 500 | Hard turn limit |
| `max_turns_without_resume` | 5 | Auto-resume threshold |
| `history_truncation` | 50 | Turns kept in context |