mirror of
https://github.com/collinear-ai/yc-bench.git
synced 2026-04-19 12:58:03 +00:00
Comprehensive documentation covering all major subsystems: simulation engine, data models, task system, prestige, finances, employees, agent layer, CLI interface, configuration, and runner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
203 lines
6.9 KiB
Markdown
203 lines
6.9 KiB
Markdown
# Configuration System
|
|
|
|
**Location**: `src/yc_bench/config/`
|
|
|
|
## Overview
|
|
|
|
The configuration system uses Pydantic models validated from TOML preset files. It controls every aspect of the simulation: world generation parameters, difficulty tuning, agent behavior, and distribution specifications.
|
|
|
|
## Design Choices
|
|
|
|
### Pydantic Schema (`schema.py`)
|
|
|
|
The configuration hierarchy:
|
|
|
|
```
|
|
ExperimentConfig
|
|
├── AgentConfig # LLM model, tools, retry settings
|
|
├── LoopConfig # Turn budget, auto-resume threshold
|
|
├── SimConfig # Simulation parameters
|
|
└── WorldConfig # World generation parameters
|
|
├── CompanyConfig # Initial funds, starting prestige
|
|
├── EmployeeConfig # Team size, tier distribution, salary ranges
|
|
├── TaskConfig # Task count, domain requirements, deadlines
|
|
└── PrestigeConfig # Decay rate, penalty multipliers, scaling
|
|
```
|
|
|
|
**Why Pydantic?**
|
|
- Type validation at load time (catch config errors early)
|
|
- Default values with optional overrides
|
|
- Discriminated unions for distribution specs
|
|
- Clear documentation through type annotations
|
|
- Serialization to/from TOML/JSON
|
|
|
|
### TOML Preset Files (`presets/`)
|
|
|
|
```toml
|
|
# medium.toml
|
|
[world]
|
|
initial_funds_cents = 500_000_00
|
|
|
|
[world.prestige]
|
|
decay_per_day = 0.005
|
|
penalty_fail_multiplier = 0.8
|
|
penalty_cancel_multiplier = 1.0
|
|
|
|
[world.tasks]
|
|
count = 200
|
|
deadline_qty_per_day = 11.0
|
|
|
|
[world.tasks.reward_funds]
|
|
type = "triangular"
|
|
min = 5000_00
|
|
mode = 15000_00
|
|
max = 50000_00
|
|
```
|
|
|
|
**Why TOML?** Human-readable, supports comments, natural hierarchy via sections, widely supported in Python. Better than JSON for config files (comments), simpler than YAML (fewer gotchas).
|
|
|
|
### Preset Hierarchy
|
|
|
|
| Preset | Focus | Key Characteristics |
|
|
|--------|-------|-------------------|
|
|
| `default.toml` | Base | All defaults; other presets override selectively |
|
|
| `tutorial.toml` | Learning | Relaxed deadlines, prestige-1 tasks only, high funds |
|
|
| `easy.toml` | Casual | Relaxed deadlines, flat prestige requirements |
|
|
| `medium.toml` | Standard | Prestige climbing, 2-domain tasks, 9-day deadlines |
|
|
| `hard.toml` | Challenge | Prestige gating active, 7-day deadlines, 1.5x cancel penalty |
|
|
| `nightmare.toml` | Extreme | Razor-thin margins, 6-day deadlines, 2x penalties |
|
|
|
|
**Design choice**: Preset-based difficulty rather than a single "difficulty slider" allows fine-grained control. Each preset can tune dozens of independent parameters.
|
|
|
|
### Config Loading (`loader.py`)
|
|
|
|
```python
|
|
def load_config(preset_name: str) -> ExperimentConfig:
|
|
base = load_toml("default.toml")
|
|
overlay = load_toml(f"{preset_name}.toml")
|
|
merged = deep_merge(base, overlay)
|
|
return ExperimentConfig(**merged)
|
|
```
|
|
|
|
**Design choice**: Config inheritance via deep merge. Presets only specify what differs from default, keeping preset files concise and maintainable.
|
|
|
|
## Distribution Specifications (`sampling.py`)
|
|
|
|
### The DistSpec System
|
|
|
|
Many world generation parameters use statistical distributions rather than fixed values:
|
|
|
|
```python
|
|
class DistSpec(BaseModel):
|
|
"""Discriminated union of distribution types."""
|
|
type: Literal["triangular", "beta", "normal", "uniform", "constant"]
|
|
# Parameters vary by type
|
|
```
|
|
|
|
**Supported distributions:**
|
|
|
|
| Type | Parameters | Use Case |
|
|
|------|-----------|----------|
|
|
| `triangular` | min, mode, max | Task rewards, skill rates (natural asymmetric bell curve) |
|
|
| `beta` | alpha, beta, scale | Prestige requirements (skewed toward low values) |
|
|
| `normal` | mean, std | Symmetric variation around a target |
|
|
| `uniform` | low, high | Equal probability across range |
|
|
| `constant` | value | Fixed value (no randomness) |
|
|
|
|
**Why discriminated unions?** Pydantic validates the correct parameters for each distribution type at load time. Invalid combinations (e.g., triangular with alpha parameter) are caught before the simulation runs.
|
|
|
|
### Usage Example
|
|
|
|
```toml
|
|
[world.tasks.reward_funds]
|
|
type = "triangular"
|
|
min = 5000_00
|
|
mode = 15000_00
|
|
max = 50000_00
|
|
|
|
[world.employees.junior_rate]
|
|
type = "beta"
|
|
alpha = 2.0
|
|
beta = 5.0
|
|
scale = 3.0
|
|
```
|
|
|
|
## World Generation
|
|
|
|
### Seeding (`services/seed_world.py`)
|
|
|
|
```python
|
|
def seed_world_transactional(session, cfg, seed):
|
|
rng = create_rng(seed)
|
|
company = create_company(session, cfg.world.company)
|
|
employees = generate_employees(session, company, cfg.world.employees, rng)
|
|
tasks = generate_tasks(session, cfg.world.tasks, rng)
|
|
sim_state = create_sim_state(session, company, cfg.sim, seed)
|
|
```
|
|
|
|
**Design choice**: Single-transaction world seeding ensures atomic creation. Either the entire world is created or nothing is -- no partial states.
|
|
|
|
### Employee Generation (`services/generate_employees.py`)
|
|
|
|
1. Generate N employees (default 10)
|
|
2. Assign tiers from configured distribution (e.g., 30/40/30 junior/mid/senior)
|
|
3. For each employee, sample 4 skill rates from per-tier distributions
|
|
4. Set salary based on tier range
|
|
|
|
### Task Generation (`services/generate_tasks.py`)
|
|
|
|
1. Generate M tasks (default 200+)
|
|
2. First 10 tasks are always prestige-1 (guaranteed accessible)
|
|
3. Remaining tasks have stratified prestige requirements
|
|
4. Each task gets 2-4 domain requirements sampled from distributions
|
|
5. Rewards scale with prestige and task size
|
|
|
|
**Design choice**: Stratified generation ensures:
|
|
- The agent always has starting tasks (prestige-1 guaranteed)
|
|
- Tasks span the full prestige range (progression is possible)
|
|
- No prestige "dead zones" where no tasks exist
|
|
|
|
### RNG Management (`services/rng.py`)
|
|
|
|
```python
|
|
def create_rng(seed: int) -> numpy.random.Generator:
|
|
return numpy.random.default_rng(seed)
|
|
```
|
|
|
|
**Design choice**: Centralized RNG with explicit seed ensures full reproducibility. Same seed → same world → same event sequence (given same agent actions).
|
|
|
|
## Key Configuration Parameters
|
|
|
|
### Financial Tuning
|
|
|
|
| Parameter | Default | Effect |
|
|
|-----------|---------|--------|
|
|
| `initial_funds_cents` | 500,000 | Starting capital |
|
|
| `reward_prestige_scale` | 0.15 | How much prestige amplifies rewards |
|
|
| `salary_bump_pct` | 1.0 | Per-completion salary increase |
|
|
|
|
### Prestige Tuning
|
|
|
|
| Parameter | Default | Effect |
|
|
|-----------|---------|--------|
|
|
| `prestige_decay_per_day` | 0.005 | Daily prestige loss |
|
|
| `penalty_fail_multiplier` | 0.8 | Prestige cost of late completion |
|
|
| `penalty_cancel_multiplier` | 1.0 | Prestige cost of cancellation |
|
|
| `prestige_min` | 1.0 | Floor value |
|
|
| `prestige_max` | 10.0 | Ceiling value |
|
|
|
|
### Task Tuning
|
|
|
|
| Parameter | Default | Effect |
|
|
|-----------|---------|--------|
|
|
| `deadline_qty_per_day` | 11.0 | Deadline generosity |
|
|
| `num_domains_per_task` | 2-4 | Multi-domain complexity |
|
|
| `progress_milestone_pct` | 50 | When to fire halfway event |
|
|
|
|
### Agent Tuning
|
|
|
|
| Parameter | Default | Effect |
|
|
|-----------|---------|--------|
|
|
| `max_turns` | 500 | Hard turn limit |
|
|
| `max_turns_without_resume` | 5 | Auto-resume threshold |
|
|
| `history_truncation` | 50 | Turns kept in context |
|