mirror of https://github.com/collinear-ai/yc-bench.git synced 2026-04-19 12:58:03 +00:00

AnandK27 ecd3d9e415 Add system design documentation for yc-bench

Comprehensive documentation covering all major subsystems:
simulation engine, data models, task system, prestige, finances,
employees, agent layer, CLI interface, configuration, and runner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-08 13:42:41 -07:00


Configuration System

Location: src/yc_bench/config/

Overview

The configuration system loads TOML preset files and validates them against Pydantic models. It controls every aspect of the simulation: world generation parameters, difficulty tuning, agent behavior, and distribution specifications.

Design Choices

Pydantic Schema (`schema.py`)

The configuration hierarchy:

ExperimentConfig
├── AgentConfig          # LLM model, tools, retry settings
├── LoopConfig           # Turn budget, auto-resume threshold
├── SimConfig            # Simulation parameters
└── WorldConfig          # World generation parameters
    ├── CompanyConfig     # Initial funds, starting prestige
    ├── EmployeeConfig    # Team size, tier distribution, salary ranges
    ├── TaskConfig        # Task count, domain requirements, deadlines
    └── PrestigeConfig    # Decay rate, penalty multipliers, scaling

Why Pydantic?

Type validation at load time (catch config errors early)
Default values with optional overrides
Discriminated unions for distribution specs
Clear documentation through type annotations
Serialization to and from dicts and JSON (TOML files are parsed into a dict first, since Pydantic itself does not read TOML)

TOML Preset Files (`presets/`)

# medium.toml
[world]
initial_funds_cents = 500_000_00

[world.prestige]
decay_per_day = 0.005
penalty_fail_multiplier = 0.8
penalty_cancel_multiplier = 1.0

[world.tasks]
count = 200
deadline_qty_per_day = 11.0

[world.tasks.reward_funds]
type = "triangular"
min = 5000_00
mode = 15000_00
max = 50000_00

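Why TOML? It is human-readable, supports comments, expresses hierarchy naturally through sections, and can be parsed with Python's standard library (`tomllib`, Python 3.11+). Unlike JSON it allows comments, and it avoids common YAML gotchas such as indentation-sensitive structure and implicit type coercion.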

Preset Hierarchy

| Preset | Focus | Key Characteristics |
| --- | --- | --- |
| `default.toml` | Base | All defaults; other presets override selectively |
| `tutorial.toml` | Learning | Relaxed deadlines, prestige-1 tasks only, high funds |
| `easy.toml` | Casual | Relaxed deadlines, flat prestige requirements |
| `medium.toml` | Standard | Prestige climbing, 2-domain tasks, 9-day deadlines |
| `hard.toml` | Challenge | Prestige gating active, 7-day deadlines, 1.5x cancel penalty |
| `nightmare.toml` | Extreme | Razor-thin margins, 6-day deadlines, 2x penalties |

Design choice: Preset-based difficulty rather than a single "difficulty slider" allows fine-grained control. Each preset can tune dozens of independent parameters.

Config Loading (`loader.py`)

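def load_config(preset_name: str) -> ExperimentConfig:
    # Start from the defaults, then layer the preset's overrides on top.
    base = load_toml("default.toml")
    overlay = load_toml(f"{preset_name}.toml")
    merged = deep_merge(base, overlay)  # nested tables are merged key by key
    # Pydantic validates the merged dict here, so bad values fail at load time.
    return ExperimentConfig(**merged)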

Design choice: Config inheritance via deep merge. Presets only specify what differs from default, keeping preset files concise and maintainable.
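A minimal sketch of what a `deep_merge` helper could look like. The source does not show the project's implementation, so this is one plausible version, and the sample values are illustrative rather than taken from `default.toml`.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where overlay values win, merging nested dicts recursively."""
    merged = deepcopy(base)  # never mutate the caller's defaults
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Illustrative defaults and a preset that overrides a single nested key.
base = {
    "world": {
        "initial_funds_cents": 50_000_000,
        "prestige": {"decay_per_day": 0.005, "penalty_cancel_multiplier": 1.0},
    }
}
overlay = {"world": {"prestige": {"penalty_cancel_multiplier": 1.5}}}

merged = deep_merge(base, overlay)
print(merged["world"]["prestige"])
```

A shallow `base | overlay` would replace the whole `world` table and silently drop every default the preset did not repeat, which is why the merge has to recurse.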

Distribution Specifications (`sampling.py`)

The DistSpec System

Many world generation parameters use statistical distributions rather than fixed values:

from typing import Literal

from pydantic import BaseModel

class DistSpec(BaseModel):
    """Discriminated union of distribution types."""
    type: Literal["triangular", "beta", "normal", "uniform", "constant"]
    # Parameters vary by type

Supported distributions:

| Type | Parameters | Use Case |
| --- | --- | --- |
| `triangular` | min, mode, max | Task rewards, skill rates (bounded, asymmetric peak around a most-likely value) |
| `beta` | alpha, beta, scale | Prestige requirements (skewed toward low values) |
| `normal` | mean, std | Symmetric variation around a target |
| `uniform` | low, high | Equal probability across range |
| `constant` | value | Fixed value (no randomness) |

Why discriminated unions? Pydantic validates the correct parameters for each distribution type at load time. Invalid combinations (e.g., triangular with alpha parameter) are caught before the simulation runs.
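The idea can be illustrated without Pydantic. The sketch below is a dependency-free stand-in, not the project's `DistSpec`: it dispatches on the `type` tag and lets each dataclass reject parameters that do not belong to it. The class names, the `parse_dist_spec` helper, and the default `scale` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triangular:
    min: float
    mode: float
    max: float

    def __post_init__(self) -> None:
        if not self.min <= self.mode <= self.max:
            raise ValueError("triangular requires min <= mode <= max")


@dataclass(frozen=True)
class Beta:
    alpha: float
    beta: float
    scale: float = 1.0  # assumed default, for illustration


@dataclass(frozen=True)
class Constant:
    value: float


# The "type" field is the discriminator: it selects which schema applies.
SPEC_TYPES = {"triangular": Triangular, "beta": Beta, "constant": Constant}


def parse_dist_spec(raw: dict):
    params = dict(raw)  # copy so the caller's dict is not mutated
    kind = params.pop("type", None)
    if kind not in SPEC_TYPES:
        raise ValueError(f"unknown distribution type: {kind!r}")
    try:
        return SPEC_TYPES[kind](**params)
    except TypeError as exc:  # a parameter is missing or does not belong
        raise ValueError(f"invalid parameters for {kind!r}: {exc}") from exc


ok = parse_dist_spec(
    {"type": "triangular", "min": 500_000, "mode": 1_500_000, "max": 5_000_000}
)

try:
    parse_dist_spec({"type": "triangular", "alpha": 2.0})
    rejected = False
except ValueError:
    rejected = True
```

With Pydantic the same dispatch is declared rather than hand-written, but the effect is the same: a spec either parses into exactly one well-typed variant or fails before the simulation starts.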

Usage Example

[world.tasks.reward_funds]
type = "triangular"
min = 5000_00
mode = 15000_00
max = 50000_00

[world.employees.junior_rate]
type = "beta"
alpha = 2.0
beta = 5.0
scale = 3.0
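How such specs might be turned into samples, as a sketch that uses the standard-library `random` module as a stand-in for the project's NumPy generator. The `sample` function is hypothetical, and treating `scale` as a multiplier on a beta draw from [0, 1] is an assumption about its meaning.

```python
import random


def sample(spec: dict, rng: random.Random) -> float:
    """Draw one value from a distribution spec given as a plain dict."""
    kind = spec["type"]
    if kind == "triangular":
        # Note the argument order: triangular(low, high, mode).
        return rng.triangular(spec["min"], spec["max"], spec["mode"])
    if kind == "beta":
        # Assumed semantics: a draw on [0, 1] stretched to [0, scale].
        return rng.betavariate(spec["alpha"], spec["beta"]) * spec.get("scale", 1.0)
    if kind == "normal":
        return rng.gauss(spec["mean"], spec["std"])
    if kind == "uniform":
        return rng.uniform(spec["low"], spec["high"])
    if kind == "constant":
        return spec["value"]
    raise ValueError(f"unknown distribution type: {kind!r}")


# The two specs from the usage example, with the underscores resolved.
reward_funds = {"type": "triangular", "min": 500_000, "mode": 1_500_000, "max": 5_000_000}
junior_rate = {"type": "beta", "alpha": 2.0, "beta": 5.0, "scale": 3.0}

rng = random.Random(42)
rewards = [sample(reward_funds, rng) for _ in range(1000)]
rates = [sample(junior_rate, rng) for _ in range(1000)]

print(f"mean reward: {sum(rewards) / len(rewards):,.0f} cents")
print(f"mean junior rate: {sum(rates) / len(rates):.2f}")
```

A beta distribution with alpha = 2 and beta = 5 has mean 2/7, so under the scaling assumption the junior rates cluster below 1 while still reaching toward 3 occasionally.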

World Generation

Seeding (`services/seed_world.py`)

def seed_world_transactional(session, cfg, seed):
    rng = create_rng(seed)
    company = create_company(session, cfg.world.company)
    employees = generate_employees(session, company, cfg.world.employees, rng)
    tasks = generate_tasks(session, cfg.world.tasks, rng)
    sim_state = create_sim_state(session, company, cfg.sim, seed)

Design choice: Single-transaction world seeding makes creation atomic. Either the entire world is created or nothing is, so no partially seeded state can persist.
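The all-or-nothing behavior can be demonstrated with the standard-library `sqlite3` module, used here as a stand-in for whatever session layer the project uses. The table layout, names, and the `fail_midway` flag are hypothetical and exist only to show the rollback.

```python
import sqlite3


def seed_world(conn: sqlite3.Connection, employee_names: list[str],
               fail_midway: bool = False) -> None:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO company (name) VALUES (?)", ("Acme",))
        if fail_midway:
            raise RuntimeError("simulated failure after the company insert")
        conn.executemany(
            "INSERT INTO employee (name) VALUES (?)",
            [(name,) for name in employee_names],
        )


def count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (name TEXT)")
conn.execute("CREATE TABLE employee (name TEXT)")
conn.commit()

try:
    seed_world(conn, ["Ada", "Grace"], fail_midway=True)
except RuntimeError:
    pass
companies_after_failure = count(conn, "company")  # the partial insert was rolled back

seed_world(conn, ["Ada", "Grace"])
companies_after_success = count(conn, "company")
employees_after_success = count(conn, "employee")
```

After the failed attempt the company table is still empty, so a retry starts from a clean database rather than from a company with no employees or tasks.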

Employee Generation (`services/generate_employees.py`)

Generate N employees (default 10)
Assign tiers from configured distribution (e.g., 30/40/30 junior/mid/senior)
For each employee, sample 4 skill rates from per-tier distributions
Set salary based on tier range
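The four steps above can be sketched as follows. This uses the standard-library `random` module instead of NumPy, and every numeric range below (skill-rate triangles, salary bands) is a made-up placeholder, as is the choice to draw salaries uniformly within the tier range.

```python
import random

TIER_WEIGHTS = {"junior": 0.3, "mid": 0.4, "senior": 0.3}  # the 30/40/30 split
NUM_SKILLS = 4

# Hypothetical per-tier parameters; the real values live in the presets.
SKILL_RATE = {  # (min, mode, max) of a triangular distribution
    "junior": (0.5, 1.0, 2.0),
    "mid": (1.0, 2.0, 3.5),
    "senior": (2.0, 3.5, 5.0),
}
SALARY_CENTS = {  # (low, high) monthly salary band
    "junior": (300_000, 500_000),
    "mid": (500_000, 800_000),
    "senior": (800_000, 1_200_000),
}


def generate_employees(n: int, rng: random.Random) -> list[dict]:
    # Step 2: assign tiers according to the configured weights.
    tiers = rng.choices(list(TIER_WEIGHTS), weights=list(TIER_WEIGHTS.values()), k=n)
    employees = []
    for employee_id, tier in enumerate(tiers):
        low, mode, high = SKILL_RATE[tier]
        employees.append({
            "id": employee_id,
            "tier": tier,
            # Step 3: one rate per skill, drawn from the tier's distribution.
            "skill_rates": [rng.triangular(low, high, mode) for _ in range(NUM_SKILLS)],
            # Step 4: salary from the tier's range (uniform draw is an assumption).
            "salary_cents": rng.randint(*SALARY_CENTS[tier]),
        })
    return employees


team = generate_employees(10, random.Random(7))
for employee in team[:3]:
    print(employee["tier"], employee["salary_cents"])
```

Because tiers are sampled rather than assigned by quota, a given seed will not always produce an exact 3/4/3 split across ten employees.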

Task Generation (`services/generate_tasks.py`)

Generate M tasks (default 200+)
First 10 tasks are always prestige-1 (guaranteed accessible)
Remaining tasks have stratified prestige requirements
Each task gets 2-4 domain requirements sampled from distributions
Rewards scale with prestige and task size

Design choice: Stratified generation ensures:

The agent always has starting tasks (prestige-1 guaranteed)
Tasks span the full prestige range (progression is possible)
No prestige "dead zones" where no tasks exist
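One simple way to get these guarantees is sketched below. It assumes integer prestige levels from 1 to 10 and gives every level an equal share of the non-starter tasks; the project may weight the strata differently, and the field names are hypothetical.

```python
import random

PRESTIGE_MIN, PRESTIGE_MAX = 1, 10
GUARANTEED_STARTERS = 10


def generate_tasks(count: int, rng: random.Random) -> list[dict]:
    starters = min(GUARANTEED_STARTERS, count)
    levels = list(range(PRESTIGE_MIN, PRESTIGE_MAX + 1))

    # Stratify: cycle through every level so none is left empty, then shuffle
    # so the level is not predictable from the task's position.
    stratified = [levels[i % len(levels)] for i in range(count - starters)]
    rng.shuffle(stratified)

    prestige = [PRESTIGE_MIN] * starters + stratified
    return [
        {"id": task_id, "required_prestige": level, "num_domains": rng.randint(2, 4)}
        for task_id, level in enumerate(prestige)
    ]


tasks = generate_tasks(200, random.Random(3))
first_ten = [t["required_prestige"] for t in tasks[:10]]
covered_levels = {t["required_prestige"] for t in tasks[10:]}
print(first_ten, sorted(covered_levels))
```

Sampling each requirement independently could, by chance, leave some level with no tasks at all; assigning levels by stratum first rules that out.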

RNG Management (`services/rng.py`)

import numpy

def create_rng(seed: int) -> numpy.random.Generator:
    return numpy.random.default_rng(seed)

Design choice: Centralized RNG with explicit seed ensures full reproducibility. Same seed → same world → same event sequence (given same agent actions).
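The reproducibility property can be checked directly. This sketch uses `random.Random` from the standard library in place of `numpy.random.default_rng` to stay dependency-free; the `build_world` function and its reward range are hypothetical.

```python
import random


def build_world(seed: int) -> list[int]:
    # One generator, created once from the seed and used for every draw.
    rng = random.Random(seed)
    return [rng.randint(500_000, 5_000_000) for _ in range(5)]


world_a = build_world(1234)
world_b = build_world(1234)
world_c = build_world(9999)

print(world_a == world_b)  # same seed, same draws
print(world_a == world_c)  # different seed, different draws
```

The guarantee holds only if every random draw goes through the seeded generator in the same order; a stray call to the module-level `random` functions or to an unseeded generator would break it.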

Key Configuration Parameters

Financial Tuning

| Parameter | Default | Effect |
| --- | --- | --- |
| `initial_funds_cents` | 500,000 | Starting capital |
| `reward_prestige_scale` | 0.15 | How much prestige amplifies rewards |
| `salary_bump_pct` | 1.0 | Per-completion salary increase |

Prestige Tuning

| Parameter | Default | Effect |
| --- | --- | --- |
| `prestige_decay_per_day` | 0.005 | Daily prestige loss |
| `penalty_fail_multiplier` | 0.8 | Prestige cost of late completion |
| `penalty_cancel_multiplier` | 1.0 | Prestige cost of cancellation |
| `prestige_min` | 1.0 | Floor value |
| `prestige_max` | 10.0 | Ceiling value |
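To give a feel for the scale of these defaults, the sketch below applies the decay and clamps the result to the floor and ceiling. It assumes decay is a flat subtraction per day; the source does not say whether the real rule is linear or proportional, and `decay_prestige` is a hypothetical helper.

```python
def decay_prestige(
    prestige: float,
    days: float,
    decay_per_day: float = 0.005,
    prestige_min: float = 1.0,
    prestige_max: float = 10.0,
) -> float:
    """Apply linear daily decay (an assumption), then clamp to [min, max]."""
    decayed = prestige - decay_per_day * days
    return max(prestige_min, min(prestige_max, decayed))


after_month = decay_prestige(3.0, days=30)    # loses 0.005 * 30 = 0.15
at_floor = decay_prestige(1.02, days=30)      # would drop below 1.0, so clamps
print(round(after_month, 3), at_floor)
```

Under this assumption a month of inactivity costs 0.15 prestige, which is small per day but adds up over a long run.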

Task Tuning

| Parameter | Default | Effect |
| --- | --- | --- |
| `deadline_qty_per_day` | 11.0 | Deadline generosity |
| `num_domains_per_task` | 2-4 | Multi-domain complexity |
| `progress_milestone_pct` | 50 | When to fire halfway event |

Agent Tuning

| Parameter | Default | Effect |
| --- | --- | --- |
| `max_turns` | 500 | Hard turn limit |
| `max_turns_without_resume` | 5 | Auto-resume threshold |
| `history_truncation` | 50 | Turns kept in context |
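A sketch of how a turn limit and a bounded history could interact, assuming `history_truncation` means "keep only the most recent N turns". The loop and its names are illustrative and are not taken from the agent layer.

```python
from collections import deque

MAX_TURNS = 500
HISTORY_TRUNCATION = 50

# A deque with maxlen drops the oldest entry automatically when full.
history: deque[str] = deque(maxlen=HISTORY_TRUNCATION)

turns_run = 0
for turn in range(1, 121):  # pretend the run ends on its own after 120 turns
    if turn > MAX_TURNS:    # hard stop, regardless of what the agent wants
        break
    history.append(f"turn {turn}")
    turns_run += 1

print(turns_run, len(history), history[0], history[-1])
```

After 120 turns only turns 71 through 120 remain in context, so anything the agent needs from earlier turns has to be recoverable from the simulation state rather than from the conversation.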
