diff --git a/system_design/02_data_models.md b/system_design/02_data_models.md
index e099739..37b77e6 100644
--- a/system_design/02_data_models.md
+++ b/system_design/02_data_models.md
@@ -146,11 +146,12 @@ The benchmark uses SQLAlchemy's declarative ORM over SQLite for several reasons:
 |--------|------|-------|
 | `id` | UUID (PK) | Auto-generated |
 | `name` | String(255) | Client company name (e.g. "Nexus AI") |
-| `reward_multiplier` | Float | Hidden per-client bonus [0.7, 2.5], not shown to agent |
+| `reward_multiplier` | Float | Per-client reward factor [0.7, 2.5] (currently unused in reward calculation) |
 | `tier` | String(32) | Agent-visible label: Standard / Premium / Enterprise |
 | `specialty_domains` | JSON | List of 1-2 domain strings (e.g. ["research", "training"]) |
+| `loyalty` | Float | Hidden loyalty score [-1.0, 1.0]. RAT clients (< -0.3) cause scope creep |

-**Design choice**: The `reward_multiplier` is hidden from the agent; only `tier` is visible. This prevents trivially optimal strategy (always pick highest multiplier) and requires the agent to experiment and observe payouts.
+**Design choice**: `loyalty` is hidden from the agent. RAT clients secretly inflate task work after acceptance (scope creep), causing deadline failures. The agent must detect RATs by observing per-client failure patterns via `client history`.
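The new `loyalty` column and its RAT threshold can be sketched as follows — a minimal dataclass stand-in for the ORM model, where the `is_rat` helper and the field defaults are illustrative assumptions, not part of the diff:

```python
from dataclasses import dataclass, field
from typing import List
import uuid


@dataclass
class Client:
    """Plain-Python sketch of the Client row described in the table above."""
    name: str                       # e.g. "Nexus AI"
    tier: str                       # agent-visible: Standard / Premium / Enterprise
    specialty_domains: List[str]    # 1-2 domain strings
    loyalty: float                  # hidden from the agent, in [-1.0, 1.0]
    reward_multiplier: float = 1.0  # in [0.7, 2.5]; currently unused
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def is_rat(self) -> bool:
        # Hypothetical helper: RAT clients (loyalty < -0.3) cause scope creep.
        return self.loyalty < -0.3
```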
### ClientTrust (`models/client.py`)

diff --git a/system_design/05_financial_model.md b/system_design/05_financial_model.md
index 9bba2b1..40cf536 100644
--- a/system_design/05_financial_model.md
+++ b/system_design/05_financial_model.md
@@ -69,14 +69,16 @@ total_payroll = sum(employee.salary_cents for all employees)
 ### Salary Bumps

-Each completed task increases salaries:
+Each completed task increases salaries by a fixed per-tier amount (linear, not compounding):

 ```
+tier_midpoints = {tier: (salary_min[tier] + salary_max[tier]) / 2 for tier in (junior, mid, senior)}
 for each assigned employee:
-    salary_cents *= 1.01  # 1% increase per completion
+    bump = tier_midpoints[employee.tier] * salary_bump_pct
+    salary_cents += bump  # fixed amount per task
 ```

-**Design choice**: Compounding salary increases mean success has a hidden cost. Long-running simulations see payroll grow substantially, creating late-game financial pressure even as task rewards scale with prestige.
+**Design choice**: Linear salary bumps create steady payroll growth without exponential compounding. A junior gets a ~$30 bump per task, a mid ~$70, and a senior ~$125 (at 1% of the tier midpoint). This avoids runaway payroll in the late game while still creating pressure.

 ### Failure Penalties

diff --git a/system_design/06_employee_model.md b/system_design/06_employee_model.md
index 9c97da8..9f0595f 100644
--- a/system_design/06_employee_model.md
+++ b/system_design/06_employee_model.md
@@ -44,9 +44,19 @@ class EmployeeSkillRate:
     rate_domain_per_hour: float  # work units produced per business hour
 ```

-Rates are generated from configurable distributions (triangular, beta, etc.) during world seeding. Some employees are specialists (high in one domain, low in others); some are generalists.
+Rates are generated by uniform sampling within tier-specific bounds:

-**Design choice**: The 4-rate vector per employee creates a rich assignment optimization space. Optimal assignment requires matching employee strengths to task domain requirements.
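The linear salary bump above can be sketched in a few lines. The tier midpoints here are back-derived from the quoted ~$30/$70/$125 per-task figures (1% of midpoint) and are assumptions, not documented config values:

```python
# Assumed tier salary midpoints (in cents), back-derived from the
# ~$30/$70/$125 per-task bumps quoted above at salary_bump_pct = 1%.
TIER_MIDPOINT_CENTS = {"junior": 300_000, "mid": 700_000, "senior": 1_250_000}
SALARY_BUMP_PCT = 0.01


def bump_salary(salary_cents: int, tier: str) -> int:
    """Apply one task-completion bump: a fixed per-tier amount, not compounding."""
    bump = int(TIER_MIDPOINT_CENTS[tier] * SALARY_BUMP_PCT)
    return salary_cents + bump
```

Because the bump is additive, two completions cost exactly twice one completion — unlike the old `*= 1.01` rule, payroll grows linearly with success.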
+```python
+# Each domain rate is independently sampled: uniform(rate_min, rate_max)
+# Junior: [1.0, 4.0], Mid: [4.0, 7.0], Senior: [7.0, 10.0]
+# Within a tier, an employee can land near the bottom of the range in
+# some domains and near the top in others.
+```
+
+The team composition follows a fixed ratio: 5 junior, 3 mid, 2 senior (for a 10-person team), shuffled randomly. Employees and clients use a fixed world seed (seed=1), so the same team appears across all run seeds — only task generation varies.
+
+Skill rates are capped at `skill_rate_max` (default 10.0) even after task-completion boosts.
+
+**Design choice**: Uniform per-domain sampling creates natural specialization without complex distribution mechanics. The fixed world seed ensures consistent employee/client composition for fair cross-seed comparison.

 ## Throughput Splitting

diff --git a/system_design/08_cli_interface.md b/system_design/08_cli_interface.md
index 81e7949..ecb8722 100644
--- a/system_design/08_cli_interface.md
+++ b/system_design/08_cli_interface.md
@@ -68,10 +68,14 @@ Returns all employees with tier, salary, and current active task count.
 ### Market Commands

-#### `market browse [--domain X] [--min-prestige N] [--max-prestige N] [--offset O] [--limit L]`
-Browse available market tasks with optional filters.
+#### `market browse [--domain X] [--reward-min-cents N] [--offset O] [--limit L]`
+Browse available market tasks with optional filters. Results are capped at `market_browse_default_limit` (default 50) per page.

-**Design choice**: Filtering and pagination prevent information overload. The agent can focus on tasks matching its current prestige level and strategic goals.
+The browse **auto-filters** by prestige and trust: only tasks the company can actually accept are shown.
+This means:
+- Per-domain prestige check: all required domains must meet the task's `required_prestige`
+- Trust check: the company must have sufficient trust with the task's client
+
+**Design choice**: Auto-filtering prevents the agent from wasting turns trying to accept inaccessible tasks. Pagination (`--offset`) allows browsing beyond the first page.

 ### Task Commands

diff --git a/system_design/10_runner_orchestration.md b/system_design/10_runner_orchestration.md
index 619c7c9..c1a1fce 100644
--- a/system_design/10_runner_orchestration.md
+++ b/system_design/10_runner_orchestration.md
@@ -18,7 +18,8 @@ def run_benchmark(args):
     # 2. Initialize database
     engine, factory = init_db(db_path)

-    # 3. Seed world
+    # 3. Seed world (employees + clients use fixed seed=1 for consistency;
+    #    only task generation uses the run seed)
     with session_scope(factory) as session:
         seed_world_transactional(session, cfg, args.seed)

diff --git a/system_design/11_client_trust.md b/system_design/11_client_trust.md
index 56b5f83..17584a9 100644
--- a/system_design/11_client_trust.md
+++ b/system_design/11_client_trust.md
@@ -2,103 +2,74 @@
 ## The Big Idea

-Every client has a **hidden loyalty score** the agent can't see. Some clients are loyal (investing in them pays off), some are adversarial "RATs" (investing in them backfires). The agent has to figure out which is which from observed behavior — delayed consequences, not explicit labels.
+Every client has a **hidden loyalty score** the agent can't see. Some clients are loyal (investing in them pays off), some are adversarial "RATs" (investing in them backfires). The agent has to figure out which is which from observed behavior — deadline failures from scope creep, not explicit labels.

-This tests three things:
+This tests:

-1. **Can the agent invest under uncertainty?** You don't know if a client is worth it until you've sunk 10+ tasks into them.
-2. **Can the agent spot patterns?** RATs look normal at first.
The only signal is that tasks from them fail deadlines more often and money sometimes disappears after completion.
-3. **Can the agent cut losses?** Dropping a RAT costs the trust you built. Keeping one costs real money.
+1. **Can the agent spot patterns?** RATs look normal. The only signal is that tasks from them fail deadlines disproportionately.
+2. **Can the agent cut losses?** Dropping a RAT means lost trust investment. Keeping one means repeated deadline failures and prestige loss.

 ## How Trust Works

 Every client starts at trust 0. Completing tasks builds trust (0-5 scale). Trust gives two benefits:

-- **Work reduction**: Up to 40% less work per task at max trust (loyal clients give clearer specs)
-- **Gated tasks**: ~20% of high-reward tasks require minimum trust to accept
+- **Work reduction**: Up to 35-40% less work per task at max trust (trusted clients give clearer specs)
+- **Gated tasks**: ~20-30% of high-reward tasks require minimum trust to accept

 Trust decays daily and drops on failure/cancellation. Working for Client A erodes trust with all other clients (cross-client decay), so you can't maintain trust with everyone — you have to pick 2-3 clients to focus on.

+Note: Trust does NOT affect task reward amounts. The reward multiplier was removed — only work reduction remains. The revenue benefit of trust is indirect: faster task completion → more tasks per month → more revenue.
+
 ## How Loyalty Works

-At world generation, each client gets a hidden loyalty score from `triangular(-1, 1, mode≈0.6)`:
+At world generation, a fixed number of RATs is guaranteed: `round(num_clients × loyalty_rat_fraction)`, minimum 1. RATs get loyalty in [-1.0, -0.3]; non-RATs get loyalty in [-0.3, 1.0].

-- **Loyal** (> 0.3): ~50% of clients. Trust investment pays off via work reduction.
-- **Neutral** (-0.3 to 0.3): ~35%. No special effects.
-- **RAT** (< -0.3): ~15%. Adversarial. Looks normal, exploits you at higher trust.
+- **Loyal** (> 0.3): Trust investment pays off via work reduction.
+- **Neutral** (-0.3 to 0.3): No special effects.
+- **RAT** (< -0.3): Adversarial. Looks normal, causes scope creep on accepted tasks.
+
+Employees and clients use a **fixed world seed** (seed=1), so the same clients (including the same RATs) appear across all run seeds. Only task generation varies by seed.

 The agent never sees loyalty scores. It only sees: client name, tier, specialties, trust level.

-## What RATs Do
+## What RATs Do: Scope Creep

-RAT effects activate once trust exceeds `loyalty_reveal_trust` (default 0.5 for medium). The effects scale with `|loyalty| × sqrt(trust_fraction)` — sqrt scaling means they bite early and plateau, rather than being negligible until max trust.
-
-### 1. Scope Creep (Bait-and-Switch)
-
-When you accept a task from a RAT at sufficient trust, the **actual work required is secretly inflated** — but the deadline is calculated from the original (smaller) amount. The task looks completable but isn't.
-
-- **Max inflation**: `severity × 0.70` (medium: 56%)
-- **Effect**: Tasks from RATs miss deadlines more often. The agent notices when progress milestones arrive later than expected.
-
-### 2. Payment Disputes (Delayed Clawback)
-
-After completing a RAT's task, there's a random chance a `PAYMENT_DISPUTE` event fires 2-7 days later, clawing back a chunk of the reward.
-
-- **Max clawback**: `severity × 0.80` of the reward (medium: 64%)
-- **Max probability**: `severity × 0.50` per task (medium: 40%)
-- **Effect**: The agent gets paid, then days later money disappears. The only way to notice is checking `client history` and seeing listed rewards don't match received amounts.
-
-### 3. Work Reduction for Loyal Clients
-
-Loyal clients reduce required work by `trust_work_reduction_max × trust / trust_max`. This is the payoff for choosing well — loyal clients make tasks faster, meaning more tasks, more revenue.
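The loyalty assignment described above (guaranteed RAT count, fixed world seed) can be sketched as follows; the function name and the use of a dedicated `random.Random` instance are assumptions:

```python
import random


def assign_loyalties(num_clients: int, rat_fraction: float, world_seed: int = 1):
    """Sketch of hidden loyalty assignment at world generation.

    A fixed number of RATs is guaranteed: round(num_clients * rat_fraction),
    minimum 1. RATs draw loyalty from [-1.0, -0.3]; everyone else from
    [-0.3, 1.0]. The fixed world seed keeps the same RATs across run seeds.
    """
    rng = random.Random(world_seed)
    num_rats = max(1, round(num_clients * rat_fraction))
    loyalties = [rng.uniform(-1.0, -0.3) for _ in range(num_rats)]
    loyalties += [rng.uniform(-0.3, 1.0) for _ in range(num_clients - num_rats)]
    rng.shuffle(loyalties)  # so RATs aren't always the first clients generated
    return loyalties
```

Because the generator is seeded with the world seed (not the run seed), repeated calls are deterministic — the cross-seed consistency the design relies on.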
-
-## Intensity Scaling
-
-All RAT effects use the same intensity formula:
+When the agent accepts a task from a RAT client, the **actual work required is secretly inflated** after acceptance — but the deadline is calculated from the original (smaller) amount. The task looks completable when browsing but isn't.

 ```
-trust_fraction = (trust - threshold) / (max_trust - threshold)
-intensity = |loyalty| × sqrt(trust_fraction)
+inflation = scope_creep_max × |loyalty|
+inflation = max(1.3, inflation)  # minimum 130% inflation ensures deadline failure
+for each requirement:
+    required_qty *= (1 + inflation)
 ```

-The sqrt makes effects noticeable early (trust barely above threshold) rather than negligible until max trust. At medium difficulty with a RAT (loyalty -0.57) at trust 2.0:
+- **Scope creep formula**: `scope_creep_max = loyalty_severity × 1.0`
+- **At medium (severity=1.0)**: A RAT with loyalty=-0.7 inflates work by 130% (the minimum floor)
+- **Effect**: RAT tasks always miss deadlines → zero reward + prestige penalty
+
+Scope creep activates from the first task (no trust threshold needed). The agent can detect it by noticing that tasks from certain clients consistently fail despite looking feasible in the market.

-| Effect | Value |
-| ------------------- | ---------------------- |
-| Scope creep | +18% work inflation |
-| Dispute probability | 13% per completed task |
-| Clawback amount | up to 12% of reward |
-
+Payment disputes were removed — scope creep alone provides sufficient RAT damage.

 ## How the Agent Can Detect RATs

 The agent has one tool: `yc-bench client history`. This shows per-client:

-- Tasks completed (success/fail count)
-- Listed reward total vs net received (after disputes)
-- Dispute count
+- Tasks succeeded and failed count
+- `failure_rate_pct` per client

-An agent that periodically checks history will notice:
+An agent that periodically checks history will notice a client whose tasks fail deadlines more than others (scope creep signal).
+An agent that never checks will keep getting exploited.

-- A client whose tasks fail deadlines more than others (scope creep)
-- A client where net received < listed rewards (disputes)
-
-An agent that never checks will keep getting exploited.
+Additionally, the agent can observe via `task inspect` that `required_qty` is larger than what was listed in `market browse` — a direct scope-creep signal if the agent compares pre-accept and post-accept values.

 ## Config Knobs

-
 | Knob                   | Medium | Hard | Nightmare |
 | ---------------------- | ------ | ---- | --------- |
-| `loyalty_rat_fraction` | 0.15   | 0.20 | 0.25      |
-| `loyalty_severity`     | 0.8    | 0.7  | 0.9       |
-| `loyalty_reveal_trust` | 0.5    | 1.5  | 1.0       |
-
+| `loyalty_rat_fraction` | 0.20   | 0.20 | 0.25      |
+| `loyalty_severity`     | 1.0    | 0.7  | 0.9       |
+| `loyalty_reveal_trust` | 0.0    | 1.5  | 1.0       |

 Derived from severity:

-- `scope_creep_max = severity × 0.70`
-- `dispute_clawback_max = severity × 0.80`
-- `dispute_prob_max = severity × 0.50`
-
+- `scope_creep_max = severity × 1.0`
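The scope-creep math above (derived knob, |loyalty| scaling, 130% floor) can be sketched end to end; the function names are hypothetical:

```python
def scope_creep_inflation(loyalty: float, loyalty_severity: float) -> float:
    """Fractional work inflation applied to a RAT's task after acceptance."""
    scope_creep_max = loyalty_severity * 1.0  # derived from severity
    inflation = scope_creep_max * abs(loyalty)
    return max(1.3, inflation)  # 130% floor guarantees a deadline miss


def inflate_requirements(required_qtys, loyalty, loyalty_severity):
    """Apply scope creep to each requirement quantity (deadline is unchanged)."""
    inflation = scope_creep_inflation(loyalty, loyalty_severity)
    return [qty * (1 + inflation) for qty in required_qtys]
```

Note that at medium (`loyalty_severity` = 1.0), `scope_creep_max × |loyalty|` can never exceed 1.0, so the 1.3 floor always binds and every RAT inflates by exactly 130% regardless of its loyalty value.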