Merge pull request #7 from collinear-ai/feat/employee_tiers

Feat/employee tiers
This commit is contained in:
Adit Jain 2026-03-07 22:04:45 -08:00 committed by GitHub
commit b1cd7ebfb2
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
19 changed files with 7244 additions and 5336 deletions

.DS_Store (vendored, binary file not shown)

@@ -56,8 +56,8 @@ bash scripts/run_benchmark.sh --seed 1 --config hard
 ### Core loop
-1. Agent calls `yc-bench sim resume` to advance time to the next event.
-2. The engine flushes task progress, fires due events, applies payroll.
+1. Agent calls `yc-bench sim resume` to advance time to the next event or monthly payroll.
+2. The engine flushes task progress, applies prestige decay, fires due events, applies payroll.
 3. Agent reads wake events and decides: accept tasks, assign employees, dispatch, cancel.
 4. Repeat until bankruptcy or horizon end.
@@ -65,12 +65,14 @@ The simulation ends on **bankruptcy** (funds < 0 after payroll), **horizon end**
 ### Key mechanics

-- **Funds**: start at $250K. Monthly payroll is deducted automatically. Task rewards scale with prestige (`base × (1 + 0.55 × (prestige − 1))`).
+- **Funds**: starting capital varies by preset ($80K–$250K). Monthly payroll is deducted automatically. Task rewards scale with prestige (`base × (1 + scale × (prestige − 1))`).
 - **4 domains**: `research · inference · data/environment · training`. Each domain tracks prestige independently in [1.0, 10.0].
-- **Prestige gating**: tasks require a minimum prestige level. Most tasks need prestige 3–5, so the agent must climb from 1.0 by completing easier tasks first. First 10 market tasks are stratified `[1,1,1,1,2,2,2,3,3,4]` to bootstrap progression.
+- **Per-domain prestige gating**: a task's required prestige is checked against **each** of its required domains. The agent must climb prestige broadly, not just in one domain.
+- **Prestige decay**: every domain loses prestige daily. Neglected domains decay back toward 1.0. The agent must stay active across domains to maintain market access.
+- **Prestige-scaled work volume**: higher-prestige tasks require proportionally more work. Higher prestige pays more but demands more capacity.
 - **Employees**: 10 employees across 3 tiers (junior/mid/senior). The agent sees only each employee's tier and salary — not their per-domain skill rates. A junior can secretly be a superstar in one domain, so the agent must infer productivity from task progress observations.
 - **Throughput splitting**: an employee assigned to N active tasks has `effective_rate = base_rate / N`. Focus beats breadth.
-- **Task success**: on-time completion awards funds + prestige + skill boosts + 1% salary bump (compounding payroll pressure). Late completion penalises prestige (1.4×). Cancellation penalises harder (2.0×).
+- **Task success**: on-time completion awards funds + prestige + skill boosts + 1% salary bump (compounding payroll pressure). Late completion penalises prestige. Cancellation penalises harder.
 - **Progress checkpoints**: the agent is woken at 25%, 50%, 75%, and 100% completion — providing data points to estimate employee productivity.
 - **Scratchpad**: persistent notes in the DB that survive context truncation (only last 20 conversation rounds are kept).
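
The reward-scaling and throughput-splitting rules in this hunk can be sketched in a few lines. A minimal illustration — function names and the sample numbers are my own, only the two formulas come from the README text:

```python
def task_reward_cents(base_cents: int, prestige: float, scale: float = 0.55) -> int:
    # README formula: reward = base × (1 + scale × (prestige − 1)).
    return int(base_cents * (1 + scale * (prestige - 1)))

def effective_rate(base_rate: float, active_tasks: int) -> float:
    # Throughput splitting: an employee on N active tasks works at base_rate / N.
    return base_rate / active_tasks

# Illustrative numbers (not from the benchmark itself):
reward = task_reward_cents(1_400_000, prestige=4.0)  # 1_400_000 × 2.65 = 3_710_000
rate = effective_rate(12.0, active_tasks=3)          # 4.0 qty/day per task
```

At prestige 1 the multiplier collapses to 1, so early tasks pay only their base reward — hence the pressure to climb.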
@@ -92,7 +94,7 @@ yc-bench report monthly                  # P&L per month
 yc-bench task accept --task-id UUID      # pull from market
 yc-bench task assign --task-id UUID --employee-id UUID
 yc-bench task dispatch --task-id UUID    # start work
-yc-bench task cancel --task-id UUID --reason ""   # cancel (2× prestige penalty)
+yc-bench task cancel --task-id UUID --reason ""   # cancel (prestige penalty)
 yc-bench sim resume                      # advance time
 yc-bench scratchpad write/append/clear   # persistent memory
 ```
@@ -103,13 +105,15 @@ yc-bench scratchpad write/append/clear   # persistent memory
 Experiment presets live in `src/yc_bench/config/presets/` as TOML files. Pass the preset name via `--config`.

-| Config | Employees | Tasks | Tests |
-|--------|-----------|-------|-------|
-| **tutorial** | 3 | 50 | Basic accept→assign→dispatch loop |
-| **easy** | 5 | 100 | Throughput awareness |
-| **medium** | 5 | 150 | Prestige climbing + domain specialization |
-| **hard** | 7 | 200 | Precise ETA reasoning |
-| **nightmare** | 8 | 300 | Sustained perfection under compounding payroll |
+All presets use 10 employees and 200 market tasks. Difficulty comes from deadline pressure, penalty severity, prestige distribution, and task size.
+
+| Config | Deadline pressure | Prestige mode | What it tests |
+|--------|-------------------|---------------|---------------|
+| **tutorial** | Very relaxed | 1 | Basic accept→assign→dispatch loop |
+| **easy** | Relaxed | 1 | Throughput awareness |
+| **medium** | Moderate | 3 | Prestige climbing + domain specialization |
+| **hard** | Tight | 4 | Precise ETA reasoning + capacity planning |
+| **nightmare** | Razor-thin | 5 | Sustained perfection under compounding payroll |

 See `default.toml` for the full list of tunable parameters.
@@ -117,44 +121,7 @@ See `default.toml` for the full list of tunable parameters.

 ## Benchmark results

-### Sonnet 4.6 vs Gemini 3 Flash vs GPT-5.2 — 1-year horizon, 3 seeds per config
-
-![3-model comparison](plots/sonnet_vs_gemini.png)
-
-#### Survival rates
-
-| Config | Sonnet 4.6 | Gemini 3 Flash | GPT-5.2 |
-|--------|------------|----------------|---------|
-| **medium** | 3/3 | 3/3 | 3/3 |
-| **hard** | 1/3 | 2/3 | 2/3 |
-| **nightmare** | 1/3 | 3/3 | 2/3 |
-
-#### Final funds (bankrupt = funds < 0)
-
-| Config | Seed | Sonnet 4.6 | Gemini 3 Flash | GPT-5.2 |
-|--------|------|------------|----------------|---------|
-| medium | 1 | **$9.1M** | **$9.5M** | **$1.8M** |
-| medium | 2 | **$6.1M** | **$11.0M** | **$321K** |
-| medium | 3 | **$107K** | **$15.8M** | **$28K** |
-| hard | 1 | bankrupt | bankrupt | bankrupt |
-| hard | 2 | **$63K** | **$412K** | **$15.7M** |
-| hard | 3 | bankrupt | **$21.9M** | **$43.5M** |
-| nightmare | 1 | bankrupt | **$2.1M** | bankrupt |
-| nightmare | 2 | **$10.1M** | **$214K** | **$2.2M** |
-| nightmare | 3 | bankrupt | **$805K** | **$23.6M** |
-
-**Overall: Gemini 8/9 · GPT-5.2 7/9 · Sonnet 5/9**
-
-#### Key findings
-
-- **Gemini leads on consistency** (8/9 survival). The only model to sweep all 3 nightmare seeds.
-- **GPT-5.2 has the highest ceiling.** Hard seed 3: $43.5M vs Gemini's $21.9M. When it survives, it tends to outperform by a wide margin.
-- **Sonnet is high-variance.** Nightmare seed 2: $10.1M (best nightmare result), but 4/9 bankruptcies overall.
-- **Win rate predicts survival.** Every run with >58% task win rate survived. Every run below 40% went bankrupt.
-
-#### Prestige specialization
-
-![Prestige radar](plots/prestige_radar.png)
+*Results pending — re-running benchmarks with updated economics.*

 ---

Binary image file not shown (new, 127 KiB).

Binary image file not shown (new, 191 KiB).

File diff suppressed because it is too large.

File diff suppressed because it is too large.

@@ -346,6 +346,7 @@ def run_bot(config_name: str, seed: int, bot_slug: str, strategy_fn: StrategyFn)
         replacement = generate_replacement_task(
             run_seed=sim_state.run_seed,
             replenish_counter=counter,
+            replaced_prestige=best_task.required_prestige,
             cfg=world_cfg,
         )
         replacement_row = Task(


@@ -26,7 +26,7 @@ DEFAULT_RUNS = [
     {"label": "kimi-k2.5", "model_slug": "openrouter_moonshotai_kimi-k2.5", "color": "#2ecc71"},
 ]

-INITIAL_FUNDS_CENTS = 25_000_000  # $250K
+INITIAL_FUNDS_CENTS = 15_000_000  # $150K (default; presets may override)

 def parse_args():

@@ -129,7 +129,7 @@ def make_plot(run_data, seed, config_name, budget_usd, out_path: Path):
     # ── Funds curves ─────────────────────────────────────────────────────────
     ax_funds.axhline(0, color="#e74c3c", linewidth=0.9, linestyle="--", alpha=0.4, zorder=1)
-    ax_funds.axhline(250_000, color="#555577", linewidth=0.7, linestyle=":", alpha=0.6, zorder=1)
+    ax_funds.axhline(INITIAL_FUNDS_CENTS / 100, color="#555577", linewidth=0.7, linestyle=":", alpha=0.6, zorder=1)
     for r in run_data:
         if not r["times"]:


@@ -88,6 +88,7 @@ def task_accept(
     replacement = generate_replacement_task(
         run_seed=sim_state.run_seed,
         replenish_counter=counter,
+        replaced_prestige=task.required_prestige,
         cfg=_get_world_cfg(),
     )


@@ -53,7 +53,7 @@ company_name = "BenchCo"
 [world]
 num_employees = 10
-initial_funds_cents = 25_000_000  # $250,000
+initial_funds_cents = 15_000_000  # $150,000
 initial_prestige_level = 1.0
 work_hours_per_day = 9.0

@@ -78,9 +78,9 @@ penalty_cancel_multiplier = 2.0  # hardened: was 1.2
 reward_prestige_scale = 0.55  # hardened: was 0.3

 # Daily prestige decay per domain. Domains not exercised lose prestige
-# over time: -0.01/day → -0.3/month. Untouched domain drops ~1 level
-# every ~3 months. Prevents single-domain hyper-specialization.
-prestige_decay_per_day = 0.01
+# over time: -0.005/day → -0.15/month. Untouched domain drops ~1 level
+# every ~6 months. Prevents single-domain hyper-specialization.
+prestige_decay_per_day = 0.005

 # Required qty scaling by prestige: qty *= 1 + scale * (prestige - 1).
 # At 0.3: prestige-5 tasks need 2.2x the work of prestige-1 tasks.

@@ -90,7 +90,7 @@ prestige_qty_scale = 0.3
 # --- Deadline ---
 # Deadline = max(deadline_min_biz_days, max_domain_qty / deadline_qty_per_day).
 # Domains are worked in parallel, so deadline scales with heaviest domain, not sum.
-deadline_qty_per_day = 150.0
+deadline_qty_per_day = 200.0
 deadline_min_biz_days = 7

 # --- Progress milestones (checkpoint events at these completion fractions) ---

@@ -120,12 +120,12 @@ high = 10
 mode = 4  # hardened: base default is mode=1

 # Base reward paid on task completion, in cents (scaled further by prestige).
-# Higher-prestige tasks automatically pay more via reward_prestige_scale.
+# Mode $14K: prestige-1 tasks burn cash, prestige-3 breaks even, prestige-4+ profits.
 [world.dist.reward_funds_cents]
 type = "triangular"
-low = 500_000      # $5,000
-high = 10_000_000  # $100,000
-mode = 3_000_000   # $30,000
+low = 300_000      # $3,000
+high = 4_000_000   # $40,000
+mode = 1_400_000   # $14,000

 # Number of domains each task requires work in (cast to int after sampling).
 # mode=2: most tasks need 2 domains — single-specialist dominance gone.

@@ -139,9 +139,9 @@ mode = 2  # hardened: base default is mode=1
 # No trivially-small tasks: every task requires sustained employee-hours.
 [world.dist.required_qty]
 type = "triangular"
-low = 500    # hardened: base default is 200
-high = 3000
-mode = 1400  # hardened: base default is 800
+low = 800    # hardened: base default is 200
+high = 4000
+mode = 2000  # hardened: base default is 800

 # Prestige delta awarded per domain on task success.
 # Mean ~0.1: climbing prestige 1→5 takes ~40 tasks.
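
As a sanity check on the new `reward_funds_cents` values: the mean of a triangular distribution is `(low + mode + high) / 3`. A quick standard-library sketch using the numbers from the hunk above (variable names are my own):

```python
import random

# New reward_funds_cents defaults from the diff above, in cents.
low, mode, high = 300_000, 1_400_000, 4_000_000

# Triangular-distribution mean: (low + mode + high) / 3.
mean_cents = (low + mode + high) / 3  # 1_900_000 cents, i.e. ~$19K average base reward

# Note the stdlib sampler takes (low, high, mode) in that argument order.
sample = random.triangular(low, high, mode)
```

So the average base reward drops from roughly $45K under the old distribution to roughly $19K, before prestige scaling — consistent with the comment that prestige-1 tasks now burn cash.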


@@ -28,10 +28,11 @@ horizon_years = 1
 auto_advance_after_turns = 8

 [world]
+initial_funds_cents = 20_000_000  # $200,000
 # Inherits num_employees=10, num_market_tasks=200 from default.
-# Moderate deadlines: 60 qty/day → ~12 day deadline. Comfortable with 3–4 tasks.
-deadline_qty_per_day = 60.0
+# Moderate deadlines: 100 qty/day → 10-day deadline for mode task.
+deadline_qty_per_day = 100.0

 # Original (un-hardened) penalties — costly but not catastrophic.
 penalty_fail_multiplier = 0.8

@@ -55,6 +56,6 @@ value = 1  # Single-domain — the test is about throughput, not assignment.
 [world.dist.required_qty]
 type = "triangular"
-low = 300
-high = 1500
-mode = 700   # Moderate size — a few days of focused work each.
+low = 500
+high = 2000
+mode = 1000  # Larger tasks — must stay focused, no excessive parallelism.


@@ -40,12 +40,13 @@ horizon_years = 1
 auto_advance_after_turns = 10

 [world]
+initial_funds_cents = 10_000_000  # $100,000 — must reach prestige 3 by month 5
 # Inherits num_employees=10, num_market_tasks=200 from default.
-# Tight deadlines: 1200/150 = 8 days.
-# 1 task with 5 per domain → 5.8 days. OK.
-# 2 concurrent tasks → 11.6 days. Miss.
-deadline_qty_per_day = 150.0
+# Tight deadlines: 2000/220 = 9.1 days.
+# 1 task with 5 per domain → 8.7 days. Just fits.
+# 2 concurrent tasks → 17.4 days. Guaranteed miss.
+deadline_qty_per_day = 220.0

 # Stiff penalties — mistakes cost real prestige.
 penalty_fail_multiplier = 1.4

@@ -71,6 +72,6 @@ mode = 2  # Most tasks need 2 domains.
 [world.dist.required_qty]
 type = "triangular"
-low = 500
-high = 2500
-mode = 1200  # Large tasks — require sustained focus.
+low = 1000
+high = 4000
+mode = 2000  # Large tasks — each takes ~9 days with full team. No parallelism.
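
The deadline arithmetic in these comments follows the formula stated in `default.toml`: `Deadline = max(deadline_min_biz_days, max_domain_qty / deadline_qty_per_day)`. A minimal sketch with hard's new numbers — the function name and example quantities are illustrative, only the formula is from the config:

```python
def deadline_biz_days(domain_qtys: dict[str, float],
                      qty_per_day: float = 220.0,
                      min_biz_days: int = 7) -> float:
    # Domains are worked in parallel, so the heaviest domain drives the deadline.
    return max(min_biz_days, max(domain_qtys.values()) / qty_per_day)

# A mode-sized hard task, 2000 qty in its heaviest domain:
d = deadline_biz_days({"research": 2000, "training": 1400})  # 2000/220 ≈ 9.1 days
```

Small tasks are floored at `deadline_min_biz_days`, so tightening `deadline_qty_per_day` only squeezes the large tasks.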


@@ -38,10 +38,10 @@ auto_advance_after_turns = 8
 [world]
 # Inherits num_employees=10, num_market_tasks=200 from default.
-# Deadline uses max per-domain qty. 900/100 = 9 days.
-# 2 concurrent tasks: 5 per task → 4.3 days each. Manageable.
-# 3 concurrent tasks: 3.3 per task → 6.6 days. Risky.
-deadline_qty_per_day = 100.0
+# Deadline uses max per-domain qty. 1500/150 = 10 days.
+# 1 task with 5 per domain → 6.5 days. Comfortable.
+# 2 concurrent tasks → 13 days. Miss.
+deadline_qty_per_day = 150.0

 # Real penalties — failing costs prestige, cancelling costs more.
 penalty_fail_multiplier = 1.0

@@ -67,6 +67,6 @@ mode = 2  # Most tasks need 2 domains.
 [world.dist.required_qty]
 type = "triangular"
-low = 400
-high = 2000
-mode = 900   # Moderate work — completable in 7–12 days with focus.
+low = 700
+high = 3000
+mode = 1500  # Larger tasks — ~6.5 days with full team, no parallelism.


@@ -49,12 +49,13 @@ horizon_years = 1
 auto_advance_after_turns = 10

 [world]
+initial_funds_cents = 8_000_000  # $80,000 — razor-thin runway
 # Inherits num_employees=10, num_market_tasks=200 from default.
-# Razor deadlines: 1600/200 = 8 days.
-# 1 task with 5 per domain → 7.7 days. Barely makes it.
-# 2 concurrent tasks → guaranteed miss.
-deadline_qty_per_day = 200.0
+# Razor deadlines: 2500/220 = 11.4 days.
+# 1 task with 5 per domain → 10.9 days. Barely fits.
+# 2 concurrent tasks → 21.8 days. Guaranteed miss.
+deadline_qty_per_day = 220.0

 # Catastrophic penalties — there is no good exit from a bad accept.
 penalty_fail_multiplier = 2.0

@@ -81,9 +82,9 @@ mode = 2  # Mostly 2-domain, some 3-domain.
 [world.dist.required_qty]
 type = "triangular"
-low = 600
-high = 3000
-mode = 1600  # Large work volumes — no quick wins.
+low = 1200
+high = 5000
+mode = 2500  # Massive work volumes — each task consumes the full team.

 # Slightly larger prestige gains than default (~0.13 avg) to make
 # climbing feasible despite the steep penalty. But one blown task


@@ -28,10 +28,11 @@ horizon_years = 1
 auto_advance_after_turns = 5

 [world]
+initial_funds_cents = 25_000_000  # $250,000 — very forgiving buffer
 # Inherits num_employees=10, num_market_tasks=200 from default.
-# Very generous deadlines: 30 qty/day → most tasks get 13+ day deadline.
-deadline_qty_per_day = 30.0
+# Generous deadlines: 50 qty/day → mode task gets 12-day deadline.
+deadline_qty_per_day = 50.0

 # Negligible penalties — mistakes barely hurt.
 penalty_fail_multiplier = 0.3

@@ -53,6 +54,6 @@ value = 1  # ALL tasks single-domain — trivial assignment.
 [world.dist.required_qty]
 type = "triangular"
-low = 200
-high = 800
-mode = 400   # Small tasks, quick completions.
+low = 300
+high = 1200
+mode = 600   # Moderate tasks, comfortable with focused execution.


@@ -39,7 +39,7 @@ class WorldDists(BaseModel):
     )
     # Base reward paid on task completion, in cents (result cast to int).
     reward_funds_cents: DistSpec = Field(
-        default_factory=lambda: TriangularDist(low=500_000, high=10_000_000, mode=3_000_000)
+        default_factory=lambda: TriangularDist(low=300_000, high=4_000_000, mode=1_400_000)
     )
     # Number of domains required per task (result cast to int).
     domain_count: DistSpec = Field(

@@ -105,7 +105,7 @@ class SimConfig(BaseModel):
 class WorldConfig(BaseModel):
     # --- Workforce ---
     num_employees: int = 10
-    initial_funds_cents: int = 25_000_000  # $250,000
+    initial_funds_cents: int = 15_000_000  # $150,000
     initial_prestige_level: float = 1.0
     work_hours_per_day: float = 9.0
@@ -128,7 +128,7 @@ class WorldConfig(BaseModel):
     # Daily prestige decay per domain. Domains not exercised lose prestige
-    # over time: -0.01/day → -0.3/month → untouched domain drops ~1 level
-    # every ~3 months. Floored at prestige_min.
-    prestige_decay_per_day: float = 0.01
+    # over time: -0.005/day → -0.15/month → untouched domain drops ~1 level
+    # every ~6 months. Floored at prestige_min.
+    prestige_decay_per_day: float = 0.005
     # Required qty scaling by prestige: qty *= 1 + prestige_qty_scale * (prestige - 1).
     # At 0.3: prestige-5 tasks need 2.2× the work of prestige-1 tasks.


@@ -178,6 +178,13 @@ def advance_time(
             result.payrolls_applied += 1
             payroll_idx += 1

+            # Report payroll as a wake event so the agent gets control back
+            company = db.query(Company).filter(Company.id == company_id).one()
+            result.wake_events.append({
+                "type": "monthly_payroll",
+                "funds_after": company.funds_cents,
+            })
+
             if bankrupt:
                 # Insert bankruptcy event at this time
                 insert_event(

@@ -188,7 +195,9 @@ def advance_time(
                     dedupe_key=f"bankruptcy:{current_time.isoformat()}",
                 )
                 result.bankrupt = True
-                break

+            # Always stop at payroll — gives the agent a chance to act
+            break

         elif action_type == "event":
             event_result = dispatch_event(db, next_event, current_time, company_id)


@@ -87,7 +87,7 @@ class Task(Base):
 class TaskRequirement(Base):
     __tablename__ = "task_requirements"
     __table_args__ = (
-        CheckConstraint("required_qty >= 200 AND required_qty <= 3000", name="ck_task_requirements_required_qty_range"),
+        CheckConstraint("required_qty >= 200 AND required_qty <= 25000", name="ck_task_requirements_required_qty_range"),
         CheckConstraint("completed_qty >= 0", name="ck_task_requirements_completed_qty_gte_0"),
         CheckConstraint("completed_qty <= required_qty", name="ck_task_requirements_completed_qty_lte_required_qty"),
     )


@@ -27,10 +27,9 @@ class GeneratedTask:
     requirements: dict[str, int]

-# First 10 market tasks are given explicit prestige values to guarantee a
-# climbable ladder from the start (avoids runs where all early tasks need
-# prestige 4+ before any are completable).
-_STRATIFIED_PRESTIGE = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
+# First 10 market tasks are forced to prestige 1 to guarantee a
+# bootstrapping path regardless of the prestige distribution.
+_STRATIFIED_PRESTIGE = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

 _ALL_DOMAINS = list(Domain)

@@ -134,14 +133,14 @@ def build_task_rows(*, run_seed, count, cfg=None):
     return task_rows, requirement_rows

-def generate_replacement_task(*, run_seed, replenish_counter, cfg=None):
+def generate_replacement_task(*, run_seed, replenish_counter, replaced_prestige, cfg=None):
+    """Generate a replacement task with the same prestige as the accepted task."""
     if cfg is None:
         cfg = WorldConfig()
     streams = RngStreams(run_seed)
     rng = streams.stream(f"replenish_{replenish_counter}")
-    prestige = _sample_required_prestige(rng, cfg)
-    requirements = _sample_requirements(rng, cfg, prestige=prestige)
-    return _make_task(rng, cfg, prestige, serial=replenish_counter, requirements=requirements)
+    requirements = _sample_requirements(rng, cfg, prestige=replaced_prestige)
+    return _make_task(rng, cfg, replaced_prestige, serial=replenish_counter, requirements=requirements)

 __all__ = [
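
The `streams.stream(f"replenish_{replenish_counter}")` pattern keeps replacement generation deterministic: the same seed and counter always yield the same task on replay. A minimal sketch of that idea — the actual `RngStreams` derivation scheme is not shown in the diff, so the hashing here is an assumption, not the project's implementation:

```python
import hashlib
import random

def stream(run_seed: int, name: str) -> random.Random:
    # Derive an independent, reproducible RNG per logical stream name.
    # ASSUMPTION: the real RngStreams may derive seeds differently; this only
    # illustrates the "named deterministic stream" pattern.
    digest = hashlib.sha256(f"{run_seed}:{name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

# Same seed + counter → identical draws, regardless of how many other
# streams were consumed in between.
a = stream(42, "replenish_3").random()
b = stream(42, "replenish_3").random()
```

This is why the replenish counter is threaded through both call sites in the diff: it is the only per-replacement entropy, so replays stay bit-identical.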