Mirror of https://github.com/collinear-ai/yc-bench.git (synced 2026-04-19 12:58:03 +00:00)
Merge pull request #7 from collinear-ai/feat/employee_tiers
Feat/employee tiers
Commit b1cd7ebfb2
19 changed files with 7244 additions and 5336 deletions
BIN .DS_Store (vendored)
Binary file not shown.
69 README.md
@@ -56,8 +56,8 @@ bash scripts/run_benchmark.sh --seed 1 --config hard
 
 ### Core loop
 
-1. Agent calls `yc-bench sim resume` to advance time to the next event.
-2. The engine flushes task progress, fires due events, applies payroll.
+1. Agent calls `yc-bench sim resume` to advance time to the next event or monthly payroll.
+2. The engine flushes task progress, applies prestige decay, fires due events, applies payroll.
 3. Agent reads wake events and decides: accept tasks, assign employees, dispatch, cancel.
 4. Repeat until bankruptcy or horizon end.
 
@@ -65,12 +65,14 @@ The simulation ends on **bankruptcy** (funds < 0 after payroll), **horizon end**
 
 ### Key mechanics
 
-- **Funds**: start at $250K. Monthly payroll is deducted automatically. Task rewards scale with prestige (`base × (1 + 0.55 × (prestige − 1))`).
+- **Funds**: starting capital varies by preset ($80K–$250K). Monthly payroll is deducted automatically. Task rewards scale with prestige (`base × (1 + scale × (prestige − 1))`).
 - **4 domains**: `research · inference · data/environment · training`. Each domain tracks prestige independently in [1.0, 10.0].
-- **Prestige gating**: tasks require a minimum prestige level. Most tasks need prestige 3–5, so the agent must climb from 1.0 by completing easier tasks first. First 10 market tasks are stratified `[1,1,1,1,2,2,2,3,3,4]` to bootstrap progression.
+- **Per-domain prestige gating**: a task's required prestige is checked against **each** of its required domains. The agent must climb prestige broadly, not just in one domain.
+- **Prestige decay**: every domain loses prestige daily. Neglected domains decay back toward 1.0. The agent must stay active across domains to maintain market access.
+- **Prestige-scaled work volume**: higher-prestige tasks require proportionally more work. Higher prestige pays more but demands more capacity.
 - **Employees**: 10 employees across 3 tiers (junior/mid/senior). The agent sees only each employee's tier and salary — not their per-domain skill rates. A junior can secretly be a superstar in one domain, so the agent must infer productivity from task progress observations.
 - **Throughput splitting**: an employee assigned to N active tasks has `effective_rate = base_rate / N`. Focus beats breadth.
-- **Task success**: on-time completion awards funds + prestige + skill boosts + 1% salary bump (compounding payroll pressure). Late completion penalises prestige (1.4×). Cancellation penalises harder (2.0×).
+- **Task success**: on-time completion awards funds + prestige + skill boosts + 1% salary bump (compounding payroll pressure). Late completion penalises prestige. Cancellation penalises harder.
 - **Progress checkpoints**: the agent is woken at 25%, 50%, 75%, and 100% completion — providing data points to estimate employee productivity.
 - **Scratchpad**: persistent notes in the DB that survive context truncation (only last 20 conversation rounds are kept).
 
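The reward-scaling and throughput-splitting rules in the Key mechanics list can be sketched as below. This is a minimal illustration, not the engine's code: the helper names `scaled_reward` and `effective_rate` are hypothetical, the default `scale = 0.55` is taken from `reward_prestige_scale` in the default config shown later in this diff, and the use of `round()` is an assumption about how cents are finalised.

```python
# Sketch of two README formulas. Function names are hypothetical;
# scale=0.55 mirrors reward_prestige_scale in the default config.


def scaled_reward(base_cents: int, prestige: float, scale: float = 0.55) -> int:
    """Reward after prestige scaling: base * (1 + scale * (prestige - 1))."""
    return round(base_cents * (1 + scale * (prestige - 1)))


def effective_rate(base_rate: float, active_tasks: int) -> float:
    """Per-task rate for one employee split across N active tasks."""
    if active_tasks < 1:
        raise ValueError("active_tasks must be at least 1")
    return base_rate / active_tasks


if __name__ == "__main__":
    # A $14,000 base reward at increasing required prestige.
    for prestige in (1.0, 3.0, 5.0):
        print(prestige, scaled_reward(1_400_000, prestige))
    # One employee split across two tasks works each at half speed.
    print(effective_rate(40.0, 2))
```

Because the per-task rate falls as `1 / N`, assigning the same employee to two tasks roughly doubles the time each one takes, which is why the list says focus beats breadth.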
@@ -92,7 +94,7 @@ yc-bench report monthly # P&L per month
 yc-bench task accept --task-id UUID # pull from market
 yc-bench task assign --task-id UUID --employee-id UUID
 yc-bench task dispatch --task-id UUID # start work
-yc-bench task cancel --task-id UUID --reason "" # cancel (2× prestige penalty)
+yc-bench task cancel --task-id UUID --reason "" # cancel (prestige penalty)
 yc-bench sim resume # advance time
 yc-bench scratchpad write/append/clear # persistent memory
 ```
@@ -103,13 +105,15 @@ yc-bench scratchpad write/append/clear # persistent memory
 
 Experiment presets live in `src/yc_bench/config/presets/` as TOML files. Pass the preset name via `--config`.
 
-| Config | Employees | Tasks | Tests |
-|--------|-----------|-------|-------|
-| **tutorial** | 3 | 50 | Basic accept→assign→dispatch loop |
-| **easy** | 5 | 100 | Throughput awareness |
-| **medium** | 5 | 150 | Prestige climbing + domain specialization |
-| **hard** | 7 | 200 | Precise ETA reasoning |
-| **nightmare** | 8 | 300 | Sustained perfection under compounding payroll |
+All presets use 10 employees and 200 market tasks. Difficulty comes from deadline pressure, penalty severity, prestige distribution, and task size.
+
+| Config | Deadline pressure | Prestige mode | What it tests |
+|--------|------------------|---------------|---------------|
+| **tutorial** | Very relaxed | 1 | Basic accept→assign→dispatch loop |
+| **easy** | Relaxed | 1 | Throughput awareness |
+| **medium** | Moderate | 3 | Prestige climbing + domain specialization |
+| **hard** | Tight | 4 | Precise ETA reasoning + capacity planning |
+| **nightmare** | Razor-thin | 5 | Sustained perfection under compounding payroll |
 
 See `default.toml` for the full list of tunable parameters.
 
@@ -117,44 +121,7 @@ See `default.toml` for the full list of tunable parameters.
 
 ## Benchmark results
 
-### Sonnet 4.6 vs Gemini 3 Flash vs GPT-5.2 — 1-year horizon, 3 seeds per config
-
-
-
-#### Survival rates
-
-| Config | Sonnet 4.6 | Gemini 3 Flash | GPT-5.2 |
-|--------|-----------|----------------|---------|
-| **medium** | 3/3 | 3/3 | 3/3 |
-| **hard** | 1/3 | 2/3 | 2/3 |
-| **nightmare** | 1/3 | 3/3 | 2/3 |
-
-#### Final funds (bankrupt = funds < 0)
-
-| Config | Seed | Sonnet 4.6 | Gemini 3 Flash | GPT-5.2 |
-|--------|------|-----------|----------------|---------|
-| medium | 1 | **$9.1M** | **$9.5M** | **$1.8M** |
-| medium | 2 | **$6.1M** | **$11.0M** | **$321K** |
-| medium | 3 | **$107K** | **$15.8M** | **$28K** |
-| hard | 1 | bankrupt | bankrupt | bankrupt |
-| hard | 2 | **$63K** | **$412K** | **$15.7M** |
-| hard | 3 | bankrupt | **$21.9M** | **$43.5M** |
-| nightmare | 1 | bankrupt | **$2.1M** | bankrupt |
-| nightmare | 2 | **$10.1M** | **$214K** | **$2.2M** |
-| nightmare | 3 | bankrupt | **$805K** | **$23.6M** |
-
-**Overall: Gemini 8/9 · GPT-5.2 7/9 · Sonnet 5/9**
-
-#### Key findings
-
-- **Gemini leads on consistency** (8/9 survival). The only model to sweep all 3 nightmare seeds.
-- **GPT-5.2 has the highest ceiling.** Hard seed 3: $43.5M vs Gemini's $21.9M. When it survives, it tends to outperform by a wide margin.
-- **Sonnet is high-variance.** Nightmare seed 2: $10.1M (best nightmare result), but 4/9 bankruptcies overall.
-- **Win rate predicts survival.** Every run with >58% task win rate survived. Every run below 40% went bankrupt.
-
-#### Prestige specialization
-
-
+*Results pending — re-running benchmarks with updated economics.*
 
 ---
 
BIN plots/hard_1_gpt-5.4_funds.png (Normal file)
Binary file not shown. Size after: 127 KiB
BIN plots/hard_1_gpt-5.4_prestige.png (Normal file)
Binary file not shown. Size after: 191 KiB
File diff suppressed because it is too large
5235 results/yc_bench_result_medium_1_openai_gpt-5.4.json (Normal file)
File diff suppressed because it is too large
@@ -346,6 +346,7 @@ def run_bot(config_name: str, seed: int, bot_slug: str, strategy_fn: StrategyFn)
 replacement = generate_replacement_task(
     run_seed=sim_state.run_seed,
     replenish_counter=counter,
+    replaced_prestige=best_task.required_prestige,
     cfg=world_cfg,
 )
 replacement_row = Task(
@@ -26,7 +26,7 @@ DEFAULT_RUNS = [
     {"label": "kimi-k2.5", "model_slug": "openrouter_moonshotai_kimi-k2.5", "color": "#2ecc71"},
 ]
 
-INITIAL_FUNDS_CENTS = 25_000_000 # $250K
+INITIAL_FUNDS_CENTS = 15_000_000 # $150K (default; presets may override)
 
 
 def parse_args():
@@ -129,7 +129,7 @@ def make_plot(run_data, seed, config_name, budget_usd, out_path: Path):
 
     # ── Funds curves ─────────────────────────────────────────────────────────
     ax_funds.axhline(0, color="#e74c3c", linewidth=0.9, linestyle="--", alpha=0.4, zorder=1)
-    ax_funds.axhline(250_000, color="#555577", linewidth=0.7, linestyle=":", alpha=0.6, zorder=1)
+    ax_funds.axhline(INITIAL_FUNDS_CENTS / 100, color="#555577", linewidth=0.7, linestyle=":", alpha=0.6, zorder=1)
 
     for r in run_data:
         if not r["times"]:
@@ -88,6 +88,7 @@ def task_accept(
 replacement = generate_replacement_task(
     run_seed=sim_state.run_seed,
     replenish_counter=counter,
+    replaced_prestige=task.required_prestige,
     cfg=_get_world_cfg(),
 )
 
@@ -53,7 +53,7 @@ company_name = "BenchCo"
 
 [world]
 num_employees = 10
-initial_funds_cents = 25_000_000 # $250,000
+initial_funds_cents = 15_000_000 # $150,000
 initial_prestige_level = 1.0
 work_hours_per_day = 9.0
 
@@ -78,9 +78,9 @@ penalty_cancel_multiplier = 2.0 # hardened: was 1.2
 reward_prestige_scale = 0.55 # hardened: was 0.3
 
 # Daily prestige decay per domain. Domains not exercised lose prestige
-# over time: -0.01/day → -0.3/month. Untouched domain drops ~1 level
-# every ~3 months. Prevents single-domain hyper-specialization.
-prestige_decay_per_day = 0.01
+# over time: -0.005/day → -0.15/month. Untouched domain drops ~1 level
+# every ~6 months. Prevents single-domain hyper-specialization.
+prestige_decay_per_day = 0.005
 
 # Required qty scaling by prestige: qty *= 1 + scale * (prestige - 1).
 # At 0.3: prestige-5 tasks need 2.2x the work of prestige-1 tasks.
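The arithmetic in the decay and quantity-scaling comments above can be checked with a short sketch. The function names are hypothetical, and the floor of 1.0 assumes the `prestige_min` floor mentioned in the config schema equals the bottom of the README's [1.0, 10.0] range.

```python
# Worked arithmetic for prestige decay and prestige-scaled work volume.
# Names are illustrative; the floor of 1.0 is an assumption.


def prestige_after(prestige: float, idle_days: float,
                   decay_per_day: float = 0.005, floor: float = 1.0) -> float:
    """Prestige of an untouched domain after idle_days of linear decay."""
    return max(floor, prestige - decay_per_day * idle_days)


def days_to_lose_one_level(decay_per_day: float) -> float:
    """Idle days needed for a domain to drop one full prestige level."""
    return 1.0 / decay_per_day


def qty_multiplier(prestige: float, qty_scale: float = 0.3) -> float:
    """Work multiplier: qty *= 1 + scale * (prestige - 1)."""
    return 1 + qty_scale * (prestige - 1)


if __name__ == "__main__":
    print(days_to_lose_one_level(0.01))   # old setting
    print(days_to_lose_one_level(0.005))  # new setting
    print(prestige_after(3.0, idle_days=30))
    print(qty_multiplier(5.0))
```

At 0.005 per day a domain needs about 200 idle days to lose one level, which is where the "~6 months" figure comes from. The old 0.01 setting needed about 100 days.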
@@ -90,7 +90,7 @@ prestige_qty_scale = 0.3
 # --- Deadline ---
 # Deadline = max(deadline_min_biz_days, max_domain_qty / deadline_qty_per_day).
 # Domains are worked in parallel, so deadline scales with heaviest domain, not sum.
-deadline_qty_per_day = 150.0
+deadline_qty_per_day = 200.0
 deadline_min_biz_days = 7
 
 # --- Progress milestones (checkpoint events at these completion fractions) ---
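The deadline rule stated in the comments of this hunk can be sketched as follows. The helper name `deadline_biz_days` and the domain keys in the example are illustrative. Whether the engine rounds the result to whole business days is not shown in this diff, so the sketch returns the raw value.

```python
# Sketch of: deadline = max(deadline_min_biz_days,
#                           max_domain_qty / deadline_qty_per_day)
# Helper name and example domain keys are hypothetical.


def deadline_biz_days(domain_qty: dict[str, float],
                      qty_per_day: float = 200.0,
                      min_biz_days: float = 7) -> float:
    """Deadline driven by the heaviest domain, never below the minimum."""
    heaviest = max(domain_qty.values())
    return max(min_biz_days, heaviest / qty_per_day)


if __name__ == "__main__":
    # Two-domain task: only the heavier domain (2000) sets the deadline.
    print(deadline_biz_days({"research": 2000, "training": 1200}))
    # Small task: the 7-day minimum applies.
    print(deadline_biz_days({"inference": 800}))
```

Using the maximum instead of the sum reflects the comment that domains are worked in parallel.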
@@ -120,12 +120,12 @@ high = 10
 mode = 4 # hardened: base default is mode=1
 
 # Base reward paid on task completion, in cents (scaled further by prestige).
-# Higher-prestige tasks automatically pay more via reward_prestige_scale.
+# Mode $14K: prestige-1 tasks burn cash, prestige-3 breaks even, prestige-4+ profits.
 [world.dist.reward_funds_cents]
 type = "triangular"
-low = 500_000 # $5,000
-high = 10_000_000 # $100,000
-mode = 3_000_000 # $30,000
+low = 300_000 # $3,000
+high = 4_000_000 # $40,000
+mode = 1_400_000 # $14,000
 
 # Number of domains each task requires work in (cast to int after sampling).
 # mode=2: most tasks need 2 domains — single-specialist dominance gone.
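To get a feel for the new base-reward distribution, the sketch below computes its mean and draws samples with Python's standard `random.triangular`. How the engine actually samples (its own RNG streams) is not shown in this hunk, so the standard library is only a stand-in, and the function names are hypothetical.

```python
# Illustration of the triangular base-reward distribution (values in cents).
# random.triangular is a stand-in for the engine's own sampler.
import random

LOW, HIGH, MODE = 300_000, 4_000_000, 1_400_000


def triangular_mean(low: float, high: float, mode: float) -> float:
    """Mean of a triangular distribution: (low + high + mode) / 3."""
    return (low + high + mode) / 3


def sample_reward_cents(rng: random.Random) -> int:
    """Draw one base reward and cast it to int, as the config comments describe."""
    return int(rng.triangular(LOW, HIGH, MODE))


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_reward_cents(rng) for _ in range(10_000)]
    print(triangular_mean(LOW, HIGH, MODE) / 100)  # analytic mean, in dollars
    print(sum(draws) / len(draws) / 100)           # empirical mean, in dollars
```

The analytic mean is $19,000, above the $14,000 mode, because the distribution has a long right tail toward $40,000.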
@@ -139,9 +139,9 @@ mode = 2 # hardened: base default is mode=1
 # No trivially-small tasks: every task requires sustained employee-hours.
 [world.dist.required_qty]
 type = "triangular"
-low = 500 # hardened: base default is 200
-high = 3000
-mode = 1400 # hardened: base default is 800
+low = 800 # hardened: base default is 200
+high = 4000
+mode = 2000 # hardened: base default is 800
 
 # Prestige delta awarded per domain on task success.
 # Mean ~0.1: climbing prestige 1→5 takes ~40 tasks.
@@ -28,10 +28,11 @@ horizon_years = 1
 auto_advance_after_turns = 8
 
 [world]
+initial_funds_cents = 20_000_000 # $200,000
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Moderate deadlines: 60 qty/day → ~12 day deadline. Comfortable with 3–4 tasks.
-deadline_qty_per_day = 60.0
+# Moderate deadlines: 100 qty/day → 10-day deadline for mode task.
+deadline_qty_per_day = 100.0
 
 # Original (un-hardened) penalties — costly but not catastrophic.
 penalty_fail_multiplier = 0.8
@@ -55,6 +56,6 @@ value = 1 # Single-domain — the test is about throughput, not assignmen
 
 [world.dist.required_qty]
 type = "triangular"
-low = 300
-high = 1500
-mode = 700 # Moderate size — a few days of focused work each.
+low = 500
+high = 2000
+mode = 1000 # Larger tasks — must stay focused, no excessive parallelism.
@@ -40,12 +40,13 @@ horizon_years = 1
 auto_advance_after_turns = 10
 
 [world]
+initial_funds_cents = 10_000_000 # $100,000 — must reach prestige 3 by month 5
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Tight deadlines: 1200/150 = 8 days.
-# 1 task with 5 per domain → 5.8 days. OK.
-# 2 concurrent tasks → 11.6 days. Miss.
-deadline_qty_per_day = 150.0
+# Tight deadlines: 2000/220 = 9.1 days.
+# 1 task with 5 per domain → 8.7 days. Just fits.
+# 2 concurrent tasks → 17.4 days. Guaranteed miss.
+deadline_qty_per_day = 220.0
 
 # Stiff penalties — mistakes cost real prestige.
 penalty_fail_multiplier = 1.4
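The capacity reasoning in the comments above can be reproduced with a small sketch. The team rate of 230 qty/day per domain is an assumption, back-computed from the comment's own figures (2000 qty in 8.7 days with five employees per domain); it is not a value read from the config. The function names are hypothetical.

```python
# Reproduces the "just fits" / "guaranteed miss" arithmetic for the mode task.
# TEAM_RATE is an assumption inferred from 2000 / 8.7, not a config value.

TEAM_RATE = 230.0        # qty/day per domain with 5 employees (assumed)
MODE_QTY = 2000.0        # mode of required_qty in this preset
QTY_PER_DAY = 220.0      # deadline_qty_per_day in this preset


def completion_days(qty: float, concurrent_tasks: int,
                    team_rate: float = TEAM_RATE) -> float:
    """Days to finish when the team's throughput is split across tasks."""
    return qty * concurrent_tasks / team_rate


def deadline_days(qty: float, qty_per_day: float = QTY_PER_DAY) -> float:
    """Deadline for a task whose heaviest domain needs qty units of work."""
    return qty / qty_per_day


if __name__ == "__main__":
    limit = deadline_days(MODE_QTY)
    for n in (1, 2):
        needed = completion_days(MODE_QTY, n)
        verdict = "fits" if needed <= limit else "misses"
        print(f"{n} task(s): {needed:.1f} days vs {limit:.1f}-day deadline -> {verdict}")
```

Under these assumptions a single focused task finishes with less than half a day of slack, and any second concurrent task pushes both past the deadline.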
@@ -71,6 +72,6 @@ mode = 2 # Most tasks need 2 domains.
 
 [world.dist.required_qty]
 type = "triangular"
-low = 500
-high = 2500
-mode = 1200 # Large tasks — require sustained focus.
+low = 1000
+high = 4000
+mode = 2000 # Large tasks — each takes ~9 days with full team. No parallelism.
@@ -38,10 +38,10 @@ auto_advance_after_turns = 8
 [world]
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Deadline uses max per-domain qty. 900/100 = 9 days.
-# 2 concurrent tasks: 5 per task → 4.3 days each. Manageable.
-# 3 concurrent tasks: 3.3 per task → 6.6 days. Risky.
-deadline_qty_per_day = 100.0
+# Deadline uses max per-domain qty. 1500/150 = 10 days.
+# 1 task with 5 per domain → 6.5 days. Comfortable.
+# 2 concurrent tasks → 13 days. Miss.
+deadline_qty_per_day = 150.0
 
 # Real penalties — failing costs prestige, cancelling costs more.
 penalty_fail_multiplier = 1.0
@@ -67,6 +67,6 @@ mode = 2 # Most tasks need 2 domains.
 
 [world.dist.required_qty]
 type = "triangular"
-low = 400
-high = 2000
-mode = 900 # Moderate work — completable in 7–12 days with focus.
+low = 700
+high = 3000
+mode = 1500 # Larger tasks — ~6.5 days with full team, no parallelism.
@@ -49,12 +49,13 @@ horizon_years = 1
 auto_advance_after_turns = 10
 
 [world]
+initial_funds_cents = 8_000_000 # $80,000 — razor-thin runway
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Razor deadlines: 1600/200 = 8 days.
-# 1 task with 5 per domain → 7.7 days. Barely makes it.
-# 2 concurrent tasks → guaranteed miss.
-deadline_qty_per_day = 200.0
+# Razor deadlines: 2500/220 = 11.4 days.
+# 1 task with 5 per domain → 10.9 days. Barely fits.
+# 2 concurrent tasks → 21.8 days. Guaranteed miss.
+deadline_qty_per_day = 220.0
 
 # Catastrophic penalties — there is no good exit from a bad accept.
 penalty_fail_multiplier = 2.0
@@ -81,9 +82,9 @@ mode = 2 # Mostly 2-domain, some 3-domain.
 
 [world.dist.required_qty]
 type = "triangular"
-low = 600
-high = 3000
-mode = 1600 # Large work volumes — no quick wins.
+low = 1200
+high = 5000
+mode = 2500 # Massive work volumes — each task consumes the full team.
 
 # Slightly larger prestige gains than default (~0.13 avg) to make
 # climbing feasible despite the steep penalty. But one blown task
@@ -28,10 +28,11 @@ horizon_years = 1
 auto_advance_after_turns = 5
 
 [world]
+initial_funds_cents = 25_000_000 # $250,000 — very forgiving buffer
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Very generous deadlines: 30 qty/day → most tasks get 13+ day deadline.
-deadline_qty_per_day = 30.0
+# Generous deadlines: 50 qty/day → mode task gets 12-day deadline.
+deadline_qty_per_day = 50.0
 
 # Negligible penalties — mistakes barely hurt.
 penalty_fail_multiplier = 0.3
@@ -53,6 +54,6 @@ value = 1 # ALL tasks single-domain — trivial assignment.
 
 [world.dist.required_qty]
 type = "triangular"
-low = 200
-high = 800
-mode = 400 # Small tasks, quick completions.
+low = 300
+high = 1200
+mode = 600 # Moderate tasks, comfortable with focused execution.
@@ -39,7 +39,7 @@ class WorldDists(BaseModel):
     )
     # Base reward paid on task completion, in cents (result cast to int).
     reward_funds_cents: DistSpec = Field(
-        default_factory=lambda: TriangularDist(low=500_000, high=10_000_000, mode=3_000_000)
+        default_factory=lambda: TriangularDist(low=300_000, high=4_000_000, mode=1_400_000)
     )
     # Number of domains required per task (result cast to int).
     domain_count: DistSpec = Field(
@@ -105,7 +105,7 @@ class SimConfig(BaseModel):
 class WorldConfig(BaseModel):
     # --- Workforce ---
     num_employees: int = 10
-    initial_funds_cents: int = 25_000_000 # $250,000
+    initial_funds_cents: int = 15_000_000 # $150,000
     initial_prestige_level: float = 1.0
     work_hours_per_day: float = 9.0
 
@@ -128,7 +128,7 @@ class WorldConfig(BaseModel):
     # Daily prestige decay per domain. Domains not exercised lose prestige
     # over time: -0.01/day → -0.3/month → untouched domain drops ~1 level
     # every ~3 months. Floored at prestige_min.
-    prestige_decay_per_day: float = 0.01
+    prestige_decay_per_day: float = 0.005
 
     # Required qty scaling by prestige: qty *= 1 + prestige_qty_scale * (prestige - 1).
     # At 0.3: prestige-5 tasks need 2.2× the work of prestige-1 tasks.
@@ -178,6 +178,13 @@ def advance_time(
     result.payrolls_applied += 1
     payroll_idx += 1
 
+    # Report payroll as a wake event so the agent gets control back
+    company = db.query(Company).filter(Company.id == company_id).one()
+    result.wake_events.append({
+        "type": "monthly_payroll",
+        "funds_after": company.funds_cents,
+    })
+
     if bankrupt:
         # Insert bankruptcy event at this time
         insert_event(
@@ -188,7 +195,9 @@ def advance_time(
             dedupe_key=f"bankruptcy:{current_time.isoformat()}",
         )
         result.bankrupt = True
         break
 
+    # Always stop at payroll — gives the agent a chance to act
+    break
+
 elif action_type == "event":
     event_result = dispatch_event(db, next_event, current_time, company_id)
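With payroll now surfaced as a wake event, an agent harness could distinguish it from other events roughly as follows. This is a hypothetical consumer, not code from the repository: only the `"type": "monthly_payroll"` and `"funds_after"` keys come from the hunk above, and the function name and message formats are made up for illustration.

```python
# Hypothetical consumer of wake events. Only the "type" and "funds_after"
# keys of the monthly_payroll event are taken from the diff above.
from typing import Any


def summarize_wake_events(wake_events: list[dict[str, Any]]) -> list[str]:
    """Turn raw wake events into short notes an agent could log or act on."""
    notes = []
    for event in wake_events:
        if event.get("type") == "monthly_payroll":
            dollars = event["funds_after"] / 100  # funds are stored in cents
            notes.append(f"payroll applied, funds now ${dollars:,.2f}")
        else:
            notes.append(f"event: {event.get('type', 'unknown')}")
    return notes


if __name__ == "__main__":
    events = [{"type": "monthly_payroll", "funds_after": 14_250_000}]
    for line in summarize_wake_events(events):
        print(line)
```

Since the loop now always stops at payroll, an agent sees its post-payroll balance every month even when no task event is due.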
@@ -87,7 +87,7 @@ class Task(Base):
 class TaskRequirement(Base):
     __tablename__ = "task_requirements"
     __table_args__ = (
-        CheckConstraint("required_qty >= 200 AND required_qty <= 3000", name="ck_task_requirements_required_qty_range"),
+        CheckConstraint("required_qty >= 200 AND required_qty <= 25000", name="ck_task_requirements_required_qty_range"),
         CheckConstraint("completed_qty >= 0", name="ck_task_requirements_completed_qty_gte_0"),
         CheckConstraint("completed_qty <= required_qty", name="ck_task_requirements_completed_qty_lte_required_qty"),
     )
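A quick sanity check suggests why the upper bound on `required_qty` had to grow. The sketch assumes the prestige multiplier `1 + 0.3 * (prestige - 1)` is applied to the sampled quantity, that required prestige tops out at 10, and that no preset overrides `prestige_qty_scale`. The commit does not state that this is how 25000 was chosen, so treat it as a plausibility check only.

```python
# Plausibility check for the widened required_qty constraint.
# Assumes max prestige 10 and prestige_qty_scale = 0.3 in every preset.

PRESET_QTY_HIGH = {
    "tutorial": 1200,
    "easy": 2000,
    "medium": 3000,
    "hard": 4000,
    "nightmare": 5000,
}
OLD_LIMIT, NEW_LIMIT = 3000, 25000


def max_scaled_qty(base_high: float, max_prestige: float = 10,
                   qty_scale: float = 0.3) -> float:
    """Largest quantity a preset can produce after prestige scaling."""
    return base_high * (1 + qty_scale * (max_prestige - 1))


if __name__ == "__main__":
    for name, high in PRESET_QTY_HIGH.items():
        worst = max_scaled_qty(high)
        print(f"{name}: {worst:.0f} (old limit ok: {worst <= OLD_LIMIT}, "
              f"new limit ok: {worst <= NEW_LIMIT})")
```

Under these assumptions every preset can exceed the old 3000 cap at high prestige, while even the largest case stays below 25000.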
@@ -27,10 +27,9 @@ class GeneratedTask:
     requirements: dict[str, int]
 
 
-# First 10 market tasks are given explicit prestige values to guarantee a
-# climbable ladder from the start (avoids runs where all early tasks need
-# prestige 4+ before any are completable).
-_STRATIFIED_PRESTIGE = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
+# First 10 market tasks are forced to prestige 1 to guarantee a
+# bootstrapping path regardless of the prestige distribution.
+_STRATIFIED_PRESTIGE = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
 
 _ALL_DOMAINS = list(Domain)
 
@@ -134,14 +133,14 @@ def build_task_rows(*, run_seed, count, cfg=None):
     return task_rows, requirement_rows
 
 
-def generate_replacement_task(*, run_seed, replenish_counter, cfg=None):
+def generate_replacement_task(*, run_seed, replenish_counter, replaced_prestige, cfg=None):
+    """Generate a replacement task with the same prestige as the accepted task."""
     if cfg is None:
         cfg = WorldConfig()
     streams = RngStreams(run_seed)
     rng = streams.stream(f"replenish_{replenish_counter}")
-    prestige = _sample_required_prestige(rng, cfg)
-    requirements = _sample_requirements(rng, cfg, prestige=prestige)
-    return _make_task(rng, cfg, prestige, serial=replenish_counter, requirements=requirements)
+    requirements = _sample_requirements(rng, cfg, prestige=replaced_prestige)
+    return _make_task(rng, cfg, replaced_prestige, serial=replenish_counter, requirements=requirements)
 
 
 __all__ = [
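One consequence of passing `replaced_prestige` through is that the market's mix of prestige levels no longer drifts as tasks are accepted. The toy model below illustrates that effect only. It does not use the repository's `RngStreams` or `_sample_required_prestige`; the uniform `randint(1, 10)` resampler and the function name are stand-ins chosen for the sketch.

```python
# Toy model: how the market's prestige mix evolves when the lowest-prestige
# task is accepted repeatedly. The uniform resampler is a stand-in.
import random
from collections import Counter


def accept_and_replenish(market: list[int], index: int,
                         rng: random.Random, inherit: bool) -> list[int]:
    """Replace the accepted task, inheriting its prestige or resampling it."""
    replacement = market[index] if inherit else rng.randint(1, 10)
    return market[:index] + [replacement] + market[index + 1:]


if __name__ == "__main__":
    start = [1, 1, 1, 3, 4, 5, 6, 8]
    for inherit in (True, False):
        rng = random.Random(7)
        market = list(start)
        for _ in range(20):
            lowest = market.index(min(market))  # agent takes the easiest task
            market = accept_and_replenish(market, lowest, rng, inherit)
        print("inherit" if inherit else "resample", sorted(Counter(market).items()))
```

With inheritance the multiset of prestige levels is unchanged after any number of accepts. With resampling, an agent that keeps taking the easiest task gradually replaces low-prestige tasks with harder ones.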