Mirror of https://github.com/collinear-ai/yc-bench.git (synced 2026-04-19 12:58:03 +00:00)

Merge pull request #7 from collinear-ai/feat/employee_tiers

Feat/employee tiers

Commit b1cd7ebfb2: 19 changed files with 7244 additions and 5336 deletions
BIN .DS_Store (vendored): binary file not shown.
README.md: 69 changed lines
@@ -56,8 +56,8 @@ bash scripts/run_benchmark.sh --seed 1 --config hard
 
 ### Core loop
 
-1. Agent calls `yc-bench sim resume` to advance time to the next event.
-2. The engine flushes task progress, fires due events, applies payroll.
+1. Agent calls `yc-bench sim resume` to advance time to the next event or monthly payroll.
+2. The engine flushes task progress, applies prestige decay, fires due events, applies payroll.
 3. Agent reads wake events and decides: accept tasks, assign employees, dispatch, cancel.
 4. Repeat until bankruptcy or horizon end.
 

@@ -65,12 +65,14 @@ The simulation ends on **bankruptcy** (funds < 0 after payroll), **horizon end**
 
 ### Key mechanics
 
-- **Funds**: start at $250K. Monthly payroll is deducted automatically. Task rewards scale with prestige (`base × (1 + 0.55 × (prestige − 1))`).
+- **Funds**: starting capital varies by preset ($80K–$250K). Monthly payroll is deducted automatically. Task rewards scale with prestige (`base × (1 + scale × (prestige − 1))`).
 - **4 domains**: `research · inference · data/environment · training`. Each domain tracks prestige independently in [1.0, 10.0].
-- **Prestige gating**: tasks require a minimum prestige level. Most tasks need prestige 3–5, so the agent must climb from 1.0 by completing easier tasks first. First 10 market tasks are stratified `[1,1,1,1,2,2,2,3,3,4]` to bootstrap progression.
+- **Per-domain prestige gating**: a task's required prestige is checked against **each** of its required domains. The agent must climb prestige broadly, not just in one domain.
+- **Prestige decay**: every domain loses prestige daily. Neglected domains decay back toward 1.0. The agent must stay active across domains to maintain market access.
+- **Prestige-scaled work volume**: higher-prestige tasks require proportionally more work. Higher prestige pays more but demands more capacity.
 - **Employees**: 10 employees across 3 tiers (junior/mid/senior). The agent sees only each employee's tier and salary — not their per-domain skill rates. A junior can secretly be a superstar in one domain, so the agent must infer productivity from task progress observations.
 - **Throughput splitting**: an employee assigned to N active tasks has `effective_rate = base_rate / N`. Focus beats breadth.
-- **Task success**: on-time completion awards funds + prestige + skill boosts + 1% salary bump (compounding payroll pressure). Late completion penalises prestige (1.4×). Cancellation penalises harder (2.0×).
+- **Task success**: on-time completion awards funds + prestige + skill boosts + 1% salary bump (compounding payroll pressure). Late completion penalises prestige. Cancellation penalises harder.
 - **Progress checkpoints**: the agent is woken at 25%, 50%, 75%, and 100% completion — providing data points to estimate employee productivity.
 - **Scratchpad**: persistent notes in the DB that survive context truncation (only last 20 conversation rounds are kept).
 

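The reward and throughput rules in the updated mechanics list are one-line formulas. A minimal Python sketch, assuming `scale = 0.55` (the `reward_prestige_scale` value in the default config) and rounding to whole cents; `reward_cents` and `effective_rate` are illustrative names, not functions from the repository.

```python
def reward_cents(base_cents: int, prestige: float, scale: float = 0.55) -> int:
    """Prestige-scaled reward: base × (1 + scale × (prestige − 1)).

    Rounding to whole cents is an assumption of this sketch.
    """
    return round(base_cents * (1 + scale * (prestige - 1)))


def effective_rate(base_rate: float, n_active_tasks: int) -> float:
    """Throughput splitting: an employee on N active tasks works each at base_rate / N."""
    if n_active_tasks < 1:
        raise ValueError("n_active_tasks must be at least 1")
    return base_rate / n_active_tasks


# A $14,000 base reward (in cents) at prestige 1, 3 and 5.
for p in (1.0, 3.0, 5.0):
    print(p, reward_cents(1_400_000, p))

# One employee at a hypothetical 40 units/day, focused vs. spread over 4 tasks.
print(effective_rate(40.0, 1), effective_rate(40.0, 4))
```

Under these assumptions a $14,000 base reward pays $29,400 at prestige 3, and splitting one employee across four tasks leaves each task a quarter of that employee's rate.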
@@ -92,7 +94,7 @@ yc-bench report monthly # P&L per month
 yc-bench task accept --task-id UUID # pull from market
 yc-bench task assign --task-id UUID --employee-id UUID
 yc-bench task dispatch --task-id UUID # start work
-yc-bench task cancel --task-id UUID --reason "" # cancel (2× prestige penalty)
+yc-bench task cancel --task-id UUID --reason "" # cancel (prestige penalty)
 yc-bench sim resume # advance time
 yc-bench scratchpad write/append/clear # persistent memory
 ```

@@ -103,13 +105,15 @@ yc-bench scratchpad write/append/clear # persistent memory
 
 Experiment presets live in `src/yc_bench/config/presets/` as TOML files. Pass the preset name via `--config`.
 
-| Config | Employees | Tasks | Tests |
-|--------|-----------|-------|-------|
-| **tutorial** | 3 | 50 | Basic accept→assign→dispatch loop |
-| **easy** | 5 | 100 | Throughput awareness |
-| **medium** | 5 | 150 | Prestige climbing + domain specialization |
-| **hard** | 7 | 200 | Precise ETA reasoning |
-| **nightmare** | 8 | 300 | Sustained perfection under compounding payroll |
+All presets use 10 employees and 200 market tasks. Difficulty comes from deadline pressure, penalty severity, prestige distribution, and task size.
+
+| Config | Deadline pressure | Prestige mode | What it tests |
+|--------|------------------|---------------|---------------|
+| **tutorial** | Very relaxed | 1 | Basic accept→assign→dispatch loop |
+| **easy** | Relaxed | 1 | Throughput awareness |
+| **medium** | Moderate | 3 | Prestige climbing + domain specialization |
+| **hard** | Tight | 4 | Precise ETA reasoning + capacity planning |
+| **nightmare** | Razor-thin | 5 | Sustained perfection under compounding payroll |
 
 See `default.toml` for the full list of tunable parameters.
 

@@ -117,44 +121,7 @@ See `default.toml` for the full list of tunable parameters.
 
 ## Benchmark results
 
-### Sonnet 4.6 vs Gemini 3 Flash vs GPT-5.2 — 1-year horizon, 3 seeds per config
+*Results pending — re-running benchmarks with updated economics.*
 
-
-
-#### Survival rates
-
-| Config | Sonnet 4.6 | Gemini 3 Flash | GPT-5.2 |
-|--------|-----------|----------------|---------|
-| **medium** | 3/3 | 3/3 | 3/3 |
-| **hard** | 1/3 | 2/3 | 2/3 |
-| **nightmare** | 1/3 | 3/3 | 2/3 |
-
-#### Final funds (bankrupt = funds < 0)
-
-| Config | Seed | Sonnet 4.6 | Gemini 3 Flash | GPT-5.2 |
-|--------|------|-----------|----------------|---------|
-| medium | 1 | **$9.1M** | **$9.5M** | **$1.8M** |
-| medium | 2 | **$6.1M** | **$11.0M** | **$321K** |
-| medium | 3 | **$107K** | **$15.8M** | **$28K** |
-| hard | 1 | bankrupt | bankrupt | bankrupt |
-| hard | 2 | **$63K** | **$412K** | **$15.7M** |
-| hard | 3 | bankrupt | **$21.9M** | **$43.5M** |
-| nightmare | 1 | bankrupt | **$2.1M** | bankrupt |
-| nightmare | 2 | **$10.1M** | **$214K** | **$2.2M** |
-| nightmare | 3 | bankrupt | **$805K** | **$23.6M** |
-
-**Overall: Gemini 8/9 · GPT-5.2 7/9 · Sonnet 5/9**
-
-#### Key findings
-
-- **Gemini leads on consistency** (8/9 survival). The only model to sweep all 3 nightmare seeds.
-- **GPT-5.2 has the highest ceiling.** Hard seed 3: $43.5M vs Gemini's $21.9M. When it survives, it tends to outperform by a wide margin.
-- **Sonnet is high-variance.** Nightmare seed 2: $10.1M (best nightmare result), but 4/9 bankruptcies overall.
-- **Win rate predicts survival.** Every run with >58% task win rate survived. Every run below 40% went bankrupt.
-
-#### Prestige specialization
-
-
-
 ---
 

BIN plots/hard_1_gpt-5.4_funds.png (normal file, 127 KiB): binary file not shown.
BIN plots/hard_1_gpt-5.4_prestige.png (normal file, 191 KiB): binary file not shown.
File diff suppressed because it is too large
results/yc_bench_result_medium_1_openai_gpt-5.4.json (normal file, 5235 changed lines): file diff suppressed because it is too large.

@@ -346,6 +346,7 @@ def run_bot(config_name: str, seed: int, bot_slug: str, strategy_fn: StrategyFn)
     replacement = generate_replacement_task(
         run_seed=sim_state.run_seed,
         replenish_counter=counter,
+        replaced_prestige=best_task.required_prestige,
         cfg=world_cfg,
     )
     replacement_row = Task(

@@ -26,7 +26,7 @@ DEFAULT_RUNS = [
     {"label": "kimi-k2.5", "model_slug": "openrouter_moonshotai_kimi-k2.5", "color": "#2ecc71"},
 ]
 
-INITIAL_FUNDS_CENTS = 25_000_000 # $250K
+INITIAL_FUNDS_CENTS = 15_000_000 # $150K (default; presets may override)
 
 
 def parse_args():

@@ -129,7 +129,7 @@ def make_plot(run_data, seed, config_name, budget_usd, out_path: Path):
 
     # ── Funds curves ─────────────────────────────────────────────────────────
     ax_funds.axhline(0, color="#e74c3c", linewidth=0.9, linestyle="--", alpha=0.4, zorder=1)
-    ax_funds.axhline(250_000, color="#555577", linewidth=0.7, linestyle=":", alpha=0.6, zorder=1)
+    ax_funds.axhline(INITIAL_FUNDS_CENTS / 100, color="#555577", linewidth=0.7, linestyle=":", alpha=0.6, zorder=1)
 
     for r in run_data:
         if not r["times"]:

@@ -88,6 +88,7 @@ def task_accept(
     replacement = generate_replacement_task(
         run_seed=sim_state.run_seed,
         replenish_counter=counter,
+        replaced_prestige=task.required_prestige,
         cfg=_get_world_cfg(),
     )
 

@@ -53,7 +53,7 @@ company_name = "BenchCo"
 
 [world]
 num_employees = 10
-initial_funds_cents = 25_000_000 # $250,000
+initial_funds_cents = 15_000_000 # $150,000
 initial_prestige_level = 1.0
 work_hours_per_day = 9.0
 

@@ -78,9 +78,9 @@ penalty_cancel_multiplier = 2.0 # hardened: was 1.2
 reward_prestige_scale = 0.55 # hardened: was 0.3
 
 # Daily prestige decay per domain. Domains not exercised lose prestige
-# over time: -0.01/day → -0.3/month. Untouched domain drops ~1 level
-# every ~3 months. Prevents single-domain hyper-specialization.
-prestige_decay_per_day = 0.01
+# over time: -0.005/day → -0.15/month. Untouched domain drops ~1 level
+# every ~6 months. Prevents single-domain hyper-specialization.
+prestige_decay_per_day = 0.005
 
 # Required qty scaling by prestige: qty *= 1 + scale * (prestige - 1).
 # At 0.3: prestige-5 tasks need 2.2x the work of prestige-1 tasks.

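Both rules in these config comments are linear. A sketch assuming decay is subtracted once per simulated day and floored at 1.0 (the README's lower prestige bound), with `prestige_qty_scale = 0.3`; `decay_prestige` and `qty_multiplier` are hypothetical names.

```python
def decay_prestige(prestige: float, days: int, per_day: float = 0.005, floor: float = 1.0) -> float:
    """Linear daily decay for one domain's prestige, floored at the minimum level."""
    return max(floor, prestige - per_day * days)


def qty_multiplier(prestige: float, scale: float = 0.3) -> float:
    """Work-volume scaling from the config comment: qty *= 1 + scale × (prestige − 1)."""
    return 1 + scale * (prestige - 1)


# An untouched domain at prestige 5.0 after 30 days, and a nearly floored one after 100 days.
print(decay_prestige(5.0, 30), decay_prestige(1.2, 100))
# Work multiplier for a prestige-5 task (the config comment quotes 2.2x).
print(qty_multiplier(5.0))
```

At 0.005 per day a domain sheds 0.15 prestige per 30 days and a full level in 200 days; the previous 0.01 made both twice as fast.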
@@ -90,7 +90,7 @@ prestige_qty_scale = 0.3
 # --- Deadline ---
 # Deadline = max(deadline_min_biz_days, max_domain_qty / deadline_qty_per_day).
 # Domains are worked in parallel, so deadline scales with heaviest domain, not sum.
-deadline_qty_per_day = 150.0
+deadline_qty_per_day = 200.0
 deadline_min_biz_days = 7
 
 # --- Progress milestones (checkpoint events at these completion fractions) ---

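The deadline formula takes the heaviest single-domain requirement because domains are worked in parallel. A sketch of that rule with this commit's default `deadline_qty_per_day = 200.0` and `deadline_min_biz_days = 7`; returning fractional days, with no rounding to business days, is an assumption, and the domain keys are illustrative.

```python
def deadline_days(requirements: dict[str, int], qty_per_day: float, min_days: float = 7) -> float:
    """Deadline = max(min_days, heaviest per-domain qty / qty_per_day).

    Domains are worked in parallel, so only the largest requirement matters.
    """
    heaviest = max(requirements.values())
    return max(min_days, heaviest / qty_per_day)


# A two-domain task at the new default mode qty, and a small single-domain task.
print(deadline_days({"research": 2000, "training": 1200}, 200.0))
print(deadline_days({"inference": 800}, 200.0))
```

The second call shows the floor at work: 800 / 200 is only 4 days, so the 7-day minimum applies.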
@@ -120,12 +120,12 @@ high = 10
 mode = 4 # hardened: base default is mode=1
 
 # Base reward paid on task completion, in cents (scaled further by prestige).
-# Higher-prestige tasks automatically pay more via reward_prestige_scale.
+# Mode $14K: prestige-1 tasks burn cash, prestige-3 breaks even, prestige-4+ profits.
 [world.dist.reward_funds_cents]
 type = "triangular"
-low = 500_000 # $5,000
-high = 10_000_000 # $100,000
-mode = 3_000_000 # $30,000
+low = 300_000 # $3,000
+high = 4_000_000 # $40,000
+mode = 1_400_000 # $14,000
 
 # Number of domains each task requires work in (cast to int after sampling).
 # mode=2: most tasks need 2 domains — single-specialist dominance gone.

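After this commit the default base reward is drawn from a triangular distribution with low $3,000, mode $14,000 and high $40,000. Python's `random.triangular(low, high, mode)` samples the same shape; the sketch assumes the engine's sampler behaves equivalently. The mean of a triangular distribution is (low + high + mode) / 3, so the expected base reward falls from $45,000 under the old parameters to $19,000.

```python
import random

# Default [world.dist.reward_funds_cents] after this commit, in cents.
LOW, HIGH, MODE = 300_000, 4_000_000, 1_400_000


def triangular_mean(low: float, high: float, mode: float) -> float:
    """Mean of a triangular distribution."""
    return (low + high + mode) / 3


def sample_base_reward(rng: random.Random) -> int:
    # random.triangular takes (low, high, mode); the int cast mirrors the schema comment.
    return int(rng.triangular(LOW, HIGH, MODE))


rng = random.Random(42)
samples = [sample_base_reward(rng) for _ in range(100_000)]
print(triangular_mean(LOW, HIGH, MODE), sum(samples) / len(samples))
```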
@@ -139,9 +139,9 @@ mode = 2 # hardened: base default is mode=1
 # No trivially-small tasks: every task requires sustained employee-hours.
 [world.dist.required_qty]
 type = "triangular"
-low = 500 # hardened: base default is 200
-high = 3000
-mode = 1400 # hardened: base default is 800
+low = 800 # hardened: base default is 200
+high = 4000
+mode = 2000 # hardened: base default is 800
 
 # Prestige delta awarded per domain on task success.
 # Mean ~0.1: climbing prestige 1→5 takes ~40 tasks.

@@ -28,10 +28,11 @@ horizon_years = 1
 auto_advance_after_turns = 8
 
 [world]
+initial_funds_cents = 20_000_000 # $200,000
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Moderate deadlines: 60 qty/day → ~12 day deadline. Comfortable with 3–4 tasks.
-deadline_qty_per_day = 60.0
+# Moderate deadlines: 100 qty/day → 10-day deadline for mode task.
+deadline_qty_per_day = 100.0
 
 # Original (un-hardened) penalties — costly but not catastrophic.
 penalty_fail_multiplier = 0.8

@@ -55,6 +56,6 @@ value = 1 # Single-domain — the test is about throughput, not assignmen
 
 [world.dist.required_qty]
 type = "triangular"
-low = 300
-high = 1500
-mode = 700 # Moderate size — a few days of focused work each.
+low = 500
+high = 2000
+mode = 1000 # Larger tasks — must stay focused, no excessive parallelism.

@@ -40,12 +40,13 @@ horizon_years = 1
 auto_advance_after_turns = 10
 
 [world]
+initial_funds_cents = 10_000_000 # $100,000 — must reach prestige 3 by month 5
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Tight deadlines: 1200/150 = 8 days.
-# 1 task with 5 per domain → 5.8 days. OK.
-# 2 concurrent tasks → 11.6 days. Miss.
-deadline_qty_per_day = 150.0
+# Tight deadlines: 2000/220 = 9.1 days.
+# 1 task with 5 per domain → 8.7 days. Just fits.
+# 2 concurrent tasks → 17.4 days. Guaranteed miss.
+deadline_qty_per_day = 220.0
 
 # Stiff penalties — mistakes cost real prestige.
 penalty_fail_multiplier = 1.4

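The tight-deadline comments in this hunk amount to an ETA check: a team's throughput is split across concurrent tasks, so two tasks take twice as long as one. A sketch of that check, assuming uniform skill rates and a team rate of 2000 / 8.7 (about 230 units per day) back-derived from the comment's "8.7 days" figure rather than read from the engine; `days_to_finish` and `meets_deadline` are hypothetical names.

```python
def days_to_finish(qty: float, team_qty_per_day: float, concurrent_tasks: int = 1) -> float:
    """ETA when the team's throughput is split evenly across concurrent tasks."""
    return qty * concurrent_tasks / team_qty_per_day


def meets_deadline(qty: float, team_qty_per_day: float, deadline_days: float,
                   concurrent_tasks: int = 1) -> bool:
    return days_to_finish(qty, team_qty_per_day, concurrent_tasks) <= deadline_days


QTY = 2000               # mode required_qty in this hunk
DEADLINE = QTY / 220.0   # deadline_qty_per_day = 220, about 9.1 days
TEAM_RATE = QTY / 8.7    # assumption: back-derived from the "8.7 days" comment

print(meets_deadline(QTY, TEAM_RATE, DEADLINE, concurrent_tasks=1))
print(meets_deadline(QTY, TEAM_RATE, DEADLINE, concurrent_tasks=2))
```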
@@ -71,6 +72,6 @@ mode = 2 # Most tasks need 2 domains.
 
 [world.dist.required_qty]
 type = "triangular"
-low = 500
-high = 2500
-mode = 1200 # Large tasks — require sustained focus.
+low = 1000
+high = 4000
+mode = 2000 # Large tasks — each takes ~9 days with full team. No parallelism.

@@ -38,10 +38,10 @@ auto_advance_after_turns = 8
 [world]
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Deadline uses max per-domain qty. 900/100 = 9 days.
-# 2 concurrent tasks: 5 per task → 4.3 days each. Manageable.
-# 3 concurrent tasks: 3.3 per task → 6.6 days. Risky.
-deadline_qty_per_day = 100.0
+# Deadline uses max per-domain qty. 1500/150 = 10 days.
+# 1 task with 5 per domain → 6.5 days. Comfortable.
+# 2 concurrent tasks → 13 days. Miss.
+deadline_qty_per_day = 150.0
 
 # Real penalties — failing costs prestige, cancelling costs more.
 penalty_fail_multiplier = 1.0

@@ -67,6 +67,6 @@ mode = 2 # Most tasks need 2 domains.
 
 [world.dist.required_qty]
 type = "triangular"
-low = 400
-high = 2000
-mode = 900 # Moderate work — completable in 7–12 days with focus.
+low = 700
+high = 3000
+mode = 1500 # Larger tasks — ~6.5 days with full team, no parallelism.

@@ -49,12 +49,13 @@ horizon_years = 1
 auto_advance_after_turns = 10
 
 [world]
+initial_funds_cents = 8_000_000 # $80,000 — razor-thin runway
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Razor deadlines: 1600/200 = 8 days.
-# 1 task with 5 per domain → 7.7 days. Barely makes it.
-# 2 concurrent tasks → guaranteed miss.
-deadline_qty_per_day = 200.0
+# Razor deadlines: 2500/220 = 11.4 days.
+# 1 task with 5 per domain → 10.9 days. Barely fits.
+# 2 concurrent tasks → 21.8 days. Guaranteed miss.
+deadline_qty_per_day = 220.0
 
 # Catastrophic penalties — there is no good exit from a bad accept.
 penalty_fail_multiplier = 2.0

@@ -81,9 +82,9 @@ mode = 2 # Mostly 2-domain, some 3-domain.
 
 [world.dist.required_qty]
 type = "triangular"
-low = 600
-high = 3000
-mode = 1600 # Large work volumes — no quick wins.
+low = 1200
+high = 5000
+mode = 2500 # Massive work volumes — each task consumes the full team.
 
 # Slightly larger prestige gains than default (~0.13 avg) to make
 # climbing feasible despite the steep penalty. But one blown task

@@ -28,10 +28,11 @@ horizon_years = 1
 auto_advance_after_turns = 5
 
 [world]
+initial_funds_cents = 25_000_000 # $250,000 — very forgiving buffer
 # Inherits num_employees=10, num_market_tasks=200 from default.
 
-# Very generous deadlines: 30 qty/day → most tasks get 13+ day deadline.
-deadline_qty_per_day = 30.0
+# Generous deadlines: 50 qty/day → mode task gets 12-day deadline.
+deadline_qty_per_day = 50.0
 
 # Negligible penalties — mistakes barely hurt.
 penalty_fail_multiplier = 0.3

@@ -53,6 +54,6 @@ value = 1 # ALL tasks single-domain — trivial assignment.
 
 [world.dist.required_qty]
 type = "triangular"
-low = 200
-high = 800
-mode = 400 # Small tasks, quick completions.
+low = 300
+high = 1200
+mode = 600 # Moderate tasks, comfortable with focused execution.

@@ -39,7 +39,7 @@ class WorldDists(BaseModel):
     )
     # Base reward paid on task completion, in cents (result cast to int).
     reward_funds_cents: DistSpec = Field(
-        default_factory=lambda: TriangularDist(low=500_000, high=10_000_000, mode=3_000_000)
+        default_factory=lambda: TriangularDist(low=300_000, high=4_000_000, mode=1_400_000)
     )
     # Number of domains required per task (result cast to int).
     domain_count: DistSpec = Field(

@@ -105,7 +105,7 @@ class SimConfig(BaseModel):
 class WorldConfig(BaseModel):
     # --- Workforce ---
     num_employees: int = 10
-    initial_funds_cents: int = 25_000_000 # $250,000
+    initial_funds_cents: int = 15_000_000 # $150,000
     initial_prestige_level: float = 1.0
     work_hours_per_day: float = 9.0
 

@@ -128,7 +128,7 @@ class WorldConfig(BaseModel):
     # Daily prestige decay per domain. Domains not exercised lose prestige
     # over time: -0.01/day → -0.3/month → untouched domain drops ~1 level
     # every ~3 months. Floored at prestige_min.
-    prestige_decay_per_day: float = 0.01
+    prestige_decay_per_day: float = 0.005
 
     # Required qty scaling by prestige: qty *= 1 + prestige_qty_scale * (prestige - 1).
     # At 0.3: prestige-5 tasks need 2.2× the work of prestige-1 tasks.

@@ -178,6 +178,13 @@ def advance_time(
             result.payrolls_applied += 1
             payroll_idx += 1
 
+            # Report payroll as a wake event so the agent gets control back
+            company = db.query(Company).filter(Company.id == company_id).one()
+            result.wake_events.append({
+                "type": "monthly_payroll",
+                "funds_after": company.funds_cents,
+            })
+
             if bankrupt:
                 # Insert bankruptcy event at this time
                 insert_event(

@@ -188,7 +195,9 @@ def advance_time(
                     dedupe_key=f"bankruptcy:{current_time.isoformat()}",
                 )
                 result.bankrupt = True
-                break
+
+            # Always stop at payroll — gives the agent a chance to act
+            break
 
         elif action_type == "event":
             event_result = dispatch_event(db, next_event, current_time, company_id)

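The `advance_time` change makes every payroll a stopping point: the engine records a `monthly_payroll` wake event and breaks out of the loop even when the company is still solvent. A simplified, self-contained sketch of that control flow; `AdvanceResult`, `advance` and the `(kind, payload)` timeline format are hypothetical stand-ins for the engine's database-backed types, and event handling is reduced to recording a wake event.

```python
from dataclasses import dataclass, field


@dataclass
class AdvanceResult:
    # Hypothetical stand-in for the engine's result object.
    payrolls_applied: int = 0
    bankrupt: bool = False
    wake_events: list = field(default_factory=list)


def advance(timeline, funds_cents: int, payroll_cents: int):
    """Walk (kind, payload) items in time order and stop at the first payroll.

    Returns (funds_cents, result, items_consumed).
    """
    result = AdvanceResult()
    consumed = 0
    for kind, payload in timeline:
        consumed += 1
        if kind == "payroll":
            funds_cents -= payroll_cents
            result.payrolls_applied += 1
            # Report payroll as a wake event so the agent gets control back.
            result.wake_events.append({"type": "monthly_payroll", "funds_after": funds_cents})
            if funds_cents < 0:
                result.bankrupt = True
            # Stop at payroll whether or not the company is bankrupt.
            break
        # Simplification: every other item just becomes a wake event.
        result.wake_events.append({"type": "event", "payload": payload})
    return funds_cents, result, consumed


timeline = [("event", "task_half_done"), ("payroll", None), ("event", "task_done")]
funds, res, used = advance(timeline, funds_cents=100_000, payroll_cents=60_000)
print(funds, res.bankrupt, used, [e["type"] for e in res.wake_events])
```

The third timeline item is left unconsumed: control returns to the agent at payroll, which is the behaviour the README's updated core loop describes.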
@@ -87,7 +87,7 @@ class Task(Base):
 class TaskRequirement(Base):
     __tablename__ = "task_requirements"
     __table_args__ = (
-        CheckConstraint("required_qty >= 200 AND required_qty <= 3000", name="ck_task_requirements_required_qty_range"),
+        CheckConstraint("required_qty >= 200 AND required_qty <= 25000", name="ck_task_requirements_required_qty_range"),
         CheckConstraint("completed_qty >= 0", name="ck_task_requirements_completed_qty_gte_0"),
         CheckConstraint("completed_qty <= required_qty", name="ck_task_requirements_completed_qty_lte_required_qty"),
     )

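The `required_qty` upper bound rises from 3000 to 25000. One plausible reading, assuming qty scaling `1 + 0.3 × (prestige − 1)` is applied before the row is written and prestige tops out at 10.0: the largest `high` in these config hunks (5000) scales to at most 18,500, the smallest `low` (300) still clears the unchanged lower bound of 200, and even the old default `high = 3000` exceeded the old bound at any prestige above 1. `max_scaled_qty` is a hypothetical helper.

```python
QTY_MIN, QTY_MAX = 200, 25_000  # bounds in the updated CheckConstraint


def max_scaled_qty(qty_high: float, prestige_max: float = 10.0, scale: float = 0.3) -> float:
    """Largest required_qty a distribution's `high` can reach after prestige scaling."""
    return qty_high * (1 + scale * (prestige_max - 1))


print(max_scaled_qty(5000))                    # largest `high` in these hunks, at prestige 10
print(max_scaled_qty(3000, prestige_max=2.0))  # old default `high`, at prestige 2
print(max_scaled_qty(5000) <= QTY_MAX)
```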
@@ -27,10 +27,9 @@ class GeneratedTask:
     requirements: dict[str, int]
 
 
-# First 10 market tasks are given explicit prestige values to guarantee a
-# climbable ladder from the start (avoids runs where all early tasks need
-# prestige 4+ before any are completable).
-_STRATIFIED_PRESTIGE = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
+# First 10 market tasks are forced to prestige 1 to guarantee a
+# bootstrapping path regardless of the prestige distribution.
+_STRATIFIED_PRESTIGE = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
 
 _ALL_DOMAINS = list(Domain)
 

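`_STRATIFIED_PRESTIGE` now pins the first ten market tasks to prestige 1 instead of a 1-to-4 ladder. A sketch of how such an override could combine with a sampled prestige for the remaining tasks; it assumes a triangular prestige distribution with low 1, high 10 and mode 4 (the `high`/`mode` values visible as context in the default-config hunk) plus truncation to int, none of which is confirmed for the generator itself. `market_prestiges` is a hypothetical helper.

```python
import random

# First ten market tasks are pinned to prestige 1 (the value introduced by this commit).
STRATIFIED_PRESTIGE = [1] * 10


def market_prestiges(rng: random.Random, count: int,
                     low: float = 1, high: float = 10, mode: float = 4) -> list[int]:
    """Pinned prestige for the first tasks, sampled prestige for the rest.

    The triangular(low=1, high=10, mode=4) distribution and int truncation are assumptions.
    """
    out = []
    for i in range(count):
        if i < len(STRATIFIED_PRESTIGE):
            out.append(STRATIFIED_PRESTIGE[i])
        else:
            out.append(int(rng.triangular(low, high, mode)))
    return out


prestiges = market_prestiges(random.Random(7), 200)
print(prestiges[:12])
```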
@@ -134,14 +133,14 @@ def build_task_rows(*, run_seed, count, cfg=None):
     return task_rows, requirement_rows
 
 
-def generate_replacement_task(*, run_seed, replenish_counter, cfg=None):
+def generate_replacement_task(*, run_seed, replenish_counter, replaced_prestige, cfg=None):
+    """Generate a replacement task with the same prestige as the accepted task."""
     if cfg is None:
         cfg = WorldConfig()
     streams = RngStreams(run_seed)
     rng = streams.stream(f"replenish_{replenish_counter}")
-    prestige = _sample_required_prestige(rng, cfg)
-    requirements = _sample_requirements(rng, cfg, prestige=prestige)
-    return _make_task(rng, cfg, prestige, serial=replenish_counter, requirements=requirements)
+    requirements = _sample_requirements(rng, cfg, prestige=replaced_prestige)
+    return _make_task(rng, cfg, replaced_prestige, serial=replenish_counter, requirements=requirements)
 
 
 __all__ = [

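`generate_replacement_task` now receives `replaced_prestige` and reuses it instead of sampling a fresh prestige, while the per-counter RNG stream keeps replacements deterministic for a given seed. A sketch using `random.Random` seeded with a string as a hypothetical stand-in for `RngStreams(run_seed).stream(...)`; the qty triple (800, 4000, 2000) is the default `[world.dist.required_qty]` after this commit, and per-domain requirements are collapsed into a single number for brevity.

```python
import random


def replacement_rng(run_seed: int, replenish_counter: int) -> random.Random:
    # Hypothetical stand-in for RngStreams(run_seed).stream(f"replenish_{replenish_counter}"):
    # one independent, reproducible stream per replacement.
    return random.Random(f"{run_seed}:replenish_{replenish_counter}")


def generate_replacement(run_seed: int, replenish_counter: int, replaced_prestige: int,
                         qty=(800, 4000, 2000)) -> dict:
    rng = replacement_rng(run_seed, replenish_counter)
    low, high, mode = qty
    return {
        # Inherited from the accepted task instead of being re-sampled.
        "required_prestige": replaced_prestige,
        "required_qty": int(rng.triangular(low, high, mode)),
    }


a = generate_replacement(run_seed=1, replenish_counter=5, replaced_prestige=3)
b = generate_replacement(run_seed=1, replenish_counter=5, replaced_prestige=3)
print(a, a == b)
```

Carrying the prestige over keeps the market's prestige mix stable as tasks are accepted, which matters now that the first ten tasks are all pinned to prestige 1.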