yc-bench/system_design/05_financial_model.md
2026-03-19 18:39:57 -07:00

# Financial Model
**Location**: `src/yc_bench/db/models/ledger.py`, `src/yc_bench/cli/finance_commands.py`, `src/yc_bench/cli/report_commands.py`, `src/yc_bench/core/handlers/`
## Overview
The financial model simulates a startup's cash flow: revenue from completed tasks, costs from employee payroll, and penalties for failures. Running out of money triggers bankruptcy and ends the simulation.
## Design Choices
### Cents-Based Integer Arithmetic
All financial values are stored as `BigInteger` in cents:
```
$1,000.00 = 100_000 cents
```
**Why cents?** Floating-point arithmetic introduces rounding errors that compound over hundreds of transactions. Integer cents guarantee exact financial accounting -- critical for a deterministic benchmark.
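The drift is easy to see in a small sketch. The helper `format_cents` below is hypothetical (not taken from the codebase) and only illustrates converting to dollars at display time:

```python
# Summing ten 10-cent amounts: the float total drifts, the integer total does not.
float_total = sum(0.10 for _ in range(10))  # dollars stored as float
cents_total = sum(10 for _ in range(10))    # cents stored as int

print(float_total == 1.0)  # False
print(cents_total == 100)  # True


def format_cents(cents: int) -> str:
    """Hypothetical display helper: render integer cents as a dollar string."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}${dollars:,}.{remainder:02d}"


print(format_cents(100_000))  # $1,000.00
```

Keeping every stored value as an integer and formatting only at the edge means no intermediate result is ever rounded.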
### Immutable Append-Only Ledger
Every financial transaction creates a `LedgerEntry` that is never modified or deleted:
```python
class LedgerEntry:
    category: MONTHLY_PAYROLL | TASK_REWARD | TASK_FAIL_PENALTY | TASK_CANCEL_PENALTY
    amount_cents: int      # negative for costs, positive for revenue
    occurred_at: datetime
    ref_type: str          # optional reference to source entity
    ref_id: UUID           # optional reference ID
```
**Why immutable?** An append-only ledger provides:
- Complete audit trail for debugging
- Ability to reconstruct balance at any point in time
- No risk of silent data corruption
- Natural fit for the `finance ledger` and `report monthly` CLI commands
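A minimal in-memory sketch of these properties follows. It is not the project's SQLAlchemy model: the names `Entry`, `Ledger`, and `balance_at` are assumptions made for illustration, and a frozen dataclass stands in for "never modified".

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime
from enum import Enum
from typing import Optional
from uuid import UUID


class Category(Enum):
    MONTHLY_PAYROLL = "monthly_payroll"
    TASK_REWARD = "task_reward"
    TASK_FAIL_PENALTY = "task_fail_penalty"
    TASK_CANCEL_PENALTY = "task_cancel_penalty"


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Entry:
    category: Category
    amount_cents: int  # negative for costs, positive for revenue
    occurred_at: datetime
    ref_type: Optional[str] = None
    ref_id: Optional[UUID] = None


class Ledger:
    """Append-only: exposes append and read operations, no update or delete."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def append(self, entry: Entry) -> None:
        self._entries.append(entry)

    def entries(self) -> tuple[Entry, ...]:
        return tuple(self._entries)  # read-only snapshot

    def balance_at(self, when: datetime, opening_cents: int = 0) -> int:
        """Reconstruct the balance by replaying entries up to `when`."""
        return opening_cents + sum(
            e.amount_cents for e in self._entries if e.occurred_at <= when
        )


ledger = Ledger()
ledger.append(Entry(Category.TASK_REWARD, 5_000_000, datetime(2025, 1, 10)))
ledger.append(Entry(Category.MONTHLY_PAYROLL, -3_000_000, datetime(2025, 2, 3)))
print(ledger.balance_at(datetime(2025, 1, 31)))  # 5000000
print(ledger.balance_at(datetime(2025, 2, 28)))  # 2000000
```

Because the balance is derived by replaying entries, any historical balance can be recomputed without storing snapshots.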
## Revenue Sources
### Task Rewards
On successful (on-time) completion:
```
reward = base_reward × (1 + prestige_scale × (avg_prestige - 1))
```
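Where `avg_prestige` is the mean prestige across the task's required domains. At `avg_prestige = 1` the reward equals `base_reward`; each additional point of average prestige adds `prestige_scale × base_reward`.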
**Design choice**: Prestige-scaled rewards create a positive feedback loop that mirrors real business dynamics -- reputation leads to better opportunities.
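As a worked example of the formula, the sketch below computes a reward in integer cents. The function name, the example values, and the final `round()` to whole cents are assumptions; the source does not say how fractional cents are handled.

```python
def task_reward_cents(
    base_reward_cents: int,
    prestige_scale: float,
    domain_prestige: list[float],
) -> int:
    """reward = base_reward * (1 + prestige_scale * (avg_prestige - 1))."""
    avg_prestige = sum(domain_prestige) / len(domain_prestige)
    multiplier = 1 + prestige_scale * (avg_prestige - 1)
    # Assumption: round to the nearest cent so the ledger stays integer-valued.
    return round(base_reward_cents * multiplier)


# $10,000 base reward, two required domains at prestige 1.0 and 3.0 (avg 2.0).
print(task_reward_cents(1_000_000, 0.5, [1.0, 3.0]))  # 1500000
# At prestige 1 in every domain the multiplier is exactly 1.
print(task_reward_cents(1_000_000, 0.5, [1.0, 1.0]))  # 1000000
```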
### Revenue Timing
Rewards are credited immediately upon task completion (when the `task_completed` event fires with `success=True`).
## Cost Sources
### Monthly Payroll
Payroll is deducted on the **first business day** of each month:
```
total_payroll = sum(employee.salary_cents for employee in employees)
```
**Design choice**: Monthly payroll creates predictable but unavoidable costs. The agent must maintain positive cash flow to cover it.
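The payroll date and amount can be sketched as follows. This assumes "business day" means Monday through Friday with no holiday calendar, which the source does not specify; both function names are hypothetical.

```python
from datetime import date, timedelta


def first_business_day(year: int, month: int) -> date:
    """First weekday (Mon-Fri) of the month. Assumption: holidays are ignored."""
    day = date(year, month, 1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def total_payroll_cents(salaries_cents: list[int]) -> int:
    """Sum of all employees' monthly salaries, in cents."""
    return sum(salaries_cents)


print(first_business_day(2025, 1))  # 2025-01-01 (a Wednesday)
print(first_business_day(2025, 2))  # 2025-02-03 (Feb 1 falls on a Saturday)
print(total_payroll_cents([300_000, 700_000, 1_250_000]))  # 2250000
```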
### Salary Bumps
Each completed task raises the salary of every employee assigned to it by a fixed per-tier amount (linear, not compounding):
```
tier_midpoints = {junior: (min+max)/2, mid: (min+max)/2, senior: (min+max)/2}
for each assigned employee:
    bump = tier_midpoints[employee.tier] * salary_bump_pct
    salary_cents += bump  # fixed amount per task
```
**Design choice**: Linear salary bumps create steady payroll growth without exponential compounding. A junior gets ~$30/task bump, mid ~$70, senior ~$125 (at 1% of tier midpoint). This avoids runaway payroll in the late game while still creating pressure.
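A runnable sketch of the linear bump is shown below. The tier midpoints are back-derived from the approximate figures above ($30, $70, and $125 at 1%) and are assumptions, not the configured salary ranges; rounding to whole cents is also an assumption.

```python
SALARY_BUMP_PCT = 0.01  # 1% of the tier midpoint, per the text above

# Assumed midpoints in cents, inferred from the ~$30 / ~$70 / ~$125 bumps.
TIER_MIDPOINT_CENTS = {"junior": 300_000, "mid": 700_000, "senior": 1_250_000}


def bump_cents(tier: str, salary_bump_pct: float = SALARY_BUMP_PCT) -> int:
    """Fixed per-task raise: depends on the tier only, not the current salary."""
    return round(TIER_MIDPOINT_CENTS[tier] * salary_bump_pct)


def salary_after_tasks(start_cents: int, tier: str, tasks_completed: int) -> int:
    """Linear growth: every completed task adds the same amount."""
    return start_cents + tasks_completed * bump_cents(tier)


print(bump_cents("junior"), bump_cents("mid"), bump_cents("senior"))  # 3000 7000 12500
print(salary_after_tasks(300_000, "junior", 100))  # 600000
```

Under these assumed numbers, 100 tasks double a junior salary that starts at the midpoint, whereas a compounding 1% raise per task would multiply it by roughly 2.7.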
### Failure Penalties
A task completed late forfeits its reward but incurs no additional direct financial penalty. The cost is indirect: the prestige loss from the failure lowers the reward multiplier on future tasks.
### Cancel Penalties
Cancellation may incur a financial penalty depending on configuration (some presets charge a fraction of the reward).
## Payroll-Event Tie-Breaking
When payroll and events fall on the same timestamp:
```
Payroll is processed BEFORE events
```
**Design choice**: This ordering is critical. If a task completes on the same day as payroll:
1. Payroll deducts first (may push funds negative)
2. Task completion reward credits (may save from bankruptcy)
3. Bankruptcy check happens after both

This gives the agent the benefit of the doubt: a task completing on payday can save the company.
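The ordering above can be sketched with a sort key that places payroll ahead of events at equal timestamps, with the bankruptcy check deferred until everything at a payroll timestamp has been applied. The tuple layout and the `simulate` function are assumptions for illustration, not the simulator's actual event loop.

```python
from datetime import datetime
from itertools import groupby

PAYROLL, EVENT = 0, 1  # lower value sorts first when timestamps are equal


def simulate(
    funds_cents: int, items: list[tuple[datetime, int, int]]
) -> tuple[int, bool]:
    """items are (timestamp, kind, amount_cents). Returns (funds, bankrupt)."""
    ordered = sorted(items, key=lambda item: (item[0], item[1]))
    for _, group in groupby(ordered, key=lambda item: item[0]):
        payroll_ran = False
        for _, kind, amount_cents in group:
            funds_cents += amount_cents
            payroll_ran = payroll_ran or kind == PAYROLL
        # Check only on payroll timestamps, after same-timestamp events applied.
        if payroll_ran and funds_cents < 0:
            return funds_cents, True
    return funds_cents, False


payday = datetime(2025, 2, 3, 9, 0)
same_day = [(payday, EVENT, 2_500_000), (payday, PAYROLL, -3_000_000)]
print(simulate(1_000_000, same_day))  # (500000, False)

next_day = [(datetime(2025, 2, 4, 9, 0), EVENT, 2_500_000), (payday, PAYROLL, -3_000_000)]
print(simulate(1_000_000, next_day))  # (-2000000, True)
```

In the first case funds dip to -$20,000 after payroll, but the reward at the same timestamp restores them before the check runs.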
## Bankruptcy
Bankruptcy triggers when `funds_cents < 0` after payroll processing, once any events sharing the payroll timestamp have also been applied:
```python
if company.funds_cents < 0:
    insert_bankruptcy_event(session, company_id, sim_time)
```
**Design choice**: Bankruptcy is checked only after payroll (not after penalties). This simplifies the model and makes payroll the primary survival constraint.
### Bankruptcy as Terminal State
Once bankruptcy fires, the simulation ends. There is no recovery mechanic.
**Why no bailout?** The benchmark tests whether the agent can sustainably manage a business. Allowing recovery would dilute this signal.
## Financial Reports
### Ledger Query (`finance ledger`)
The agent can query the full transaction history with filters:
- Category filter
- Date range filter
- Pagination
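The three filters could combine roughly as in the sketch below. The parameter names (`category`, `start`, `end`, `offset`, `limit`) and the half-open date range are assumptions; the actual `finance ledger` flags are not shown in this document.

```python
from datetime import datetime
from typing import Optional


def query_ledger(
    entries: list[dict],
    category: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    offset: int = 0,
    limit: int = 50,
) -> list[dict]:
    """Filter by category and [start, end) date range, then paginate."""
    matched = [
        e
        for e in entries
        if (category is None or e["category"] == category)
        and (start is None or e["occurred_at"] >= start)
        and (end is None or e["occurred_at"] < end)
    ]
    matched.sort(key=lambda e: e["occurred_at"])  # stable chronological pages
    return matched[offset : offset + limit]


rows = [
    {"category": "TASK_REWARD", "amount_cents": 5_000_000, "occurred_at": datetime(2025, 1, 15)},
    {"category": "MONTHLY_PAYROLL", "amount_cents": -3_000_000, "occurred_at": datetime(2025, 1, 2)},
    {"category": "MONTHLY_PAYROLL", "amount_cents": -3_030_000, "occurred_at": datetime(2025, 2, 3)},
]
payroll_in_feb = query_ledger(
    rows, category="MONTHLY_PAYROLL", start=datetime(2025, 2, 1), end=datetime(2025, 3, 1)
)
print([r["amount_cents"] for r in payroll_in_feb])  # [-3030000]
```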
### Monthly P&L (`report monthly`)
Aggregates transactions by month:
```
Month     Revenue   Payroll   Penalties   Net
2025-01   $50,000   $30,000   $0          $20,000
2025-02   $35,000   $30,300   $5,000      -$300
```
**Design choice**: Structured financial reporting gives the agent the data it needs to make informed decisions about task selection and resource allocation.
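A sketch of the aggregation, reproducing the sample rows above, might look like this. The mapping of ledger categories to report columns and the function name `monthly_pnl` are assumptions; only the category names come from the ledger definition.

```python
from collections import defaultdict
from datetime import datetime


def monthly_pnl(entries: list[tuple[datetime, str, int]]) -> dict[str, dict[str, int]]:
    """Group (occurred_at, category, amount_cents) entries by YYYY-MM."""
    rows: dict[str, dict[str, int]] = defaultdict(
        lambda: {"revenue": 0, "payroll": 0, "penalties": 0}
    )
    for occurred_at, category, amount_cents in entries:
        row = rows[occurred_at.strftime("%Y-%m")]
        if category == "TASK_REWARD":
            row["revenue"] += amount_cents
        elif category == "MONTHLY_PAYROLL":
            row["payroll"] += -amount_cents  # costs are stored as negatives
        else:  # TASK_FAIL_PENALTY or TASK_CANCEL_PENALTY
            row["penalties"] += -amount_cents
    return {
        month: {**row, "net": row["revenue"] - row["payroll"] - row["penalties"]}
        for month, row in sorted(rows.items())
    }


report = monthly_pnl([
    (datetime(2025, 1, 2), "MONTHLY_PAYROLL", -3_000_000),
    (datetime(2025, 1, 15), "TASK_REWARD", 5_000_000),
    (datetime(2025, 2, 3), "MONTHLY_PAYROLL", -3_030_000),
    (datetime(2025, 2, 10), "TASK_REWARD", 3_500_000),
    (datetime(2025, 2, 20), "TASK_CANCEL_PENALTY", -500_000),
])
print(report["2025-01"]["net"])  # 2000000
print(report["2025-02"]["net"])  # -30000
```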
## Runway Calculation
The `company status` command includes a runway estimate:
```
runway_months = funds_cents / monthly_payroll_cents
```
This helps the agent gauge urgency. Low runway signals that the agent needs profitable tasks quickly.
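As a small sketch of the estimate: the function below applies the formula directly. Returning infinity when payroll is zero is an assumption, since the source does not say how that case is reported.

```python
import math


def runway_months(funds_cents: int, monthly_payroll_cents: int) -> float:
    """Months of payroll the current funds cover, ignoring future revenue."""
    if monthly_payroll_cents <= 0:
        # Assumption: with no payroll there is no burn, so runway is unbounded.
        return math.inf
    return funds_cents / monthly_payroll_cents


print(runway_months(9_000_000, 3_000_000))  # 3.0
print(runway_months(1_500_000, 3_000_000))  # 0.5
```

The estimate is static: it does not account for incoming rewards or for salary bumps raising payroll over time, so it is best read as an upper bound on time without revenue.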
## Difficulty Scaling
Financial pressure scales with difficulty preset:
| Preset | Initial Funds | Payroll Pressure | Penalties |
|--------|--------------|-----------------|-----------|
| tutorial | Very high | Low | Minimal |
| easy | High | Moderate | Low |
| medium | Moderate | Moderate | Standard |
| hard | Low | High | 1.5x |
| nightmare | Very low | Very high | 2x |