Comprehensive documentation covering all major subsystems: simulation engine, data models, task system, prestige, finances, employees, agent layer, CLI interface, configuration, and runner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CLI Interface
Location: src/yc_bench/cli/
Overview
The CLI is the agent's sole interface to the simulation. Every command returns structured JSON, enabling reliable parsing by LLMs.
Design Choices
JSON-Only Output
All CLI commands return JSON, never free text:
```
$ yc-bench company status
{
  "company_name": "Nexus AI",
  "funds": "$150,000.00",
  "funds_cents": 15000000,
  "monthly_payroll": "$30,000.00",
  "runway_months": 5.0,
  "prestige": {
    "research": 3.5,
    "inference": 2.1,
    "data_environment": 1.0,
    "training": 4.2
  }
}
```
Why JSON?
- Unambiguous parsing by LLMs (vs. formatted tables)
- Consistent structure across all commands
- Easy to pipe into `python_repl` for analysis
- Machine-readable without regex or text parsing
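Because output is JSON, an agent working in `python_repl` can load it with the standard library instead of scraping text. A minimal sketch that parses the sample `company status` payload above, hard-coded as a string here (a real agent would capture the command's stdout instead):

```python
import json

# Sample output from the `yc-bench company status` example, hard-coded for illustration.
raw = """
{
  "company_name": "Nexus AI",
  "funds": "$150,000.00",
  "funds_cents": 15000000,
  "monthly_payroll": "$30,000.00",
  "runway_months": 5.0,
  "prestige": {"research": 3.5, "inference": 2.1,
               "data_environment": 1.0, "training": 4.2}
}
"""

status = json.loads(raw)

# Use the integer cents field for arithmetic; the "$..." strings are display text.
funds_cents = status["funds_cents"]

# Rank prestige domains from strongest to weakest.
ranked = sorted(status["prestige"].items(), key=lambda kv: kv[1], reverse=True)
strongest, weakest = ranked[0][0], ranked[-1][0]

print(f"{status['company_name']}: ${funds_cents / 100:,.2f}, "
      f"strongest={strongest}, weakest={weakest}")
```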
Command Group Organization
| Group | File | Purpose |
|---|---|---|
| `company` | `company_commands.py` | Company status, prestige overview |
| `employee` | `employee_commands.py` | Employee listing and details |
| `market` | `market_commands.py` | Browse available tasks |
| `task` | `task_commands.py` | Task lifecycle (accept/assign/dispatch/cancel/inspect/list) |
| `sim` | `sim_commands.py` | Simulation control (resume) |
| `finance` | `finance_commands.py` | Ledger queries |
| `report` | `report_commands.py` | Monthly P&L reports |
| `scratchpad` | `scratchpad_commands.py` | Persistent agent memory |
Design choice: Command groups mirror real business functions (operations, HR, finance, strategy). This makes the interface intuitive for LLM agents that have been trained on business concepts.
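The group/command split maps naturally onto nested sub-parsers. The sketch below uses `argparse` purely for illustration (the actual parser library is not confirmed here), covers only three of the eight groups, and `build_parser` is a hypothetical name:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical two-level parser: `yc-bench <group> <command> [args]`."""
    parser = argparse.ArgumentParser(prog="yc-bench")
    groups = parser.add_subparsers(dest="group", required=True)

    # company group: a single read-only snapshot command.
    company = groups.add_parser("company").add_subparsers(dest="command", required=True)
    company.add_parser("status")

    # task group: most commands take one positional task id.
    task = groups.add_parser("task").add_subparsers(dest="command", required=True)
    for name in ("accept", "dispatch", "cancel", "inspect"):
        task.add_parser(name).add_argument("task_id")
    assign = task.add_parser("assign")
    assign.add_argument("task_id")
    assign.add_argument("employee_id")
    task.add_parser("list").add_argument("--status")

    # sim group: resume is the only way to advance time.
    sim = groups.add_parser("sim").add_subparsers(dest="command", required=True)
    sim.add_parser("resume")
    return parser


args = build_parser().parse_args(["task", "assign", "t-42", "e-7"])
print(args.group, args.command, args.task_id, args.employee_id)  # task assign t-42 e-7
```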
Command Details
Company Commands
company status
Returns current funds, payroll, runway, and prestige levels per domain.
Design choice: A single command gives the agent a complete financial and strategic snapshot, reducing the number of CLI calls needed per decision cycle.
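The sample output is consistent with runway being funds divided by monthly payroll ($150,000 / $30,000 = 5.0 months). A sketch of that calculation over integer cents (as in `funds_cents`), assuming this formula, which is inferred from the example rather than stated in the source; `runway_months` and `format_usd` are hypothetical helpers:

```python
import math


def runway_months(funds_cents: int, monthly_payroll_cents: int) -> float:
    """Months of payroll the current funds can cover (assumed formula)."""
    if monthly_payroll_cents <= 0:
        return math.inf  # no payroll, so salaries alone never exhaust funds
    return funds_cents / monthly_payroll_cents


def format_usd(cents: int) -> str:
    """Render non-negative integer cents like the sample output, e.g. "$150,000.00"."""
    return f"${cents / 100:,.2f}"


print(runway_months(15_000_000, 3_000_000))  # 5.0
print(format_usd(15_000_000))                # $150,000.00
```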
Employee Commands
employee list
Returns all employees with tier, salary, and current active task count.
Design choice: The listing shows active task counts but NOT skill rates, so the agent must infer each employee's capabilities from how their assigned tasks turn out.
Market Commands
market browse [--domain X] [--min-prestige N] [--max-prestige N] [--offset O] [--limit L]
Browse available market tasks with optional filters.
Design choice: Filtering and pagination prevent information overload. The agent can focus on tasks matching its current prestige level and strategic goals.
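One way to read the filter semantics: apply the optional filters first, then paginate what remains with `--offset` and `--limit`. In this sketch, `MarketTask`, its fields, the default page size of 20, and the assumption that the prestige filters compare against each task's required prestige are all hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarketTask:
    # Hypothetical fields; the real task model is not shown on this page.
    task_id: str
    domain: str
    required_prestige: float


def browse(tasks: list[MarketTask], domain: Optional[str] = None,
           min_prestige: Optional[float] = None, max_prestige: Optional[float] = None,
           offset: int = 0, limit: int = 20) -> list[MarketTask]:
    """Filter first, then paginate, so offset/limit count only matching tasks."""
    matches = [
        t for t in tasks
        if (domain is None or t.domain == domain)
        and (min_prestige is None or t.required_prestige >= min_prestige)
        and (max_prestige is None or t.required_prestige <= max_prestige)
    ]
    return matches[offset:offset + limit]


market = [
    MarketTask("t1", "research", 1.0),
    MarketTask("t2", "training", 2.0),
    MarketTask("t3", "research", 3.0),
    MarketTask("t4", "research", 5.0),
]
# Research tasks within reach of a research prestige of 3.5, one per page.
page1 = browse(market, domain="research", max_prestige=3.5, limit=1)
page2 = browse(market, domain="research", max_prestige=3.5, offset=1, limit=1)
print([t.task_id for t in page1], [t.task_id for t in page2])  # ['t1'] ['t3']
```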
Task Commands
task accept <task_id>
Accept a market task. Validates prestige requirements. Sets deadline.
task assign <task_id> <employee_id>
Assign an employee to a planned/active task. Recalculates ETAs.
task dispatch <task_id>
Start work on a planned task. Changes status to active.
task cancel <task_id>
Cancel a task. Applies prestige penalty. Frees employees.
task inspect <task_id>
Detailed view of a single task: requirements, progress, assignments, deadline.
task list [--status X]
List company tasks with optional status filter.
Design choice: The accept → assign → dispatch flow gives the agent explicit control over each phase. This mirrors real project management where you scope, staff, and then kick off work.
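The lifecycle can be viewed as a small state machine. This sketch assumes that `accept` moves a market task to `planned` (the source only says that `dispatch` starts a planned task and `assign` works on planned or active tasks); the `MARKET` and `CANCELLED` status names, the `TRANSITIONS` table, and `next_status` are hypothetical. Completion is left out because it is reported through `sim resume` rather than by a task command.

```python
from enum import Enum


class Status(Enum):
    MARKET = "market"        # hypothetical: listed on the market, not yet accepted
    PLANNED = "planned"      # accepted and being staffed
    ACTIVE = "active"        # dispatched; work in progress
    CANCELLED = "cancelled"  # hypothetical name for a cancelled task


# (command, current status) -> resulting status; any other pair is rejected.
TRANSITIONS = {
    ("accept", Status.MARKET): Status.PLANNED,
    ("assign", Status.PLANNED): Status.PLANNED,
    ("assign", Status.ACTIVE): Status.ACTIVE,
    ("dispatch", Status.PLANNED): Status.ACTIVE,
    ("cancel", Status.PLANNED): Status.CANCELLED,
    ("cancel", Status.ACTIVE): Status.CANCELLED,
}


def next_status(command: str, status: Status) -> Status:
    """Return the status after `command`, or raise if the command is not allowed."""
    try:
        return TRANSITIONS[(command, status)]
    except KeyError:
        raise ValueError(f"cannot {command} a task that is {status.value}") from None


status = Status.MARKET
for command in ("accept", "assign", "dispatch"):
    status = next_status(command, status)
print(status)  # Status.ACTIVE
```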
Simulation Commands
sim resume
Advance simulation to the next event. Returns wake events.
```json
{
  "advanced_to": "2025-02-15T09:00:00",
  "wake_events": [
    {"type": "task_completed", "task_id": "...", "success": true},
    {"type": "payroll", "amount": -3000000}
  ]
}
```
Design choice: Resume is the only way to advance time. The agent explicitly chooses when to move forward, creating natural decision checkpoints.
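Resuming to "the next event" suggests a time-ordered event queue. A minimal sketch using `heapq`, assuming events that share the earliest timestamp are delivered together as one batch of wake events; `EventQueue` and its methods are hypothetical, and the real simulation engine may batch differently:

```python
import heapq
from datetime import datetime


class EventQueue:
    """Hypothetical time-ordered queue; resume() jumps to the earliest pending event."""

    def __init__(self, now: datetime):
        self.now = now
        self._heap: list[tuple[datetime, int, dict]] = []
        self._seq = 0  # tie-breaker so event dicts are never compared

    def schedule(self, at: datetime, event: dict) -> None:
        heapq.heappush(self._heap, (at, self._seq, event))
        self._seq += 1

    def resume(self) -> dict:
        if not self._heap:
            return {"advanced_to": self.now.isoformat(), "wake_events": []}
        at = self._heap[0][0]
        wake_events = []
        # Deliver every event scheduled for that same instant as one batch.
        while self._heap and self._heap[0][0] == at:
            wake_events.append(heapq.heappop(self._heap)[2])
        self.now = at
        return {"advanced_to": at.isoformat(), "wake_events": wake_events}


q = EventQueue(now=datetime(2025, 2, 1, 9))
q.schedule(datetime(2025, 2, 15, 9), {"type": "task_completed", "task_id": "t-1", "success": True})
q.schedule(datetime(2025, 2, 15, 9), {"type": "payroll", "amount": -3_000_000})
q.schedule(datetime(2025, 3, 1, 9), {"type": "payroll", "amount": -3_000_000})
first = q.resume()
print(first["advanced_to"], [e["type"] for e in first["wake_events"]])
```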
Finance Commands
finance ledger [--category X] [--from DATE] [--to DATE] [--offset O] [--limit L]
Query the immutable transaction history.
Design choice: Full ledger access lets sophisticated agents analyze spending patterns and project future cash flow.
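A sketch of the kind of filtering `finance ledger` exposes, assuming an inclusive date range and amounts stored as signed integer cents (as in the `payroll` wake event above). `LedgerEntry`, `query`, and the category names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen mirrors the ledger's immutability
class LedgerEntry:
    # Hypothetical shape of one ledger row.
    on: date
    category: str
    amount_cents: int  # negative = outflow


def query(entries: list[LedgerEntry], category: Optional[str] = None,
          start: Optional[date] = None, end: Optional[date] = None) -> list[LedgerEntry]:
    """Filter by category and an inclusive [start, end] date range (assumed semantics)."""
    return [
        e for e in entries
        if (category is None or e.category == category)
        and (start is None or e.on >= start)
        and (end is None or e.on <= end)
    ]


ledger = [
    LedgerEntry(date(2025, 1, 31), "payroll", -3_000_000),
    LedgerEntry(date(2025, 2, 10), "task_reward", 1_200_000),
    LedgerEntry(date(2025, 2, 28), "payroll", -3_000_000),
]
february_payroll = query(ledger, category="payroll", start=date(2025, 2, 1))
print(sum(e.amount_cents for e in february_payroll))  # -3000000
```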
Report Commands
report monthly
Aggregated P&L by month.
Design choice: Monthly reports provide a higher-level financial view than raw ledger entries, useful for strategic planning.
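A monthly P&L is essentially a group-by over ledger entries. This sketch buckets signed cent amounts by calendar month into revenue, costs, and net; the input rows, bucket keys, and output shape are assumptions, not the actual `report monthly` format:

```python
from collections import defaultdict
from datetime import date

# Hypothetical (date, amount_cents) rows; positive = revenue, negative = cost.
rows = [
    (date(2025, 1, 15), 2_000_000),
    (date(2025, 1, 31), -3_000_000),
    (date(2025, 2, 10), 1_200_000),
    (date(2025, 2, 28), -3_000_000),
]


def monthly_pnl(rows: list[tuple[date, int]]) -> dict[str, dict[str, int]]:
    """Group amounts by "YYYY-MM" into revenue, costs (as a positive number), and net."""
    months: dict[str, dict[str, int]] = defaultdict(lambda: {"revenue": 0, "costs": 0, "net": 0})
    for on, cents in rows:
        bucket = months[f"{on.year}-{on.month:02d}"]
        if cents >= 0:
            bucket["revenue"] += cents
        else:
            bucket["costs"] += -cents
        bucket["net"] += cents
    return dict(sorted(months.items()))  # chronological order


report = monthly_pnl(rows)
for month, pnl in report.items():
    print(month, pnl)
```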
Scratchpad Commands
scratchpad read
Read persistent notes.
scratchpad write <content>
Overwrite scratchpad contents.
scratchpad append <content>
Add to existing scratchpad.
scratchpad clear
Clear scratchpad.
Design choice: The scratchpad is critical for long simulations where LLM context gets truncated. The agent can store:
- Employee capability observations
- Strategic plans
- Financial projections
- Task priority lists
This compensates for context window limitations and tests whether the agent proactively maintains external memory.
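The four scratchpad commands amount to read, overwrite, append, and reset on a single persistent text value. A sketch backed by SQLite, assuming the scratchpad lives in the same database as the simulation state and that `append` joins entries with a newline; the `scratchpad` table and `Scratchpad` class are hypothetical, and committing is left to the caller:

```python
import sqlite3


class Scratchpad:
    """Hypothetical single-row scratchpad that persists between CLI calls."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS scratchpad "
            "(id INTEGER PRIMARY KEY CHECK (id = 1), content TEXT NOT NULL)"
        )
        conn.execute("INSERT OR IGNORE INTO scratchpad (id, content) VALUES (1, '')")

    def read(self) -> str:
        return self.conn.execute("SELECT content FROM scratchpad WHERE id = 1").fetchone()[0]

    def write(self, content: str) -> None:
        # Overwrite: previous notes are discarded.
        self.conn.execute("UPDATE scratchpad SET content = ? WHERE id = 1", (content,))

    def append(self, content: str) -> None:
        # Newline separator is an assumption of this sketch.
        current = self.read()
        self.write(current + "\n" + content if current else content)

    def clear(self) -> None:
        self.write("")


pad = Scratchpad(sqlite3.connect(":memory:"))
pad.write("Plan: focus on training tasks")
pad.append("e-7 seems fast on research")
print(pad.read())
```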
Error Handling
All commands return structured errors:
```json
{
  "error": "Insufficient prestige in research (have 2.3, need 4.0)"
}
```
Design choice: Descriptive error messages help the agent understand what went wrong and adjust its strategy, rather than failing silently or with cryptic messages.
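The prestige error shown above could be produced by a check like the following. The function name, the use of `0.0` for a domain with no recorded prestige, and the strict `<` comparison are assumptions; only the message format comes from the example:

```python
import json
from typing import Optional


def prestige_error(prestige: dict[str, float], domain: str, required: float) -> Optional[dict]:
    """Return a structured error payload if prestige is too low, else None (hypothetical)."""
    have = prestige.get(domain, 0.0)  # assumed default for a domain with no prestige yet
    if have < required:
        return {"error": f"Insufficient prestige in {domain} (have {have:.1f}, need {required:.1f})"}
    return None


err = prestige_error({"research": 2.3}, "research", 4.0)
print(json.dumps(err))
# {"error": "Insufficient prestige in research (have 2.3, need 4.0)"}
```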
CLI Entry Point (__main__.py)
The CLI uses a command-line parser (likely Click or argparse) to route commands to handler functions. Each handler:
- Opens a database session
- Validates inputs
- Performs the operation
- Returns JSON output
- Commits or rolls back the transaction
Design choice: Each CLI call is a self-contained transaction. This prevents partial state updates and ensures the simulation remains consistent.