CLI Interface
Location: src/yc_bench/cli/
Overview
The CLI is the agent's sole interface to the simulation. Every command returns structured JSON, enabling reliable parsing by LLMs.
Design Choices
JSON-Only Output
All CLI commands return JSON, never free-text:
```
$ yc-bench company status
{
  "company_name": "Nexus AI",
  "funds": "$150,000.00",
  "funds_cents": 15000000,
  "monthly_payroll": "$30,000.00",
  "runway_months": 5.0,
  "prestige": {
    "research": 3.5,
    "inference": 2.1,
    "data_environment": 1.0,
    "training": 4.2
  }
}
```
Why JSON?
- Unambiguous parsing by LLMs (vs. formatted tables)
- Consistent structure across all commands
- Easy to pipe into python_repl for analysis
- Machine-readable without regex or text parsing
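As a minimal sketch of what this enables, the status payload shown above can be loaded with Python's standard json module and queried directly. The field names come from the example output; the variable names (raw, status, weakest_domain) are illustrative only.

```python
import json

# Sample payload copied from the `company status` example above.
raw = """
{
  "company_name": "Nexus AI",
  "funds": "$150,000.00",
  "funds_cents": 15000000,
  "monthly_payroll": "$30,000.00",
  "runway_months": 5.0,
  "prestige": {
    "research": 3.5,
    "inference": 2.1,
    "data_environment": 1.0,
    "training": 4.2
  }
}
"""

status = json.loads(raw)

# The integer field avoids parsing the display string "$150,000.00".
funds_dollars = status["funds_cents"] / 100

# Pick the domain with the lowest prestige value.
weakest_domain = min(status["prestige"], key=status["prestige"].get)

print(funds_dollars, weakest_domain, status["runway_months"])
```

No regex or table parsing is involved: every value is reachable by key.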
Command Group Organization
| Group | File | Purpose |
|---|---|---|
| company | company_commands.py | Company status, prestige overview |
| employee | employee_commands.py | Employee listing and details |
| market | market_commands.py | Browse available tasks |
| task | task_commands.py | Task lifecycle (accept/assign/dispatch/cancel/inspect/list) |
| sim | sim_commands.py | Simulation control (resume) |
| finance | finance_commands.py | Ledger queries |
| report | report_commands.py | Monthly P&L reports |
| scratchpad | scratchpad_commands.py | Persistent agent memory |
Design choice: Command groups mirror real business functions (operations, HR, finance, strategy). This makes the interface intuitive for LLM agents that have been trained on business concepts.
Command Details
Company Commands
company status
Returns current funds, payroll, runway, and prestige levels per domain.
Design choice: A single command gives the agent a complete financial and strategic snapshot, reducing the number of CLI calls needed per decision cycle.
Employee Commands
employee list
Returns all employees with tier, salary, and current active task count.
Design choice: Shows active task count but NOT skill rates. The agent must infer capabilities.
Market Commands
market browse [--domain X] [--reward-min-cents N] [--offset O] [--limit L]
Browse available market tasks with optional filters. Results are capped at market_browse_default_limit (default 50) per page.
The browse auto-filters by prestige and trust: only tasks the company can actually accept are shown. This means:
- Per-domain prestige check: all required domains must meet the task's required_prestige
- Trust check: company must have sufficient trust with the task's client
Design choice: Auto-filtering prevents the agent from wasting turns trying to accept inaccessible tasks. Pagination (--offset) allows browsing beyond the first page.
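The filtering and pagination described here can be sketched as follows. This is an illustration under stated assumptions, not the actual yc-bench implementation: the MarketTask fields (in particular required_trust), the dictionaries used for prestige and trust, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarketTask:
    # Hypothetical shape; only required_prestige is named in the text above.
    task_id: str
    client: str
    domains: tuple
    required_prestige: float
    required_trust: float = 0.0


def is_acceptable(task, prestige, trust):
    """True if every required domain meets required_prestige and client trust suffices."""
    prestige_ok = all(
        prestige.get(domain, 0.0) >= task.required_prestige for domain in task.domains
    )
    trust_ok = trust.get(task.client, 0.0) >= task.required_trust
    return prestige_ok and trust_ok


def browse(tasks, prestige, trust, offset=0, limit=50):
    """Filter first, then paginate, so every page contains only acceptable tasks."""
    visible = [t for t in tasks if is_acceptable(t, prestige, trust)]
    return visible[offset:offset + limit]


prestige = {"research": 3.5, "inference": 2.1, "data_environment": 1.0, "training": 4.2}
trust = {"acme": 1.0}
tasks = [
    MarketTask("t1", "acme", ("research",), 2.0),
    MarketTask("t2", "acme", ("research", "inference"), 3.0),  # inference too low
    MarketTask("t3", "globex", ("training",), 1.0, 2.0),       # no trust with globex
    MarketTask("t4", "acme", ("training",), 4.0, 1.0),
]

first_page = [t.task_id for t in browse(tasks, prestige, trust)]
second_item = [t.task_id for t in browse(tasks, prestige, trust, offset=1, limit=1)]
print(first_page, second_item)
```

Filtering before slicing means --offset and --limit count only tasks the company can accept, so no page is padded with inaccessible entries.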
Task Commands
task accept <task_id>
Accept a market task. Validates prestige requirements. Sets deadline.
task assign <task_id> <employee_id>
Assign an employee to a planned/active task. Recalculates ETAs.
task dispatch <task_id>
Start work on a planned task. Changes status to active.
task cancel <task_id>
Cancel a task. Applies prestige penalty. Frees employees.
task inspect <task_id>
Detailed view of a single task: requirements, progress, assignments, deadline.
task list [--status X]
List company tasks with optional status filter.
Design choice: The accept → assign → dispatch flow gives the agent explicit control over each phase. This mirrors real project management where you scope, staff, and then kick off work.
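One way to picture the lifecycle is as a small state machine. The sketch below is an assumption-laden illustration: the text only names the planned and active statuses, so "market" and "cancelled" are hypothetical labels, as is the rule that cancel is allowed from both planned and active.

```python
# (current status, command) -> next status.
# "market" and "cancelled" are hypothetical status names used for illustration.
TRANSITIONS = {
    ("market", "accept"): "planned",
    ("planned", "assign"): "planned",   # staffing does not start work
    ("active", "assign"): "active",     # employees can join a running task
    ("planned", "dispatch"): "active",
    ("planned", "cancel"): "cancelled",
    ("active", "cancel"): "cancelled",
}


def next_status(status, command):
    """Return the status after applying a command, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(status, command)]
    except KeyError:
        raise ValueError(f"cannot {command} a task in status {status!r}") from None


status = "market"
for command in ("accept", "assign", "dispatch"):
    status = next_status(status, command)

print(status)
```

Under these assumptions, skipping a phase (for example dispatching a task that was never accepted) is rejected rather than silently ignored.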
Simulation Commands
sim resume
Advance simulation to the next event. Returns wake events.
```json
{
  "advanced_to": "2025-02-15T09:00:00",
  "wake_events": [
    {"type": "task_completed", "task_id": "...", "success": true},
    {"type": "payroll", "amount": -3000000}
  ]
}
```
Design choice: Resume is the only way to advance time. The agent explicitly chooses when to move forward, creating natural decision checkpoints.
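A minimal sketch of how an agent-side script might process the example response above. It assumes that amount is expressed in cents (consistent with the funds_cents field, but not stated explicitly) and that events without an amount have no cash effect.

```python
import json
from datetime import datetime

# Sample response copied from the `sim resume` example above.
raw = """
{
  "advanced_to": "2025-02-15T09:00:00",
  "wake_events": [
    {"type": "task_completed", "task_id": "...", "success": true},
    {"type": "payroll", "amount": -3000000}
  ]
}
"""

result = json.loads(raw)

# The timestamp is ISO 8601, so the standard library parses it directly.
now = datetime.fromisoformat(result["advanced_to"])

# Group events by type so each kind can be handled separately.
events_by_type = {}
for event in result["wake_events"]:
    events_by_type.setdefault(event["type"], []).append(event)

# Assumption: "amount" is in cents; events without it contribute nothing.
cash_delta_cents = sum(e.get("amount", 0) for e in result["wake_events"])

print(now, sorted(events_by_type), cash_delta_cents)
```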
Finance Commands
finance ledger [--category X] [--from DATE] [--to DATE] [--offset O] [--limit L]
Query the immutable transaction history.
Design choice: Full ledger access lets sophisticated agents analyze spending patterns and project future cash flow.
Report Commands
report monthly
Aggregated P&L by month.
Design choice: Monthly reports provide a higher-level financial view than raw ledger entries, useful for strategic planning.
Scratchpad Commands
scratchpad read
Read persistent notes.
scratchpad write <content>
Overwrite scratchpad contents.
scratchpad append <content>
Add to existing scratchpad.
scratchpad clear
Clear scratchpad.
Design choice: The scratchpad is critical for long simulations where LLM context gets truncated. The agent can store:
- Employee capability observations
- Strategic plans
- Financial projections
- Task priority lists
This compensates for context window limitations and tests whether the agent proactively maintains external memory.
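The difference between the four commands can be modeled with a small in-memory stand-in. This is a hypothetical model, not the real storage layer; in particular, the newline separator used by append is an assumption.

```python
class Scratchpad:
    """In-memory stand-in for the four scratchpad commands (hypothetical model)."""

    def __init__(self):
        self.content = ""

    def read(self):
        return self.content

    def write(self, content):
        # Overwrites whatever was stored before.
        self.content = content

    def append(self, content):
        # Assumption: appended notes are separated by a newline.
        self.content = f"{self.content}\n{content}" if self.content else content

    def clear(self):
        self.content = ""


pad = Scratchpad()
pad.write("Plan: build research prestige first")
pad.append("Employee 3 finishes training tasks quickly")
notes = pad.read()

pad.write("Plan: revised")
after_overwrite = pad.read()
```

Here write discards the earlier observation, while append preserves it, which is why an agent keeping a running log would favor append.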
Error Handling
All commands return structured errors:
```json
{
  "error": "Insufficient prestige in research (have 2.3, need 4.0)"
}
```
Design choice: Descriptive error messages help the agent understand what went wrong and adjust its strategy, rather than failing silently or with cryptic messages.
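Because errors arrive as JSON with an error key, a caller can separate failures from results with a single check. The wrapper below is a hypothetical helper (CommandError and parse_response are not part of yc-bench) and assumes that successful payloads never carry a top-level error key.

```python
import json


class CommandError(Exception):
    """Raised when a CLI response carries an "error" key (hypothetical wrapper)."""


def parse_response(raw):
    """Decode a CLI response, raising CommandError for structured errors."""
    payload = json.loads(raw)
    if isinstance(payload, dict) and "error" in payload:
        raise CommandError(payload["error"])
    return payload


try:
    parse_response('{"error": "Insufficient prestige in research (have 2.3, need 4.0)"}')
    message = None
except CommandError as exc:
    message = str(exc)

print(message)
```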
CLI Entry Point (__main__.py)
The CLI uses a command-line parser (likely Click or argparse) to route commands to handler functions. Each handler:
- Opens a database session
- Validates inputs
- Performs the operation
- Returns JSON output
- Commits or rolls back the transaction
Design choice: Each CLI call is a self-contained transaction. This prevents partial state updates and ensures the simulation remains consistent.
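The per-call transaction pattern can be sketched with the standard library. Everything here is a stand-in chosen for illustration: sqlite3 replaces whatever database layer the project actually uses, and the company table, run_command, and spend are hypothetical names.

```python
import json
import sqlite3


def run_command(conn, operation):
    """Run one operation as a single transaction and return JSON text."""
    try:
        result = operation(conn)
        conn.commit()
        return json.dumps(result)
    except ValueError as exc:
        # Undo any partial writes, then report a structured error.
        conn.rollback()
        return json.dumps({"error": str(exc)})


def spend(amount_cents):
    """Build a hypothetical operation that deducts funds and validates the result."""
    def operation(conn):
        conn.execute("UPDATE company SET funds_cents = funds_cents - ?", (amount_cents,))
        (funds,) = conn.execute("SELECT funds_cents FROM company").fetchone()
        if funds < 0:
            raise ValueError("Insufficient funds")
        return {"funds_cents": funds}
    return operation


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (funds_cents INTEGER)")
conn.execute("INSERT INTO company VALUES (15000000)")
conn.commit()

ok = run_command(conn, spend(3000000))
failed = run_command(conn, spend(99000000))
(balance,) = conn.execute("SELECT funds_cents FROM company").fetchone()
print(ok, failed, balance)
```

The failed call's UPDATE is rolled back, so the balance reflects only the successful command.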