yc-bench/system_design/08_cli_interface.md
2026-03-19 18:39:57 -07:00

177 lines
5.7 KiB
Markdown

# CLI Interface
**Location**: `src/yc_bench/cli/`
## Overview
The CLI is the agent's sole interface to the simulation. Every command returns structured JSON, enabling reliable parsing by LLMs.
## Design Choices
### JSON-Only Output
All CLI commands return JSON, never free-text:
```bash
$ yc-bench company status
{
"company_name": "Nexus AI",
"funds": "$150,000.00",
"funds_cents": 15000000,
"monthly_payroll": "$30,000.00",
"runway_months": 5.0,
"prestige": {
"research": 3.5,
"inference": 2.1,
"data_environment": 1.0,
"training": 4.2
}
}
```
**Why JSON?**
- Unambiguous parsing by LLMs (vs. formatted tables)
- Consistent structure across all commands
- Easy to pipe into `python_repl` for analysis
- Machine-readable without regex or text parsing
### Command Group Organization
| Group | File | Purpose |
|-------|------|---------|
| `company` | `company_commands.py` | Company status, prestige overview |
| `employee` | `employee_commands.py` | Employee listing and details |
| `market` | `market_commands.py` | Browse available tasks |
| `task` | `task_commands.py` | Task lifecycle (accept/assign/dispatch/cancel/inspect/list) |
| `sim` | `sim_commands.py` | Simulation control (resume) |
| `finance` | `finance_commands.py` | Ledger queries |
| `report` | `report_commands.py` | Monthly P&L reports |
| `scratchpad` | `scratchpad_commands.py` | Persistent agent memory |
**Design choice**: Command groups mirror real business functions (operations, HR, finance, strategy). This makes the interface intuitive for LLM agents that have been trained on business concepts.
## Command Details
### Company Commands
#### `company status`
Returns current funds, payroll, runway, and prestige levels per domain.
**Design choice**: Single command gives the agent a complete financial and strategic snapshot. Reduces the number of API calls needed per decision cycle.
### Employee Commands
#### `employee list`
Returns all employees with tier, salary, and current active task count.
**Design choice**: Shows active task count but NOT skill rates. The agent must infer capabilities.
### Market Commands
#### `market browse [--domain X] [--reward-min-cents N] [--offset O] [--limit L]`
Browse available market tasks with optional filters. Results are capped at `market_browse_default_limit` (default 50) per page.
The browse **auto-filters** by prestige and trust: only tasks the company can actually accept are shown. This means:
- Per-domain prestige check: all required domains must meet the task's `required_prestige`
- Trust check: company must have sufficient trust with the task's client
**Design choice**: Auto-filtering prevents the agent from wasting turns trying to accept inaccessible tasks. Pagination (`--offset`) allows browsing beyond the first page.
### Task Commands
#### `task accept <task_id>`
Accept a market task. Validates prestige requirements. Sets deadline.
#### `task assign <task_id> <employee_id>`
Assign an employee to a planned/active task. Recalculates ETAs.
#### `task dispatch <task_id>`
Start work on a planned task. Changes status to active.
#### `task cancel <task_id>`
Cancel a task. Applies prestige penalty. Frees employees.
#### `task inspect <task_id>`
Detailed view of a single task: requirements, progress, assignments, deadline.
#### `task list [--status X]`
List company tasks with optional status filter.
**Design choice**: The accept → assign → dispatch flow gives the agent explicit control over each phase. This mirrors real project management where you scope, staff, and then kick off work.
### Simulation Commands
#### `sim resume`
Advance simulation to the next event. Returns wake events.
```json
{
"advanced_to": "2025-02-15T09:00:00",
"wake_events": [
{"type": "task_completed", "task_id": "...", "success": true},
{"type": "payroll", "amount": -3000000}
]
}
```
**Design choice**: Resume is the only way to advance time. The agent explicitly chooses when to move forward, creating natural decision checkpoints.
### Finance Commands
#### `finance ledger [--category X] [--from DATE] [--to DATE] [--offset O] [--limit L]`
Query the immutable transaction history.
**Design choice**: Full ledger access lets sophisticated agents analyze spending patterns and project future cash flow.
### Report Commands
#### `report monthly`
Aggregated P&L by month.
**Design choice**: Monthly reports provide a higher-level financial view than raw ledger entries, useful for strategic planning.
### Scratchpad Commands
#### `scratchpad read`
Read persistent notes.
#### `scratchpad write <content>`
Overwrite scratchpad contents.
#### `scratchpad append <content>`
Add to existing scratchpad.
#### `scratchpad clear`
Clear scratchpad.
**Design choice**: The scratchpad is critical for long simulations where LLM context gets truncated. The agent can store:
- Employee capability observations
- Strategic plans
- Financial projections
- Task priority lists
This compensates for context window limitations and tests whether the agent proactively maintains external memory.
## Error Handling
All commands return structured errors:
```json
{
"error": "Insufficient prestige in research (have 2.3, need 4.0)"
}
```
**Design choice**: Descriptive error messages help the agent understand what went wrong and adjust its strategy, rather than failing silently or with cryptic messages.
## CLI Entry Point (`__main__.py`)
The CLI uses a command-line parser (likely Click or argparse) to route commands to handler functions. Each handler:
1. Opens a database session
2. Validates inputs
3. Performs the operation
4. Returns JSON output
5. Commits or rolls back the transaction
**Design choice**: Each CLI call is a self-contained transaction. This prevents partial state updates and ensures the simulation remains consistent.