mirror of
https://github.com/collinear-ai/yc-bench.git
synced 2026-04-19 12:58:03 +00:00
update prompt
parent 3827464380
commit d829b07e60
3 changed files with 3 additions and 10 deletions
1
.gitignore
vendored
@@ -37,3 +37,4 @@ agent.md
 
 # Claude session context — local only
 CLAUDE.md
+old/
@@ -45,14 +45,6 @@ echo "Seeds: $SEEDS"
 echo "Models: ${#ALL_MODELS[@]}"
 echo ""
 
-# Run greedy bot baseline first
-echo "--- Running greedy bot baseline ---"
-for seed in $SEEDS; do
-    echo " greedy_bot | $CONFIG seed=$seed"
-    uv run python scripts/bot_runner.py --bot greedy --config "$CONFIG" --seed "$seed"
-done
-echo ""
-
 # Run all LLM models
 for model in "${ALL_MODELS[@]}"; do
     for seed in $SEEDS; do
@@ -18,7 +18,7 @@ All actions use `yc-bench` CLI commands via `run_command`. All return JSON.
 
 Run multiple tasks concurrently when possible. Accept → assign → dispatch a second task before calling sim resume.
 
-**Use `yc-bench scratchpad write`** to save strategy notes — your conversation history is truncated after 20 turns, but scratchpad persists in the system prompt. Write rules, not events (e.g. "assign Emp_1,Emp_4,Emp_7 for inference tasks" not "Task-42 failed").
+**Use `yc-bench scratchpad write`** to save strategy notes — your conversation history is truncated after 20 turns, but scratchpad persists in the system prompt. Write reusable rules, not one-off observations.
 
 ## Commands
 
@@ -43,7 +43,7 @@ Run multiple tasks concurrently when possible. Accept → assign → dispatch a
 
 ## Key Mechanics
 
-- **Salary bumps**: completed tasks raise salary for every assigned employee. Assigning all 8 to every task compounds payroll until it exceeds revenue — assign 3-4 domain specialists instead.
+- **Salary bumps**: completed tasks raise salary for every assigned employee. More employees assigned = higher payroll growth.
 - **Throughput split**: employees on multiple active tasks split their rate (rate/sqrt(N)). Two tasks run at ~71% each.
 - **Deadlines**: success before deadline = reward + prestige. Failure = prestige penalty, no reward.
 - **Trust**: completing tasks for a client builds trust → less work per task, access to gated tasks. Working for one client erodes trust with others.
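The throughput-split rule quoted in the prompt above (rate/sqrt(N)) can be checked with a few lines of Python. This is only an illustrative sketch of the stated formula: the function name `per_task_rate` and the normalized base rate of 1.0 are assumptions made for the example, not part of the yc-bench codebase.

```python
import math


def per_task_rate(base_rate: float, active_tasks: int) -> float:
    """Rate one employee contributes to each task when split across N active tasks.

    Hypothetical helper: applies the rate/sqrt(N) rule as stated in the prompt.
    """
    if active_tasks < 1:
        raise ValueError("active_tasks must be >= 1")
    return base_rate / math.sqrt(active_tasks)


if __name__ == "__main__":
    # Base rate normalized to 1.0 so the results read as fractions of full speed.
    for n in (1, 2, 3, 4):
        share = per_task_rate(1.0, n)
        print(f"{n} task(s): {share:.0%} per task, {share * n:.0%} combined")
```

With N = 2 the per-task share is 1/sqrt(2), about 71%, which matches the figure in the prompt. Under this formula the combined output across both tasks is higher than for a single task, which is consistent with the prompt's advice to run tasks concurrently.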