yc_bench_result_1_openrouter_google_gemini-2.5-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_google_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_google_gemini-flash-1.5.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_liquid_lfm-2.5-1.2b-thinking:free.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_minimax_minimax-m2.5.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_moonshotai_kimi-k2.5.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_nvidia_nemotron-3-nano-30b-a3b:free.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_openai_gpt-4o-mini.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_openai_gpt-5.2-pro.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_x-ai_grok-4.1-fast.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_1_openrouter_z-ai_glm-5.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_fast_test_1_openai_gpt-4.1-mini.json
Calibrated task difficulty based on deadlines
2026-03-06 11:18:22 -08:00
yc_bench_result_hard_1_anthropic_claude-sonnet-4-6.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_hard_1_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_hard_1_openai_gpt-5.2.json
Added the configs and updated the results.
2026-02-26 13:37:58 -08:00
yc_bench_result_hard_1_openai_gpt-5.4.json
Updated initial eval on new backend
2026-03-06 18:49:32 -08:00
yc_bench_result_hard_1_openrouter_anthropic_claude-haiku-4-5.json
Add multi-episode setting with scratchpad carryover between bankruptcies
2026-03-11 19:22:32 -07:00
yc_bench_result_hard_2_anthropic_claude-sonnet-4-6.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_hard_2_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_hard_2_openai_gpt-5.2.json
Added the configs and updated the results.
2026-02-26 13:37:58 -08:00
yc_bench_result_hard_3_anthropic_claude-sonnet-4-6.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_hard_3_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_medium_1_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_medium_1_openai_gpt-5.2.json
Added the configs and updated the results.
2026-02-26 13:37:58 -08:00
yc_bench_result_medium_1_openai_gpt-5.4.json
Updated initial eval on new backend
2026-03-06 18:49:32 -08:00
yc_bench_result_medium_2_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_medium_2_openai_gpt-5.2.json
Added the configs and updated the results.
2026-02-26 13:37:58 -08:00
yc_bench_result_medium_3_anthropic_claude-sonnet-4-6.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_medium_3_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_medium_3_openai_gpt-5.2.json
Added the configs and updated the results.
2026-02-26 13:37:58 -08:00
yc_bench_result_nightmare_1_anthropic_claude-sonnet-4-6.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_nightmare_1_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_nightmare_1_openai_gpt-5.2.json
Added the configs and updated the results.
2026-02-26 13:37:58 -08:00
yc_bench_result_nightmare_2_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_nightmare_2_openai_gpt-5.2.json
Added the configs and updated the results.
2026-02-26 13:37:58 -08:00
yc_bench_result_nightmare_3_anthropic_claude-sonnet-4-6.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_nightmare_3_gemini_gemini-3-flash-preview.json
Fix horizon bug, multi-provider support, add Sonnet vs Gemini benchmark results
2026-02-26 00:31:00 -08:00
yc_bench_result_nightmare_3_openai_gpt-5.2.json
Added the configs and updated the results.
2026-02-26 13:37:58 -08:00