Mirror of https://github.com/collinear-ai/yc-bench.git, synced 2026-04-19 12:58:03 +00:00
update index html
Parent: 6eba7a9854
Commit: 556a35363d
1 changed file with 3 additions and 3 deletions
@@ -318,10 +318,10 @@
 </div>
 
 <!-- Pipeline -->
-<div style="max-width: 820px; margin: 0 auto;">
+<div style="max-width: 960px; margin: 0 auto 56px;">
 <h3 style="font-size: 1.15rem; font-weight: 700; margin-bottom: 8px;">Long-horizon coherence is a pipeline, and models fail at different stages</h3>
-<p style="color: #475569; line-height: 1.7; font-size: 0.95rem;">
-Flash fails from the absence of reflection. Grok fails despite accurate reflection, unable to close the loop between diagnosis and action. Sonnet fails from temporally inconsistent reflection –rules written and immediately abandoned. Only Opus achieves sustained, self-correcting reflection. This suggests long-horizon coherence is not a single capability but a pipeline: <em>perceive → record → retrieve → act consistently</em>, and current models fail at different stages.
+<p style="color: #475569; line-height: 1.7; font-size: 0.95rem; margin-bottom: 24px;">
+Flash fails from the absence of reflection. Grok fails despite accurate reflection, unable to close the loop between diagnosis and action. Sonnet fails from temporally inconsistent reflection –rules written and immediately abandoned. Only Opus achieves sustained, self-correcting reflection.
 </p>
 </div>
 