Iteration speed beats model capability for prototype work.
The standard advice on this is to slide between Haiku and Opus like a volume knob: fast when you do not care, smart when you do. That framing assumes the two axes pay off the same way. They do not. Per-turn wall clock compounds across the 5 to 8 turns it takes to converge on a working app. Per-turn capability hits its ceiling fast once the bugs left are small. So the wall-clock cost dominates, and the right move is to pick the dumb fast model and build a verification loop around it.
Direct answer, verified 2026-05-20
Pick iteration speed.
For prototype work, the fast less-capable model wins on wall-clock time to a working app, because turns compound and failed turns are cheap to revert. mk0r encodes this opinion in source: FREE_MODEL = "haiku" at src/app/api/chat/model/route.ts line 5. The repo is public at github.com/m13v/appmaker.
Where the slider framing comes from
The common advice on this topic reads like an audio mixer: slide the capability fader up when correctness matters, slide it down when latency matters. The implicit model is that capability and speed sit on the same dial and you pick a position. That is fine when you are deciding which model to call for one isolated request. It is wrong for prototype work, because prototype work is not one request. It is a session of turns, and the cost function is a product, not a sum.
The other common framing is "use the smartest model you can afford, because tokens are cheap." The hidden assumption is that the model converges in one or two turns. For a tip calculator that is true. For anything with state, routing, or more than three screens, it is not. Even Opus 4.7 typically takes three or four turns on a real app; the first draft is never the last.
Once you accept that the session is 4 to 8 turns long, the question changes. It is not "how good is each turn?" It is "how fast does the loop close?" And that is a compound expression.
The math: wall clock per converged app
Two columns below. Each row is a measurement, an observation, or a derivation; numbers come from streaming-token timings on this codebase's warm sandboxes and from the model picker source. Read the "Total wall clock to a working app" row first: the cheaper, less capable model wins the only metric a maker actually cares about.
| Feature | Capability-first (Opus 4.7) | Speed-first (Haiku 4.5) |
|---|---|---|
| Per-turn wall clock (warm sandbox) | ~30 s (Claude Opus 4.7, plan-then-write) | ~4 s first token, ~12 s total (Claude Haiku 4.5) |
| First-attempt success rate on a 50-line UI | ~90% (Opus) | ~75% (Haiku, observed on this codebase) |
| Expected turns to convergence on a 6-screen app | 3 to 4 (Opus) | 5 to 7 (Haiku) |
| Total wall clock to a working app | 90 to 120 s of model time | 60 to 84 s of model time |
| Cost of a failed turn | 30 s of waiting + 1 git revert | 12 s of waiting + 1 git revert |
| Cost of an unverified turn | high (more code per turn, more to read) | low (smaller diffs, smaller blast radius per revert) |
The success rates are charitable to Opus. The convergence count is charitable to Haiku. Either way, the wall clock per converged app favors the speed pick by 25 to 40 percent. The absolute gap widens on apps that need more than 6 turns to converge, because the latency cost is linear in turn count.
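The arithmetic behind the total wall clock row can be sketched in a few lines of TypeScript. The type and function names are illustrative and do not come from the mk0r source; only the per-turn timings and turn counts are taken from the table.

```typescript
// Illustrative names; only the numbers come from the table above.
interface LoopProfile {
  secondsPerTurn: number; // per-turn wall clock on a warm sandbox
  minTurns: number;       // optimistic turns to convergence
  maxTurns: number;       // pessimistic turns to convergence
}

const opus: LoopProfile = { secondsPerTurn: 30, minTurns: 3, maxTurns: 4 };
const haiku: LoopProfile = { secondsPerTurn: 12, minTurns: 5, maxTurns: 7 };

// Model time per converged app is turn count times per-turn latency.
function wallClockRange(p: LoopProfile): [number, number] {
  return [p.minTurns * p.secondsPerTurn, p.maxTurns * p.secondsPerTurn];
}

// Fraction of model time saved by the fast loop relative to the slow one.
function saving(slowSeconds: number, fastSeconds: number): number {
  return 1 - fastSeconds / slowSeconds;
}

const [opusBest, opusWorst] = wallClockRange(opus);    // 90 s and 120 s
const [haikuBest, haikuWorst] = wallClockRange(haiku); // 60 s and 84 s

console.log(saving(opusBest, haikuBest));   // best case against best case
console.log(saving(opusWorst, haikuWorst)); // worst case against worst case
```

Pairing best case with best case and worst case with worst case gives savings of roughly 33 and 30 percent, which sits inside the 25 to 40 percent band quoted above.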
Anchor fact
The picker labels Haiku "Scary" and Opus "Smart". Those are the editorial names for the two axes.
The constant lives in src/components/header.tsx lines 768 to 773:
const MODEL_LABELS: Record<string, string> = {
default: "Fast",
haiku: "Scary",
"sonnet[1m]": "Fast+",
"opus[1m]": "Smart",
};

Notice the asymmetry. "Smart" is the only label that names a quality of thought. The other three label a quality of latency. That naming is the product team's answer to the axis question: speed scared us more than capability did, so we made speed the default and capability the upgrade.
The entitlement gate lives in a separate file, at src/app/api/chat/model/route.ts line 5: const FREE_MODEL = "haiku". Anonymous sessions are pinned to Haiku and the picker only unlocks Sonnet or Opus for entitled accounts.
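As a rough illustration of what such a gate does, here is a minimal sketch. Only the FREE_MODEL constant and the model ids appear in the source; the resolveModel helper, its signature, and the entitled flag are hypothetical and are not the actual route handler.

```typescript
// Only FREE_MODEL and the model ids are from the source.
const FREE_MODEL = "haiku";

type ModelId = "default" | "haiku" | "sonnet[1m]" | "opus[1m]";

// Hypothetical helper: decides which model a session may actually use.
function resolveModel(requested: ModelId, entitled: boolean): ModelId {
  // Sessions without an entitlement are pinned to the free model,
  // whatever the picker asked for.
  if (!entitled) return FREE_MODEL;
  // Entitled accounts get the model they picked.
  return requested;
}
```

Under these assumptions, an anonymous request for "opus[1m]" resolves to "haiku", while the same request from an entitled account passes through unchanged.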
The verification loop is what makes Haiku safe
If you took Haiku without the verification loop, the speed-first argument would be a half-truth: yes, the turns are fast, but you would be reading every diff and clicking through every screen yourself. The wall-clock would migrate from the model into your hands. The argument only holds because the agent observes its own output before saying done. The trace below is one turn end-to-end.
One Haiku turn, with verification
The verification leg is the Playwright MCP server registered in src/core/e2b.ts around line 183 with --cdp-endpoint http://127.0.0.1:9222, a Chromium that the agent drives over CDP on the same VM as the Vite dev server. The commit leg is commitTurn in the same file, which makes each turn one revertable SHA. The two together turn a Haiku miss into a 2-step recovery instead of a session reset.
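The write, observe, fix, commit shape of a verified turn can be sketched as below. This is an assumption-laden outline, not the code in src/core/e2b.ts: every name here (TurnDeps, Observation, runVerifiedTurn, the maxFixes parameter) is hypothetical, and the browser and git steps are injected as plain functions so the control flow stands on its own.

```typescript
// Hypothetical shapes; none of these names come from the mk0r source.
interface Observation {
  consoleErrors: string[]; // messages read from the browser console
  rendered: boolean;       // whether the expected UI showed up in the DOM
}

interface TurnDeps {
  writeCode: (instruction: string) => void; // the model edits files
  observe: () => Observation;               // drive the browser, snapshot, read console
  commit: (message: string) => string;      // one commit per turn, returns its SHA
}

// Runs one turn: write, observe, and commit only if the browser agrees.
// On a miss, the observation becomes the next instruction (the cheap fix step).
function runVerifiedTurn(prompt: string, deps: TurnDeps, maxFixes = 1): string | null {
  let instruction = prompt;
  for (let attempt = 0; attempt <= maxFixes; attempt++) {
    deps.writeCode(instruction);
    const seen = deps.observe();
    if (seen.rendered && seen.consoleErrors.length === 0) {
      return deps.commit(prompt);
    }
    const reason = seen.consoleErrors.join("; ") || "expected UI did not render";
    instruction = `Fix: ${reason}`;
  }
  // Still failing after the allowed fixes; the caller can fall back
  // to the previous turn's SHA.
  return null;
}
```

The design point is that the commit happens only after the observation passes, so a miss costs one extra short turn rather than a polluted history.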
The compounding intuition, in one number
Imagine a session that has just converged on a working app. With Opus at roughly 30 seconds per turn, you spent about 2 minutes of model time across 4 turns. With Haiku at roughly 12 seconds per turn, you spent about 84 seconds across 7 turns. Both produce a working app. The Haiku session shipped nearly twice as many turns in about 70 percent of the wall clock. That extra granularity is the part the slider framing misses entirely.
Each Haiku turn is a smaller diff. Each smaller diff is easier to read, easier to revert, easier to reason about. The agent is not committing 200 lines per turn; it is committing 25 to 60. When something goes wrong, you revert the offending SHA and lose one tiny step instead of a careful 200-line edit. That is the compounding nobody draws on the slider.
If you have spent any time vibe coding with a slow model, you already know the failure mode this avoids: the long careful turn that ships a regression you do not notice until two turns later. By then you have rebuilt context on top of broken state. Smaller turns make that class of bug structurally cheaper.
The honest counterargument
The argument fails in three places, and they are worth naming.
First, when a single wrong choice on turn 1 traps turns 2 through N. Picking the wrong state shape, routing nesting, or data model is the canonical example. If the agent commits a foundation that the next ten turns build on top of, the cost of redoing the foundation is high; you wanted Opus to think first. The picker exposes Opus for entitled accounts for exactly this category.
Second, when the verification surface is not in a DOM. A long numerical simulation that drifts, a security property that needs a threat model, a race condition under load, an accessibility tree edge case the agent does not think to query: all of these are places where the cheap rerun against the truth is not available and capability has to do all the work.
Third, when the user cannot nudge the agent on the next turn. The iteration-speed argument quietly assumes a maker who can read the rendered screen well enough to write the next sentence. If you cannot (because you are not yet familiar with the domain, or because the problem is one you cannot eyeball), the prompt is doing none of the work and the model has to do all of it. Pick the smarter model.
The recommendation
Stop framing this as "which model is best." The question is "which axis matters for the work in front of me, today." For prototype work in a browser, with a verification loop the agent controls, speed wins. Pick the fastest model that streams a working draft in seconds, let it miss, and let the verification turn catch what it missed. You will converge faster and you will own each turn as a small revertable diff instead of a careful monolith.
For the four-screen demo, the weekend tool, the throwaway that might not be a throwaway, that is mk0r's default. Open it, type a sentence, and watch the fast loop close.
No account, no plan picker, no model picker until you ask. Type a sentence and watch Haiku stream a working draft.
Want to time both loops side by side?
Book 20 minutes. We will open mk0r in one tab on Haiku and the same prompt in another tab on Opus, and run the wall clock against the same brief. You will see the compounding live, including the moments where the speed model misses and the verification turn recovers.
Frequently asked questions
What is the actual answer: pick iteration speed or model capability?
Iteration speed, for prototype work. The reason is compounding. A 6-screen app does not converge in one turn from any frontier model in 2026; it takes several turns (3 to 4 on Opus, 5 to 7 on Haiku in the timings above). The wall-clock cost of those turns outweighs the first-attempt success rate of any one of them. mk0r encodes this opinion in source: FREE_MODEL = 'haiku' at src/app/api/chat/model/route.ts line 5, the initial set_model body in src/core/e2b.ts line 1639 forces modelId 'haiku' on every new session, and the picker labels Haiku as 'Scary' and Opus as 'Smart' (src/components/header.tsx lines 768 to 773).
Why does iteration speed compound when capability does not?
Capability per turn is a probability of getting one turn right. Iteration speed is the time you pay whether or not the turn is right. Over a 5 to 8 turn session, the time cost is linear in turn count, while the capability advantage shrinks toward the ceiling, because the bugs left after a high-capability model's first pass are usually small enough that any model can fix them on the next turn. The wall-clock advantage of the fast model does not shrink. So the time saved per app keeps accumulating while the capability gap closes.
Does this mean Opus is useless?
No. Opus is the right pick when the agent has to plan before it writes, which is a category mk0r flags explicitly in the picker label 'Smart' (src/components/header.tsx line 772). The class of work where Opus wins: multi-file refactors, decisions that touch routing and state and data together, anything where a wrong choice on turn 1 traps turns 2 through N. For first-draft prototype work, the planning premium is overpriced; the loop is short, the blast radius per turn is small, and the agent can recover with cheap turns. The picker exposes Opus for entitled accounts; the default is Haiku because the default workflow is prototype work.
What is the verification loop and why does it change the math?
The verification loop is Playwright MCP wired to a Chromium running on the same E2B VM as the Vite dev server. It is registered with --cdp-endpoint http://127.0.0.1:9222 in src/core/e2b.ts around line 183. The agent runs the code it wrote in a real browser, snapshots the DOM, reads console messages, and only then commits the turn. Verification turns a Haiku miss into a 2-step recovery (write, observe, fix) instead of a session-killer. That is how a less capable model holds its own: it gets a cheap rerun against the truth instead of a slow re-read of its own diff.
What does 'Scary' mean in the picker?
It is a wink at the team's expectation that fast inference would feel uncanny. The constant MODEL_LABELS in src/components/header.tsx line 768 maps haiku to 'Scary' because Haiku 4.5 starts streaming HTML within seconds on a warm sandbox (about 4 s to first token in the timings above), which is a different feel from any prior model. 'Smart' is the label on Opus on line 772, naming the axis it actually wins on (planning quality), not the marketing axis (everything). The two labels are an editorial answer to this exact question: which axis the product believes matters for the prototype loop.
Where does the speed-first argument actually fall apart?
Three places. First, when a single wrong choice traps the next ten turns, you wanted Opus to think first. Second, when the verification surface is not in a DOM (long numerical simulation, security property, race condition under load), the fast loop has nothing to verify against and the capability gap reopens. Third, when the user does not know how to nudge the agent on the next turn, capability has to do all the work because the prompt is doing none. If you are coding in a domain where you cannot read the output well enough to write the next sentence, pick the smarter model.
Why does mk0r pin Haiku for anonymous sessions?
Because the anonymous session is the prototype session. The constant FREE_MODEL = 'haiku' at src/app/api/chat/model/route.ts line 5 is the entitlement gate: free and anonymous accounts use Haiku, while authenticated paid accounts can flip the picker to Sonnet 4.6 or Opus 4.7. The choice was not about cost (anonymous sessions pay zero either way); it was about what surface the new user lands on first. The first impression of an AI app builder is the first turn. Make it cheap, make it fast, and let the loop close before the user clicks away.
Is the iteration-speed argument the same as 'use the cheaper model'?
No, and the difference matters. Cheaper is a cost-per-token frame; iteration speed is a wall-clock-per-converged-app frame. A model that is cheap per token but slow to first token loses on the iteration axis. The right pick is the model with the lowest time-to-first-observable-result, which is Haiku 4.5 in mk0r's stack today. Anthropic publishes a per-MTok price for Haiku 4.5 ($1 input / $5 output as of May 2026), but the model's iteration-speed advantage comes from how fast it streams, not from the rate card.
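The two frames can pick different winners, which a small sketch makes concrete. Both candidates and all of their numbers below are made up for illustration and do not describe any real model; the Candidate type and minBy helper are likewise hypothetical.

```typescript
// Hypothetical candidates with made-up numbers, for illustration only.
interface Candidate {
  name: string;
  usdPerMTokOutput: number;     // rate-card price (the "cheaper" frame)
  secondsToFirstResult: number; // time until something observable renders
}

// Returns the item with the smallest key.
function minBy<T>(items: T[], key: (item: T) => number): T {
  if (items.length === 0) throw new Error("no candidates");
  return items.reduce((best, item) => (key(item) < key(best) ? item : best));
}

const candidates: Candidate[] = [
  { name: "cheap-but-slow", usdPerMTokOutput: 2, secondsToFirstResult: 25 },
  { name: "pricier-but-fast", usdPerMTokOutput: 5, secondsToFirstResult: 12 },
];

const byPrice = minBy(candidates, (c) => c.usdPerMTokOutput);
const bySpeed = minBy(candidates, (c) => c.secondsToFirstResult);

console.log(byPrice.name, bySpeed.name);
```

With these placeholder numbers, the price frame selects the slow candidate and the iteration frame selects the fast one, which is the distinction the answer above draws.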