The limits of agent demos, written in the source of a live agent product
An agent demo is edited. A live agent product is not. The gap between the two is not vibes or hand-waving, it is five specific gates that have to exist in production code: a watchdog for the wedge case, a ceiling for the long-turn case, a classified-error UI for the four failure modes the SDK actually throws, a sandbox TTL, and an anonymous-turn cap. Here are mk0r's five, with the file and line that fires each one.
At least five things hide in a typical agent demo that the live product cannot conceal: a 15 second time-to-first-token timeout that fires when the agent does not start streaming, a hard ceiling on a single turn, a classified-error UI for credit, auth, image, and stale-session failures, a 1 hour sandbox lifetime, and an anonymous-turn cap. Each one is a gate somewhere in mk0r's source. The demo author can edit each one out of a clip. The live product has to render it.
Verified against the appmaker source at src/app/api/chat/route.ts (lines 23, 25, 48, 539 to 551, 575 to 657) and src/core/e2b.ts:33.
“If the agent does not emit any notification within 15 seconds, the session is evicted and the literal string sent to the browser is "Agent did not respond within 15s. Please retry." No demo ever shows that string. Every live user eventually does.”
TTFT watchdog at src/app/api/chat/route.ts:539
The frame: a demo is allowed to cut, a live product is not
When somebody posts a clip of an agent doing something, what you are watching is a sequence of frames that survived editing. The takes that did not survive are the ones where the agent hung, errored, hit a quota, or sat there for 40 seconds before the first token. The clip is the agent's good half hour. That is fine, it is what marketing is.
The honest question is: what does the live product do during the cut moments? If the answer is "the same thing the demo showed, just slower," great. If the answer is "the connection hangs forever and the user closes the tab," the demo was selling a product that does not exist yet.
The gates below are the cut moments, made visible. Each one is a place in mk0r where the live product had to write code that no demo needed.
Gate one. The 15 second time-to-first-token watchdog.
When a real user submits a prompt and waits, the most common failure is not a wrong answer, it is no answer at all. The ACP subprocess inside the sandbox accepts the request, then returns nothing. From the browser's perspective the stream is alive but idle. The user is looking at a blinking cursor.
At src/app/api/chat/route.ts:539, a setTimeout fires 15 seconds after the request begins. If no notification has come back by then, the code evicts the session, fires the chat_ttft_timeout telemetry event, and sends the user this: "Agent did not respond within 15s. Please retry."
The cost: a user who would have gotten a response on the 18th second sees the error on the 15th. The win: a user staring at a wedged box is told, and the next request boots a fresh VM. A demo cuts this entire window. The live product has to pick a number, and 15 is what the code currently says.
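The watchdog is easiest to picture as a timer that the first agent notification disarms. Here is a minimal sketch that assumes a Node-style setTimeout. The 15 second value and the message string come from route.ts:539 as quoted above. The TtftWatchdog class and its start, notify, and onTimeout names are hypothetical and are not mk0r's code:

```typescript
// Hypothetical TTFT watchdog sketch. TTFT_TIMEOUT_MS and TTFT_MESSAGE come from
// the article; the class, its methods, and the callback are illustrative only.
const TTFT_TIMEOUT_MS = 15_000;
const TTFT_MESSAGE = "Agent did not respond within 15s. Please retry.";

class TtftWatchdog {
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private sawFirstNotification = false;
  private fired = false;
  private readonly onTimeout: (message: string) => void;
  private readonly timeoutMs: number;

  constructor(onTimeout: (message: string) => void, timeoutMs: number = TTFT_TIMEOUT_MS) {
    this.onTimeout = onTimeout;
    this.timeoutMs = timeoutMs;
  }

  // Arm the timer when the request begins.
  start(): void {
    this.timer = setTimeout(() => {
      this.fired = true;
      // In the route described above, this is where the session would be
      // evicted and the chat_ttft_timeout telemetry event emitted.
      this.onTimeout(TTFT_MESSAGE);
    }, this.timeoutMs);
  }

  // Call on every agent notification; only the first one disarms the timer.
  notify(): void {
    if (this.sawFirstNotification || this.fired) return;
    this.sawFirstNotification = true;
    clearTimeout(this.timer);
  }

  get timedOut(): boolean {
    return this.fired;
  }
}
```

The first notification of any kind disarms the timer, not just the first text token. That follows the article's wording ("any notification"). If the route instead waited for the first text token, users would hit the error more often.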
Gate two. The 800 second per-turn ceiling.
At line 25 of the same file, export const maxDuration = 800 tells the platform that this route is allowed at most 13 minutes 20 seconds of wall time per request. After that, the underlying runtime kills the response. The agent does not get to finish.
Thirteen minutes is long for a chat turn and short for a fully autonomous run. The shape of the ceiling tells you the shape of the product: this is a synchronous chat turn, not a fire-and-forget worker. Anything that needs to run longer (a scheduled build, a multi-step refactor, a Playwright test suite) belongs on the scheduler MCP that ships in the VM at /opt/scheduler-mcp.js, where the timeout structure is different.
Agent demos love to imply unbounded runs. The trick is they usually run on a different transport (no HTTP request waiting), or they run for two minutes and edit the rest. Live products that hold a synchronous connection open need a number.
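A few lines are enough to check the ceiling arithmetic and show the routing rule it implies. The maxDuration export is the line quoted from route.ts:25. The formatSeconds and chooseDestination functions, and the 60 second safety margin, are hypothetical. They illustrate the split between a synchronous turn and the scheduler. They are not mk0r's dispatcher:

```typescript
// Route segment config as quoted from src/app/api/chat/route.ts:25:
// the maximum wall time, in seconds, the platform allows this route.
export const maxDuration = 800;

// Render a duration in seconds as minutes and seconds.
function formatSeconds(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

// Hypothetical dispatcher (not mk0r's code): work that might outlive one
// synchronous chat turn goes to the scheduler MCP instead. The margin is an
// arbitrary illustrative value.
type Destination = "chat_turn" | "scheduler";

function chooseDestination(estimatedSeconds: number, marginSeconds: number = 60): Destination {
  return estimatedSeconds + marginSeconds <= maxDuration ? "chat_turn" : "scheduler";
}
```

formatSeconds(maxDuration) gives "13m 20s", which matches the figure in the text. Any job estimated at more than about twelve minutes would be sent off the chat path.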
Gate three. The classified-error UI.
The expensive part of a live agent product is not the happy path, it is the error path. When the SDK throws, the live product has to figure out which kind of failure happened so the next action makes sense. mk0r's answer is a discriminated union at line 48:
```typescript
type ErrorKind =
  | "credit_exhausted"
  | "auth_required"
  | "invalid_request"
  | "image_error"
  | "stale_session"
  | "generic";
```
Each kind is a separate branch in the stream handler (route.ts:585 to 657). credit_exhausted forwards the SDK's reset-time message so the user can see when service resumes. auth_required tells the UI to re-authenticate, not retry the same prompt. image_error surfaces inline instead of being swallowed by a generic toast. stale_session is the only kind where retry helps, and it retries exactly once, only when no text has streamed yet (route.ts:621).
A demo is allowed to throw all of those failures into a single "something went wrong" toast, because the demo author will rerun the take. A live product cannot, because the user has nowhere to rerun from. The discriminated union is the cost of being honest about which failure happened.
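The payoff of a discriminated union is that the compiler can check that every kind is handled. Here is a minimal sketch built on the ErrorKind union quoted from line 48. The nextAction strings paraphrase this section. The article does not say what invalid_request shows, so that entry is a placeholder. shouldRetry encodes the stale_session rule as the text describes it, not as copied from route.ts:

```typescript
type ErrorKind =
  | "credit_exhausted"
  | "auth_required"
  | "invalid_request"
  | "image_error"
  | "stale_session"
  | "generic";

// Hypothetical mapping from error kind to the user's next step.
// The strings are illustrative, not mk0r's UI copy.
function nextAction(kind: ErrorKind): string {
  switch (kind) {
    case "credit_exhausted":
      return "top up; show the SDK's reset-time message";
    case "auth_required":
      return "re-authenticate; do not resend the same prompt";
    case "invalid_request":
      return "show the request error (placeholder: not described in the article)";
    case "image_error":
      return "resize the image; show the error inline";
    case "stale_session":
      return "restart the session and retry once";
    case "generic":
      return "show a generic error";
    default: {
      // Exhaustiveness check: a new ErrorKind without a case fails to compile.
      const unreachable: never = kind;
      return unreachable;
    }
  }
}

// stale_session is the only retryable kind: at most one retry, and only
// while no text has streamed to the user yet.
function shouldRetry(kind: ErrorKind, textStreamed: boolean, retriesSoFar: number): boolean {
  return kind === "stale_session" && !textStreamed && retriesSoFar === 0;
}
```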
Gate four. The 1 hour sandbox TTL.
The agent does not live in your browser, it lives in an E2B sandbox. That sandbox is configured to pause itself one hour after the last activity:
```typescript
// src/core/e2b.ts:33
const E2B_TIMEOUT_MS = 3_600_000; // 1 hour
```
A demo finishes in two minutes, so this never matters. A real user who walks away for lunch comes back to a paused VM. There is a reload path that brings the git history back (the timeline is not destroyed), but the live HMR session is. The reload UI, the resume code, and the "paused" visual state all had to be written. None of that exists in any demo clip.
The honest version of "your agent has its own VM" is "your agent has its own VM until you stop using it, at which point we pause it to keep the unit economics sane." You can debate the number. You cannot avoid having one.
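From the UI's side, the TTL reduces to two states and a resume plan. Here is a minimal sketch that takes the "one hour after the last activity" description above at face value. E2B_TIMEOUT_MS is the quoted constant. The sandboxUiState and resumePlan functions and the state names are hypothetical. In the real product, E2B enforces the timeout, and this sketch does not reproduce mk0r's resume code:

```typescript
const E2B_TIMEOUT_MS = 3_600_000; // 1 hour, quoted from src/core/e2b.ts:33

type SandboxUiState = "live" | "paused";

// What the article says survives a pause: git history yes, live HMR session no.
interface ResumePlan {
  reloadGitHistory: boolean;
  hmrSessionIntact: boolean;
}

// Hypothetical: derive the UI state from the last-activity timestamp.
function sandboxUiState(lastActivityMs: number, nowMs: number): SandboxUiState {
  return nowMs - lastActivityMs >= E2B_TIMEOUT_MS ? "paused" : "live";
}

// Hypothetical: decide what the reload path has to rebuild.
function resumePlan(state: SandboxUiState): ResumePlan {
  return state === "paused"
    ? { reloadGitHistory: true, hmrSessionIntact: false }
    : { reloadGitHistory: false, hmrSessionIntact: true };
}
```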
Gate five. The 6 turn anonymous cap.
No signup means no identity to throttle. So the only abuse gate left on the anonymous loop is a per-turn counter:
```typescript
// src/app/api/chat/route.ts:23
const ANON_TURN_LIMIT = 6;
```
On the 7th anonymous prompt, the route returns 403 sign_in_required. A demo never hits this because the demo author is logged in. The live anonymous user does, and a sign-in modal pops up where the next response would have streamed. That modal does not exist in any demo of an "account-less" agent product. It exists in the live one because the live one had to pick between "no signup ever" and "the cluster gets eaten by a single tab." It picked the cap.
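The cap itself is a single comparison. Here is a minimal sketch that assumes turnsUsed counts prompts the visitor has already sent. The constant, the 403 status, and the sign_in_required code come from this section. The gateAnonymousTurn function and the TurnGate shape are hypothetical:

```typescript
const ANON_TURN_LIMIT = 6; // quoted from src/app/api/chat/route.ts:23

type TurnGate =
  | { allowed: true }
  | { allowed: false; status: 403; error: "sign_in_required" };

// Hypothetical guard: turnsUsed is the number of prompts this anonymous
// visitor has already sent before the current one.
function gateAnonymousTurn(turnsUsed: number): TurnGate {
  if (turnsUsed >= ANON_TURN_LIMIT) {
    return { allowed: false, status: 403, error: "sign_in_required" };
  }
  return { allowed: true };
}
```

Under that assumption, prompts one through six pass and the seventh is refused, which matches the behavior described above. The article does not say where the counter is stored.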
The same flow, demo-side and live-side
Put the five gates on one axis and the demo path on the other, and the gap is mechanical, not philosophical:
| Feature | A typical agent demo | Live mk0r |
|---|---|---|
| Agent does not start streaming in time | Demo author retakes the clip. Viewer sees the second attempt. | TTFT watchdog evicts the session at 15s and renders "Agent did not respond within 15s. Please retry." (route.ts:539). |
| Single turn runs longer than expected | Demo cuts to the final state. No timer visible. | maxDuration = 800 (route.ts:25). Cloud Run kills the response after 13 minutes 20 seconds, period. |
| Account out of credit mid-turn | Demo never runs at the edge of the quota. | classifyPromptError flags credit_exhausted on HTTP 402, 429, or any "credit balance is too low" regex hit, and surfaces the SDK's reset-time message to the user (route.ts:48 to 68). |
| Auth token rotated or revoked mid-session | Demo uses one fixed credential. Never expires on camera. | auth_required is a separate error kind. The chat stream emits a typed event the UI re-prompts on (route.ts:601 to 605). |
| Image too large to send to the model | Demo uses pre-sized assets. | image_error is its own kind. The UI surfaces "image too large / unable to resize" inline instead of swallowing it (route.ts:75 to 77). |
| ACP subprocess loses track of the session | Demo restarts the clip. | stale_session triggers restartAndReloadSession and one retry, but only when no text has streamed yet (route.ts:621 to 653). After text has flowed, surface the real error. |
| User walks away for two hours | Demo runs end to end in one take. | E2B_TIMEOUT_MS = 3_600_000 in src/core/e2b.ts:33. Sandbox pauses after one hour of inactivity. |
| Anonymous visitor keeps prompting | Demo author is logged in. | ANON_TURN_LIMIT = 6 (route.ts:23). The seventh prompt hits a 403 sign_in_required gate. |
The right column is not bragging. It is the bill the left column never paid. Every row is a place where the live product had to write either a watchdog, a classifier, a recovery path, or a cap. The demo author chose which rows to render and which to cut. The live product renders all of them, every time.
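The credit_exhausted row in the table is concrete enough to sketch. This sketch assumes an error object with an optional HTTP status and a message, and it assumes the message match is case-insensitive. The 402 and 429 statuses and the "credit balance is too low" pattern come from the table. The classifyCreditError function and the PromptErrorLike type are hypothetical names. The real classifyPromptError (route.ts:48 to 68) also sorts errors into the other kinds:

```typescript
// Hypothetical input shape; the SDK's real error type may differ.
interface PromptErrorLike {
  status?: number;
  message: string;
}

// Case-insensitive matching is an assumption; no "g" flag, so test() is stateless.
const CREDIT_TOO_LOW = /credit balance is too low/i;
const CREDIT_STATUSES = new Set([402, 429]);

// Hypothetical classifier covering only the credit_exhausted rule.
function classifyCreditError(err: PromptErrorLike): "credit_exhausted" | "generic" {
  const statusHit = err.status !== undefined && CREDIT_STATUSES.has(err.status);
  if (statusHit || CREDIT_TOO_LOW.test(err.message)) {
    return "credit_exhausted";
  }
  return "generic";
}
```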
What the failure path actually looks like, top to bottom
Reading the gates as a sequence is more useful than reading them as a list, because in production they fire in a partial order. The TTFT watchdog can win the race against prompt_error. The stale_session retry is allowed to interrupt the stream, but only once. The TTL only fires when the user has gone idle. Here is one possible run, with the gate that fires at each step:
A live agent turn that hits two gates
The demo of the same turn would be three frames: prompt, loading spinner, finished preview. Every event labelled "gate" would be either invisible or cut entirely.
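The ordering rules above can be written as a small reducer over turn events. Here is a minimal sketch with four event types. Only the 15 second TTFT window and the single stale_session retry before any text come from the article. The event names, the Turn shape, and the step and run functions are hypothetical. The idle TTL is left out because it fires between turns, not inside one:

```typescript
type TurnEvent =
  | { type: "notification" }                 // any agent notification
  | { type: "text" }                         // user-visible text streamed
  | { type: "clock"; elapsedMs: number }     // time since the request began
  | { type: "error"; kind: "stale_session" | "other" };

type Outcome = "running" | "ttft_timeout" | "retrying" | "failed";

interface Turn {
  sawNotification: boolean;
  textStreamed: boolean;
  retries: number;
  outcome: Outcome;
}

const TTFT_MS = 15_000;
const initialTurn: Turn = { sawNotification: false, textStreamed: false, retries: 0, outcome: "running" };

function step(turn: Turn, ev: TurnEvent): Turn {
  // A timeout or a surfaced failure ends the turn; later events are ignored.
  if (turn.outcome === "ttft_timeout" || turn.outcome === "failed") return turn;
  switch (ev.type) {
    case "notification":
      return { ...turn, sawNotification: true, outcome: "running" };
    case "text":
      return { ...turn, sawNotification: true, textStreamed: true, outcome: "running" };
    case "clock":
      // The watchdog can only win the race while no notification has arrived.
      return !turn.sawNotification && ev.elapsedMs >= TTFT_MS
        ? { ...turn, outcome: "ttft_timeout" }
        : turn;
    case "error":
      // One restart-and-retry, and only before any text reached the user.
      if (ev.kind === "stale_session" && !turn.textStreamed && turn.retries === 0) {
        return { ...turn, retries: 1, outcome: "retrying" };
      }
      return { ...turn, outcome: "failed" };
  }
}

function run(events: TurnEvent[]): Turn {
  return events.reduce(step, initialTurn);
}
```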
How to read a demo from someone else
The point is not to dunk on agent demos. The point is to give you a checklist. The next time someone shows you an agent doing something impressive, walk down the five gates and ask where they live in the product:
- What does the product do when the first token does not arrive in time? Is there a watchdog, and what is its value? If the demo cut to mid-response, the TTFT could be any number.
- What is the ceiling on a single agent turn? If the answer is "unbounded," the agent can hang. If it is "way shorter than the demo," the demo is on a different transport.
- What does the error UI look like when the SDK throws? A product without a discriminated error type is going to show you the same toast for billing, auth, and image failures, and you will retry the wrong thing every time.
- How long does the underlying compute live after the turn ends? A demo that runs for two minutes hides anything past that.
- Is anything gated on an account? An anonymous demo and a logged-in demo are two different products. The first is the on-ramp, the second is the workshop.
If you can answer all five from a public artifact (a video, a tweet, a sandbox), the team has thought about it. If you cannot, you do not actually know what you are looking at. That is fine, you can ask. The honest teams will tell you their numbers.
The summary
The limits of agent demos are not about the agent. They are about the gap between a take you can retry and a turn the user is watching live. mk0r's five gates (TTFT watchdog, maxDuration ceiling, classified errors, sandbox TTL, anon cap) are the cost of closing that gap. The values are debatable. The fact that they exist as named constants in the source is not.
Open mk0r.com on a phone. Send a prompt. If you have a slow network, you may see the TTFT error. If you stay anonymous and keep going, you will hit the turn cap. If you walk away for an hour and come back, you will see the paused sandbox. None of those would survive editing a demo. All of them are the live product being honest about what it can and cannot do.
Want to read the five gates on your own use case?
Bring an agent demo you do not trust. We will open the source for mk0r side by side, walk the five gates, and figure out which ones the demo cut.
Frequently asked questions
What is the single biggest thing an agent demo hides?
Time. A demo is recorded, so the author can wait as long as they want for the first token, cut the dead air, and ship a clip that looks instant. A live agent product cannot. mk0r ships a 15 second time-to-first-token watchdog at src/app/api/chat/route.ts:539. If the agent does not emit any notification within that window, the session is evicted and the literal string "Agent did not respond within 15s. Please retry." is sent to the browser. That string does not exist in any demo. It exists because in production, sometimes the box is wedged and the user is staring at a blinking cursor, and refusing to time out would be worse than admitting it.
Doesn't the 800 second per-turn ceiling defeat the whole "agent" pitch?
Sort of, and it is on purpose. maxDuration = 800 at src/app/api/chat/route.ts:25 caps a single chat turn at 13 minutes 20 seconds. The reasoning is the same as the TTFT watchdog: a real user is sitting in front of a tab, and a stuck agent that holds the connection open forever degrades trust faster than one that hits a known ceiling and surfaces "this turn ran too long." Long agent runs belong on a scheduled cron, not on a synchronous chat turn. The product has a scheduler MCP for exactly that case (built into the VM as /opt/scheduler-mcp.js). When you watch a demo of an "autonomous" agent that runs for an hour, what is happening offscreen is either a much longer ceiling, a different transport (no HTTP request waiting), or a cut.
Why classify errors at all? Couldn't you just show a generic "something went wrong"?
Because the next action the user should take is different in each case. credit_exhausted means top up the account, and showing the SDK's reset-time message lets the user see when service resumes. auth_required means re-authenticate, not retry the same prompt. image_error means resize the image, not refresh the page. stale_session is the only kind where retry actually helps, and the code retries it exactly once, only if no text has streamed yet (src/app/api/chat/route.ts:621). A generic error would push every user to the same useless path. The type system at line 48 (ErrorKind = "credit_exhausted" | "auth_required" | "invalid_request" | "image_error" | "stale_session" | "generic") is the live product paying the price of being honest about which kind of failure happened. Demos pay zero of this price because a demo's failures never make the cut.
What does the 1 hour sandbox TTL look like to a real user?
It looks like coming back from lunch and finding the dev server is no longer responding. E2B_TIMEOUT_MS = 3_600_000 (src/core/e2b.ts:33) tells the E2B SDK to pause the underlying VM one hour after the last activity. There is a reload path that brings the git history back, so the timeline is not destroyed, but the live HMR session is. A demo glides past this because a demo finishes in two minutes. A live product has to write the reload UI, the resume code, and the user-visible "paused" state. None of that exists in a demo clip.
Why a 6 turn anonymous cap? Doesn't that contradict the "no signup" pitch?
It does, partially. ANON_TURN_LIMIT = 6 at src/app/api/chat/route.ts:23 exists because without an account, the per-turn counter is the only abuse gate on the cluster. The honest version of "no signup" is "no signup for 6 turns." Six is the smallest number where a curious visitor can close a real iteration loop (seed, refine twice, undo once, retry). A demo never hits this because the demo author is logged in or runs on an unmetered dev environment. The cap is one of the gates a live product has to render that a demo can cut entirely.
Are there agent demos that are basically honest?
A few. The honest ones tell you up front what they cut and what they did not. The dishonest ones imply a single take when there were many, imply zero waits when there were several, and imply no retries when the agent crashed and was relaunched. The way to tell is to ask whether the demo author would show you the same flow live on a screenshare. If yes, the demo is probably faithful. If no, it is a polished cut. The five gates above are real constants in mk0r's source, each at a named file and line, so you can ask to see any of them rather than taking the demo's word for it.
What should I look for the next time someone shows me an agent demo?
Five things. One, did the agent ever start streaming in the demo, or did the recording cut to mid-response? If it cut, the TTFT could have been any number. Two, what is the ceiling on a single turn? If the demo never says, assume it is either unbounded (so the agent can hang indefinitely) or much shorter than the demo implies. Three, what happens when an error fires? A demo that never shows an error has not shown you the error UI, so you cannot tell whether anyone stress-tested it. Four, how long does the underlying compute live? A short demo can hide a 5-minute pause. Five, is anything gated on an account? A logged-in demo is allowed to do things an anonymous user cannot. If you can answer all five from the public artifact, the team has thought about it. If you cannot, they probably have not.