Guide

The 287 characters that save you a penny a turn

Direct answer (verified 2026-05-11): a predefined stack lowers AI app builder inference cost by collapsing the system prompt. mk0r's system prompt is 287 characters because Vite, React, TypeScript, and Tailwind are already installed in the sandbox image. v0's leaked system prompt is 46,186 bytes because it teaches the model the stack on every turn. On Claude Haiku 4.5 at $1 per million input tokens, that is the difference between $0.000072 and roughly $0.012 per turn on the system param alone, before the model writes a single output token. Verified against platform.claude.com/docs/en/about-claude/pricing.

Matthew Diakonov · 8 min read

Show me the prompt

This is the whole system prompt the model gets on every turn in mk0r's VM mode. It lives at src/core/e2b.ts:170 and is exported as DEFAULT_APP_BUILDER_SYSTEM_PROMPT. The repo is open source on GitHub, so you can verify this yourself.

You are an expert app builder inside an E2B sandbox with a Vite + React + TypeScript + Tailwind CSS v4 project at /app. The dev server is running on port 5173 with HMR. You have Playwright MCP for browser testing. Read your CLAUDE.md files for detailed workflow, coding standards, and memory instructions.

287 characters. Roughly 70 tokens by Anthropic's tokenizer. That is the cost of teaching the model what stack it is working with on every single turn.
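If you want to sanity-check that token estimate without calling the tokenizer, the usual rule of thumb of about four characters per token gets you close. A quick sketch (the 4:1 ratio is a common approximation, not Anthropic's actual tokenizer):

```typescript
// Back-of-envelope: characters -> tokens -> per-turn cost for the
// mk0r system prompt. The 4-characters-per-token ratio is a common
// approximation, not Anthropic's actual tokenizer.
const PROMPT_CHARS = 287;
const CHARS_PER_TOKEN = 4;
const INPUT_RATE_USD_PER_MTOK = 1.0; // Haiku 4.5 fresh input

const approxTokens = Math.round(PROMPT_CHARS / CHARS_PER_TOKEN);
const costPerTurn = (approxTokens / 1_000_000) * INPUT_RATE_USD_PER_MTOK;

console.log(approxTokens, costPerTurn); // ~72 tokens, ~$0.000072 per turn
```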

287 vs 46,186 bytes

The cheapest input token is the one the model never sees.

mk0r system prompt vs. v0's public mirror

Why the prompt can be that small

The 287 characters work because the stack does not have to be explained; it has to be present. The E2B sandbox image is built from docker/e2b/e2b.Dockerfile, and the relevant block is at lines 77 to 87:

  • npm create vite@latest . -- --template react-ts runs at image build time, so package.json, tsconfig.json, src/main.tsx, and index.html all already exist.
  • npm install runs in the same build step, so the node_modules tree is in the image. No model output goes to a dependency list.
  • Tailwind v4 is installed as a dev dep and wired into vite.config.ts via the @tailwindcss/vite plugin, copied in from docker/e2b/files/app/vite.config.ts.
  • A small HMR bridge, src/_mk0rBridge.ts, is layered in so the preview iframe can stream HMR updates to the parent window. The model is told not to touch it.

When a session is claimed, the dev server is already running on port 5173 inside the sandbox. The model's job is to edit src/App.tsx and add files under src/components/. Everything else is the image's problem.

What flows where

The system prompt and the CLAUDE.md memory file fan out across things that already exist in the sandbox. The model does not have to invent any of them.

Stack lives in the image, not in tokens

  • Inputs on every turn: the system prompt, the CLAUDE.md memory file, the user message
  • Already in the image: Vite + React + TS in /app, npm installed at build, dev server running, Playwright MCP

The cost gap, turn by turn

On Claude Haiku 4.5 in May 2026, Anthropic bills $1 per million input tokens, $5 per million output tokens, and $0.10 per million cached input read. Source: platform.claude.com/docs/en/about-claude/pricing. Multiply those rates by the system-prompt sizes and you get a clean per-turn floor on inference cost before the user even prompts.
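That per-turn floor in code, using the article's approximate token counts (the 11,500 and 70 figures are estimates, not exact tokenizer output):

```typescript
// Per-turn system-prompt floor on Haiku 4.5, before any user input
// or output tokens. Token counts are the article's estimates.
const FRESH_INPUT_USD_PER_MTOK = 1.0;

const perTurnFloor = (promptTokens: number): number =>
  (promptTokens / 1_000_000) * FRESH_INPUT_USD_PER_MTOK;

const v0Floor = perTurnFloor(11_500); // ~46 KB prompt -> $0.0115 per turn
const mk0rFloor = perTurnFloor(70);   // 287-char prompt -> $0.00007 per turn

console.log(v0Floor, mk0rFloor);
```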

System prompt re-billed every turn

Without a predefined stack, the system prompt has to teach the model the framework. v0's leaked prompt is 46,186 bytes. The model is told what Next.js does, how the App Router works, when to use server components, how to structure a components folder, where to put styles, and how to trigger a deploy. On every turn, the cached prefix gets re-read at $0.10 per million tokens, or, on the first turn of a new session, billed fresh at $1 per million. Then the model emits its own package.json, build config, tsconfig.json, and HTML shell, all billed at $5 per million output tokens.

  • ~11,500 token system prompt re-read every turn
  • Boilerplate files re-emitted at $5/MTok output
  • Cache-miss penalty on every fresh session
  • Model invents framework details, sometimes wrong

The first-turn delta on a fresh session is the loudest. v0's ~11,500 token prompt at $1 per million is $0.0115 per first turn. mk0r's ~70 tokens is $0.00007. That is about a 164x ratio. After the cache warms, the gap drops to about 16x: v0's prompt bills at the $0.10 per million cached-read rate, while mk0r's 70 tokens keep billing fresh at $1 per million because a prompt that small has no meaningful prefix to cache, and therefore nothing to invalidate when you switch models or the prefix changes.
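Both ratios fall straight out of the rate card. A sketch of the arithmetic, with the same approximate token counts:

```typescript
// First-turn vs warmed-cache cost ratio between an ~11,500-token
// stack-teaching prompt and a ~70-token prompt on Haiku 4.5 rates.
const FRESH = 1.0 / 1_000_000;   // USD per fresh input token
const CACHED = 0.10 / 1_000_000; // USD per cached-read token

const big = 11_500;
const tiny = 70;

// Turn one: both prompts bill at the fresh rate.
const firstTurnRatio = (big * FRESH) / (tiny * FRESH); // ~164x

// Warm cache: the big prompt reads from cache, while a 70-token
// prompt has no meaningful prefix to cache and keeps billing fresh.
const warmRatio = (big * CACHED) / (tiny * FRESH); // ~16x

console.log(firstTurnRatio.toFixed(1), warmRatio.toFixed(1));
```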

The output side, which actually matters more

Output tokens bill at $5 per million on Haiku 4.5, five times the fresh input rate. If a model has to emit package.json, vite.config.ts, tsconfig.json, postcss config, an index.html shell, and a main.tsx before it gets to your actual app, that is a few hundred tokens of boilerplate per project at $5 per million. mk0r's model never writes any of those files. The output is just the edits to App.tsx and new files under src/components/.

There is a second, less obvious savings here. When the model generates boilerplate, it sometimes gets framework details subtly wrong: a missing peer dep, a wrong Tailwind version, a Vite plugin import that does not exist anymore. Each of those mistakes costs a repair turn, and repair turns are full new turns that bill input and output tokens. A predefined stack collapses that entire category of failure because the framework details are validated by the image build, not by the model.
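To put rough numbers on that category of failure, here is a toy expected-cost model. Every constant in it is an illustrative assumption, not something measured from mk0r or any other builder:

```typescript
// Toy expected-cost model for model-generated boilerplate.
// EVERY constant here is an illustrative assumption, not a measurement.
const IN = 1.0 / 1_000_000;  // USD per fresh input token (Haiku 4.5)
const OUT = 5.0 / 1_000_000; // USD per output token

// Assumed repair turn: re-read ~3,000 tokens of context, rewrite
// ~500 tokens of config.
const repairTurnCost = 3_000 * IN + 500 * OUT; // $0.0055

// Assumed: 1 in 10 generated-boilerplate projects ships a subtle
// framework mistake (wrong plugin import, missing peer dep, ...).
const pMistake = 0.1;

// Assumed: ~400 output tokens of package.json/tsconfig/config boilerplate.
const boilerplateCost = 400 * OUT; // $0.002

const expectedExtra = boilerplateCost + pMistake * repairTurnCost;
console.log(expectedExtra.toFixed(5)); // extra expected cost per project
```

Even under these mild assumptions, the single avoided repair turn contributes more than a quarter of the extra cost, which is why the reliability win compounds faster than the prompt-size win.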

Where this gets honest

A predefined stack is not a free win for every product. Three places the savings story breaks down:

  • Stack diversity costs more than tokens. mk0r picked Vite plus React plus TypeScript plus Tailwind v4 plus Playwright. If you want to also support Next.js with server actions, or Svelte, or a Python backend, you either build separate images or you push framework knowledge back into the prompt, and the savings disappear.
  • The user has to want the stack you picked. A predefined stack is an opinion. If a user shows up wanting a Remix app, the cheap-prompt story does not help them. mk0r is explicit about this trade in the picker: pick Quick mode for a single-file HTML app, pick VM mode for the React stack.
  • Cache hits eat most of the gap. On a hot path with Anthropic prompt caching, even a 12,000 token system prompt only bills $0.0012 per cached turn. That is real but not life-changing. The real win is at first turn, on model switch, and on the rare-but-expensive cache evictions that happen when a session has been idle for a while.

The honest summary is that a predefined stack is a moderate, durable input-side savings, a meaningful output-side savings, and an outsized reliability win. The reliability win is probably the one that pays for itself the fastest, because every avoided repair turn is a ten-to-twenty-x cost win compared to shaving cents on a system prompt.

A back-of-the-napkin per-session number

Assume a typical mk0r session: 20 turns, all on Haiku 4.5, with Anthropic prompt caching. Pretend a competing builder with an ~11,500-token system prompt is identical in every other way. The difference attributable just to the predefined stack:

  • Turn 1, cache miss: 11,500 tokens at $1 per million is $0.0115. mk0r: 70 tokens at $1 per million is $0.00007. Delta on turn one: about $0.0114.
  • Turns 2 through 20, cache hits: 11,500 tokens at $0.10 per million is $0.00115 per turn, times 19 turns is $0.0218. mk0r's 70 tokens at the same cached rate come to about $0.000133 across 19 turns. Delta: about $0.0217.
  • Add a model switch in the middle (cache invalidates): another $0.0115 cache-write penalty. mk0r: zero.

Total system-prompt-only savings on a 20-turn session: roughly $0.04 to $0.05. At one session per minute on a heavy day, that is $60 to $70 of pure input-side savings. The output-side savings, where the model is not emitting boilerplate, are bigger and harder to pin down because they depend on what apps people build. Put together, it is enough to fund the next 10 percent of free anonymous traffic.
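For anyone who wants to replay the napkin, the three deltas above reduce to a few lines of arithmetic (same assumed token counts and rates):

```typescript
// Replaying the napkin: system-prompt-only savings over a 20-turn
// session on Haiku 4.5, with one mid-session model switch.
// Token counts are the article's approximations.
const FRESH = 1.0 / 1_000_000;   // USD per fresh input token
const CACHED = 0.10 / 1_000_000; // USD per cached-read token

const big = 11_500; // ~46 KB stack-teaching prompt
const tiny = 70;    // 287-character prompt

const delta1 = big * FRESH - tiny * FRESH;          // turn 1, cache miss
const delta2 = 19 * (big * CACHED - tiny * CACHED); // turns 2-20, cache hits
const delta3 = big * FRESH;                         // model switch re-bills the prefix

const total = delta1 + delta2 + delta3;
console.log(total.toFixed(4)); // lands in the four-to-five-cent range
```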

Want to see the 287 characters in action?

A 20-minute call. I open the repo, walk you through src/core/e2b.ts and the Dockerfile, and we run one of your real app ideas through the VM.

Frequently asked questions

What exactly counts as a predefined stack in an AI app builder?

A predefined stack means the framework, language, build tool, styling system, and dev server are already installed and configured before the model writes its first token. In mk0r, the E2B sandbox image at docker/e2b/e2b.Dockerfile lines 77 to 87 runs npm create vite with the react-ts template, installs Tailwind, and copies a working vite.config.ts and _mk0rBridge.ts into /app at build time. When a session is claimed, the dev server is already running on port 5173. The model edits src/App.tsx and src/components/*.tsx and that is it.

How much smaller is mk0r's system prompt compared to v0, Bolt, and Lovable?

mk0r's DEFAULT_APP_BUILDER_SYSTEM_PROMPT at src/core/e2b.ts line 170 is 287 characters, roughly 70 input tokens. The public mirror of v0's system prompt is 46,186 bytes, roughly 11,500 tokens. Bolt's open-source prompt is 21,945 bytes, Lovable's is 20,301 bytes, Replit's agent prompt is 8,110 bytes. mk0r's prompt is 0.6 percent the size of v0's. The reason is that v0 has to teach the model the Next.js conventions, the components folder layout, the styling rules, the deployment idioms, and so on. mk0r's stack lives in the Docker image so none of that needs to be in the prompt.

Doesn't prompt caching wipe out the difference?

Mostly, but not entirely, and not on the first turn. Anthropic's prompt cache reads cached input at $0.10 per million tokens against $1 per million for fresh input on Haiku 4.5. So an 11,500 token system prompt that hits the cache bills $0.00115 per turn versus $0.00007 for a 70 token prompt that does not need a cache. Cache writes are 25 percent more expensive than fresh input, so the first turn of any new session still pays full freight on the entire prompt. With 50 new sessions a day on a busy product, those first-turn payments add up. And every cache miss, every model switch, every long pause that drops the prefix bills the whole thing again.
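The cache-write premium spelled out, for a prompt of that size (the 1.25x multiplier is Anthropic's standard rate for 5-minute cache writes):

```typescript
// First-turn cost of writing an ~11,500-token prompt to the cache.
// Anthropic's 5-minute cache writes bill at 1.25x the fresh input rate.
const FRESH_USD_PER_MTOK = 1.0;
const CACHE_WRITE_MULTIPLIER = 1.25;

const promptTokens = 11_500;
const firstTurnWrite =
  (promptTokens / 1_000_000) * FRESH_USD_PER_MTOK * CACHE_WRITE_MULTIPLIER;

console.log(firstTurnWrite); // ~$0.014 on every brand-new session
```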

Where does the savings story break down?

Two places. First, output tokens. The model still has to write the actual app code, and output bills at $5 per million on Haiku 4.5, five times the fresh input rate. A predefined stack saves the model from emitting package.json, tsconfig.json, vite.config.ts, and index.html boilerplate, but the App.tsx and component code is the same either way. Second, complexity. A predefined stack only works if the apps you generate fit the stack. mk0r's stack is Vite, React, TypeScript, Tailwind v4, Playwright. If a user wants a Next.js app with server actions or a Svelte app, the cheap-prompt advantage evaporates the moment you try to support both.

Why does the model still need a CLAUDE.md if the stack is in the image?

Because behavior is not stack. The CLAUDE.md files at /root/.claude/CLAUDE.md and /app/CLAUDE.md cover what to remember about the user, how to handle the memory tool, when to provision the on-demand services like Resend and Neon Postgres, and how to use the scheduler MCP. None of that is teachable by listing dependencies in package.json. The stack-specific part of mk0r's project CLAUDE.md is roughly 4,500 bytes, an order of magnitude smaller than v0's full system prompt.

How do I verify the 287 character number myself?

Clone github.com/m13v/appmaker and run wc -c on the prompt literal in src/core/e2b.ts at line 170. The constant is DEFAULT_APP_BUILDER_SYSTEM_PROMPT. It is exported and used in two places in the same file, at the session creation paths around lines 2129 and 2316. The whole repo is open source, so you can grep for it and verify the value is not augmented at runtime.

Does Quick mode use the same prompt?

No. Quick mode streams HTML out of Claude Haiku 4.5 directly without an E2B sandbox, so it does not need the stack-teaching system prompt at all. The model just receives the user's sentence and a request to emit a single-file HTML app. The predefined stack story here is the VM mode, which is where the Vite plus React plus Tailwind sandbox lives.

mk0r.AI app builder
© 2026 mk0r. All rights reserved.
