The honest way to compare vibe coding tools is to read their agent’s rulebook
Every list of the best vibe coding tools ranks the same five things: signup friction, free tier, full-stack reach, framework support, and price per month. All five matter. None of them explain why a generated app on tool A looks indistinguishable from a generated app on tool B. The answer to that is hiding one layer down, in a file most tools never let you see, and it is the only thing that actually changes what your app looks like when you ship it.
The thesis, in one paragraph
Most vibe coding tools sit on top of roughly the same small set of frontier models. The model is the part of the stack that gets the press, but on a per-output basis the model is the constant, not the variable. The variable is the agent’s system prompt, the project-level CLAUDE.md, and the skill files the agent reads before it writes a single character. Those three artifacts decide what the agent considers a default font, a default accent color, a default layout, a default empty-state copy pattern. Change those, and the output shape changes. Leave them blank, and the model defaults to whatever it has been most rewarded for during training, which is the Inter-headline-and-violet-button look you have now seen on ten thousand AI generated landing pages.
That is the entire reason this page exists. The right comparison criterion for a vibe coding tool is not a checklist of features. It is: what does the agent read on every turn, and is that text any good. The rest is noise.
The actual lever, drawn out
Here is the path a single user prompt walks before the agent starts writing. The model is the bottom of the stack. The rules are the top.
What the agent sees on every turn
1. User prompt: the one sentence you typed
2. System prompt: set by the platform
3. Project CLAUDE.md: lives in the project root
4. Skill files: loaded on demand
5. Frontier model: writes the code
Three of those five steps are platform decisions. You only control the first one, the user prompt, and only sometimes the third, the project CLAUDE.md. The platform owns the system prompt. The platform also owns whatever skill files the agent has access to. So when you compare tools, you are comparing the platform’s taste, not just its model choice.
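To make the ordering concrete, here is a minimal TypeScript sketch of that stack. None of these names come from mk0r's codebase; they are hypothetical, and the only point is the order in which the pieces reach the model.

```typescript
// Minimal sketch of the per-turn ordering described above. Every name
// here is hypothetical; this is not mk0r's implementation, only an
// illustration of what reaches the model, and in what order.

interface TurnContext {
  systemPrompt: string;  // set by the platform, not by you
  projectRules: string;  // e.g. the CLAUDE.md sitting in the project root
  skillFiles: string[];  // loaded on demand by the agent
  userPrompt: string;    // the one sentence you typed
}

// The platform's text reaches the model before yours does, which is
// why the rulebook, not the model, ends up setting the defaults.
function buildModelInput(ctx: TurnContext): string {
  return [ctx.systemPrompt, ctx.projectRules, ...ctx.skillFiles, ctx.userPrompt]
    .filter(Boolean)
    .join("\n\n");
}

console.log(
  buildModelInput({
    systemPrompt: "You are an app-building agent...",
    projectRules: "# CLAUDE.md\nThree colors maximum...",
    skillFiles: ["# SKILL.md\nAvoid generic AI aesthetics..."],
    userPrompt: "a one page habit tracker, light theme, no auth",
  })
);
```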
What is in mk0r’s rulebook, line for line
I am going to put my own cards on the table first because the whole argument depends on you being able to inspect the rules I am defending. The two files that ride into every mk0r sandbox are docker/e2b/files/root/.claude/CLAUDE.md and docker/e2b/files/root/.claude/skills/frontend-design/SKILL.md. Both are open source under github.com/m13v/appmaker.
Three rules in CLAUDE.md carry most of the weight on the look-and-feel question:

- Line 84: "Three colors maximum: black, white, and one dominant accent color."
- Lines 126 to 127: "Never default to Inter, Roboto, Arial, or system fonts. These are the hallmark of generic AI output."
- Line 152: the Anti-Patterns list, which names the most common AI defaults outright, "Purple/indigo gradients on white backgrounds" among them.

Read those again slowly. The rule at line 84 collapses the color choices the agent will reach for from a palette of dozens to a palette of three. The rule at lines 126 to 127 rules out the most recognizable AI font signature on the open web. The list at line 152 names, by name, the single most common AI app maker output pattern: hero section centered, gradient button, purple-to-indigo, soft shadow, rounded card grid below. That output is a habit, not a choice, and habits are killed by writing them down as anti-patterns and feeding them to the agent on every turn.
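To see what those rules translate to once the agent scaffolds a project, here is a hypothetical tailwind.config.ts shaped the way the rulebook pushes the output. It is not taken from the appmaker repo; the accent value and the font names are placeholders, and only the structure (three colors, a non-default display face) is the point.

```typescript
// Hypothetical tailwind.config.ts, shaped the way the rulebook pushes
// the generated project. Not from the appmaker repo; the accent value
// and the font names are illustrative placeholders.
import type { Config } from "tailwindcss";

const config: Config = {
  content: ["./src/**/*.{ts,tsx}"],
  theme: {
    // Line 84: three colors maximum, so replace the default palette
    // rather than extending it.
    colors: {
      black: "#111111",
      white: "#ffffff",
      accent: "#c2410c", // one dominant accent, whatever the project picks
    },
    // Lines 126 to 127: never default to Inter, Roboto, Arial, or system fonts.
    fontFamily: {
      display: ["Fraunces", "serif"],
      body: ["Atkinson Hyperlegible", "sans-serif"],
    },
  },
};

export default config;
```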
How those rules actually reach the agent
Rules in a Markdown file are not magic. They only matter if the agent reads them. Here is how the file gets onto disk inside the sandbox the agent runs in.
Build-time, not runtime
```dockerfile
# docker/e2b/e2b.Dockerfile, around line 89
# ── Claude Code config at /root/.claude ──
COPY files/root/.claude/ /root/.claude/
# Result: every sandbox the agent boots into has the
# rulebook on disk before it answers your first prompt.
```
Because the COPY happens at template build time, the rules are baked into the immutable image. There is no runtime path that could skip them. Every fresh sandbox wakes up with the rulebook on disk before the first user prompt arrives. That is what makes them load bearing instead of decorative.
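If you wanted to make that guarantee explicit, a pre-build check like the sketch below would fail the template build whenever the rulebook files are missing from the Docker build context. The paths come from the article; the script itself is illustrative and not something the repo ships.

```typescript
// Illustrative pre-build check: confirm the rulebook files the
// Dockerfile COPYs are present in the build context before building
// the E2B template. Run from the repo root with a TS runner.
import { existsSync } from "node:fs";
import { join } from "node:path";

const buildContext = "docker/e2b";
const rulebookFiles = [
  "files/root/.claude/CLAUDE.md",
  "files/root/.claude/skills/frontend-design/SKILL.md",
];

for (const file of rulebookFiles) {
  const fullPath = join(buildContext, file);
  if (!existsSync(fullPath)) {
    console.error(`missing rulebook file: ${fullPath}`);
    process.exit(1);
  }
}
console.log("rulebook present in build context; safe to build the template");
```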
The five-point checklist you can run on any tool
You do not need insider access to apply this lens. Here is a test you can run on any vibe coding tool today, including mk0r. The point is not that mk0r passes all five. The point is that almost every other roundup ignores all five.
- Is the system prompt visible. Can you read what the agent has been told before you opened the chat. If the answer is no, you are renting the platform’s taste sight unseen.
- Is there a project rules file the agent reads on every turn. CLAUDE.md, AGENTS.md, .cursorrules, agent.md. Different names, same idea. If there is none, you cannot enforce taste across multiple turns.
- Does the rulebook name anti-patterns by name. Generic guidance like “produce beautiful code” is decorative. Specific bans (no Inter, no purple-on-white, three colors max) are load bearing.
- Are the rules in source control where you can read them. A platform that publishes its agent’s rulebook is making a claim it has to live with. A platform that hides it is asking you to trust the marketing copy instead.
- Does the output actually obey the rules. This is the empirical leg. Build the same prompt on three tools. The fonts, colors, and section shapes you get back are the de facto system prompt, whatever the marketing says.
The same prompt, on a tool with rules and a tool without
To make the difference legible, here is the kind of output you can expect when the agent has no design rulebook in the loop versus when it does. The user prompt is identical in both cases: “a one page habit tracker, single daily habit, light theme, no auth, no database.”
Same prompt, different rulebooks
Without a rulebook in the loop, the agent reaches for its training defaults: Inter for the headline, a violet-to-indigo gradient on the primary button, three lightly tinted accent colors scattered across cards, a centered hero with a gradient title, and a uniform grid of rounded cards below. The result is competent and totally generic. You could swap the headline copy and it would pass for any of a hundred other AI generated landing pages.
- Inter or a system sans for the headline
- Violet-to-indigo gradient on the CTA
- Centered hero, gradient title, soft shadow
- Three or four lightly tinted accent colors
- Uniform card grid with decorative emoji
With the rulebook in the loop, the same prompt comes back the way the rules describe it: a real display font instead of Inter, black, white, and a single accent color instead of a scatter of tints, asymmetric whitespace instead of a centered hero, and no violet gradient anywhere on the page.
The grid: what most lists rank vs. what actually changes the output
Every existing best-of roundup grades vibe coding tools on features. The features matter. They are not what changes the shape of your output. Here is the side by side.
| Feature | What most roundups rank | What actually changes the output |
|---|---|---|
| Signup friction and free tier | Top of every roundup | Real, but does not affect what the generated app looks like |
| Full-stack vs. frontend-only support | Heavy weight in every roundup | Real, but a backend stub still inherits AI defaults if rules are absent |
| Framework breadth (React, Vue, Svelte) | Treated as a major axis | Almost cosmetic, every framework converges on the same AI defaults without rules |
| Per-month price | Always graded | Independent variable, does not affect taste |
| Visibility of the agent's system prompt | Almost never mentioned | Decides whether you can see what taste you are renting |
| Project-level CLAUDE.md / rules file | Almost never mentioned | Decides whether your taste survives more than one turn |
| Named anti-patterns in the rulebook | Never mentioned | The single biggest reason output A looks different from output B |
| Open source rulebook in a public repo | Never mentioned | The only way to verify the marketing claim before committing a project |
The counterargument I want to take seriously
The reasonable pushback on all of this is: the model is the model, and a strong enough model will produce strong output regardless of what you put in the system prompt. There is something to that. A frontier model is much harder to steer into AI slop than a weaker one. But the empirical evidence on the open web cuts the other way. The most common complaint about AI generated apps in 2026 is not that the code is wrong; it is that everything looks the same. That uniformity is the model defaulting, not the model failing. A rulebook in front of the model breaks the default.
The other reasonable pushback is: rules constrain creativity. They do. The honest tradeoff is that without them, the output is creative on a single dimension (the words you typed) and absolutely conformist on every other dimension. With them, the output is constrained on the dimensions where the model would otherwise default to conformity, and free on the dimensions you actually care about. That is the trade I think most makers want, and it is the trade mk0r’s rulebook makes on your behalf.
“The agent's CLAUDE.md is the most underrated config file in any vibe coding tool. Show me yours and I can tell you what your output is going to look like, before I see a single screenshot.”
Where mk0r lands on the checklist
To be fair, here is mk0r against the same five-point test I said you should run on any tool, including this one. I am marking the cells honestly. Some of these are stronger than others.
| Question | mk0r |
|---|---|
| Is the system prompt visible | Yes, it is a literal string in src/core/e2b.ts, DEFAULT_APP_BUILDER_SYSTEM_PROMPT |
| Project rules file the agent reads on every turn | Yes, two of them: /app/CLAUDE.md and /root/.claude/CLAUDE.md |
| Anti-patterns named by name | Yes, line 152 of CLAUDE.md spells out the four most common AI defaults |
| Rules in a public repo | Yes, github.com/m13v/appmaker |
| Output obeys the rules | Mostly. Strong enough prompts override the rules, which is the right escape hatch but also a real failure mode if you forget to set them. |
The honest take on the broader landscape
mk0r is not the right pick for everyone. If your work flows through an IDE and you want an AI pair sitting in your editor, Cursor or Windsurf will fit better than any of the web app builders. If your project is a back-office tool that needs role-based access, a relational schema, and form-driven CRUD pages, a tool that ships those primitives will save you weeks. If you are a non-developer who wants a full-stack app with the lightest possible learning curve, Lovable, Bolt, or Replit Agent are all reasonable picks with their own tradeoffs.
The argument here is narrower. It is that whichever tool you pick, the most predictive single thing about what your generated app will look like is what the agent reads on every turn. Most lists will not tell you that, because opening the rulebook is more work than counting features. The work is worth doing. It is also the only way to know why two tools using the same model produce wildly different output, and why a third tool ranks last on every feature chart but produces the only output that does not look like everyone else’s.
One small experiment to try this week
Pick three vibe coding tools you are considering. Open all three in different tabs. Type the same prompt into each: a single page habit tracker, one daily habit, light theme, no auth, no database. Do not say anything about fonts, colors, or layout. Screenshot the first preview each tool produces. Lay the three screenshots side by side.
Two of them will look like siblings. The fonts will read as Inter. The CTA will be a violet or indigo gradient. The layout will be a centered hero on top of a uniform card grid. That is the AI default coming through unfiltered. The third, if you are lucky, will be visibly different. If none of the three is different, none of the three has a rulebook in the loop, and the right answer is to keep looking.
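If you would rather not rely on eyeballing screenshots, a rough script like the one below can scan whatever HTML or CSS a tool lets you export and flag the two most recognizable tells. The regexes are crude heuristics, not a design audit, and the hex values are just the common Tailwind violet and indigo shades.

```typescript
// Rough tell-detector for the side-by-side experiment: point it at the
// exported HTML or CSS of each preview and it flags the two most
// recognizable AI defaults. Heuristics only; the file path is whatever
// you saved the preview as.
import { readFileSync } from "node:fs";

const file = process.argv[2] ?? "preview.html";
const source = readFileSync(file, "utf8").toLowerCase();

const tells = [
  { name: "Inter as the headline font", pattern: /font-family:[^;]*inter/ },
  {
    name: "violet-to-indigo gradient CTA",
    pattern: /gradient\([^)]*(violet|indigo|#7c3aed|#6366f1)/,
  },
];

for (const tell of tells) {
  console.log(
    tell.pattern.test(source) ? `FOUND: ${tell.name}` : `clear: ${tell.name}`
  );
}
```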
Then run the same prompt on mk0r with no signup. The defaults you get back are not perfect. They are obviously not Inter, not a violet gradient, and not a centered hero with a card grid below. That is the rulebook earning its keep, line by line.
Want to read the rulebook live and watch the agent obey it?
On a quick call we will open CLAUDE.md, type one prompt, and watch the agent ignore Inter and skip the violet gradient in real time. Bring an app idea and we will run the same prompt against three other tools side by side.
Frequently asked questions
Why is the agent's CLAUDE.md the right thing to look at when picking a vibe coding tool?
Because once you strip away the marketing, almost every modern vibe coding tool is the same underlying frontier model writing React, Tailwind, and a small backend stub. The model is roughly fixed. What is not fixed is the system prompt, the project-level CLAUDE.md, and the skill files the agent reads before it writes a line. Those three artifacts decide what the agent considers a default font, a default color, a default layout. Two tools wired to the same model produce wildly different output if one of them tells the agent to never use Inter and the other says nothing. So when you compare tools, the honest comparison is: what do their agents read on every turn. Most lists never go there.
What does mk0r's agent actually read inside the sandbox?
Every E2B sandbox mk0r boots ships with a `/root/.claude/CLAUDE.md` and a `/root/.claude/skills/frontend-design/SKILL.md`. The Dockerfile that builds the template is `docker/e2b/e2b.Dockerfile` and the COPY directives that put those files in place are right around line 89 to 90. They are not optional. They are not user-configurable. Every prompt the agent receives is preceded by the rules from those files. The CLAUDE.md is roughly three hundred lines and it locks down color palette, fonts, anti-patterns, copywriting style, and browser testing workflow. The SKILL.md adds another forty lines specifically about avoiding generic AI aesthetics.
What rules in there matter most for the look-and-feel question?
Three of them carry most of the weight. First, line 84: 'Three colors maximum: black, white, and one dominant accent color.' That single rule kills the default behavior where AI assistants paint every section a different shade of teal, indigo, and rose. Second, line 126 to 127: 'Never default to Inter, Roboto, Arial, or system fonts. These are the hallmark of generic AI output.' That kills the most recognizable AI signature on the open web. Third, line 152: 'Purple/indigo gradients on white backgrounds' is listed under Anti-Patterns, which means the agent has been explicitly told the most cliched move in the AI app maker world is off the table. None of those rules require model gymnastics. They are just text that gets read on every turn, and they change the output shape because the agent uses them as defaults instead of inventing its own.
Could I not just write my own system prompt with another vibe coding tool to get the same effect?
Sometimes yes, sometimes no. A few tools expose a project-level prompt or a 'rules' file you can edit. Others bake their system prompt into the platform and do not let you override it; you can ask the agent to follow your style in the chat, but that only lasts one turn before the platform's defaults reassert themselves. The honest test is: does the tool let you put a file in the project that the agent reads on every turn, the way mk0r reads `/app/CLAUDE.md` and `/root/.claude/CLAUDE.md` from inside the sandbox. If the answer is no, you are renting their taste, not yours.
How do I check what any tool is feeding its agent without insider access?
Build the same tiny app on three different tools using the same prompt: 'a single page habit tracker, just one daily habit, light theme, no auth, no database.' Do not specify fonts, colors, or layout. Look at the raw output. If two tools produce a sans-serif headline that visually reads as Inter and a primary CTA that is a violet to indigo gradient, both of those tools have the AI default coming through. If one tool produces something with a real display font, a single accent color, and asymmetric whitespace, that tool has rules in the loop. You do not need to see the system prompt directly. The output is the system prompt, in effect.
Is there a tradeoff to having a strict design rulebook in the agent?
Yes, two of them. The first is that any rule the agent obeys is a rule it cannot break, even when breaking it would be right. If your brand is famously purple and your accent color in the prompt is purple, an agent told 'no purple gradients on white' might still resist the gradient where you wanted one. The second is that the rules tilt toward minimalism by default. If you actually want a maximalist landing page with five accent colors and a neon sticker pack, you have to override more aggressively. Worth knowing. The reason mk0r picked these defaults anyway is that the failure mode without them is much worse: every output looks like every other AI output, and your project loses its only chance at visual identity.
Where is the rulebook in the repo if I want to read it before I trust it?
Public on GitHub at github.com/m13v/appmaker. The two files are `docker/e2b/files/root/.claude/CLAUDE.md` and `docker/e2b/files/root/.claude/skills/frontend-design/SKILL.md`. The Dockerfile that copies them into the sandbox is `docker/e2b/e2b.Dockerfile`. Read all three. They are short. They are also the actual contract you are buying when you use mk0r, so it is fair to want to see them before you commit a real project to the platform.
Does this mean mk0r is the right pick for everyone shopping the best vibe coding tools list?
No. If your project is a pro-developer IDE workflow, Cursor or Windsurf will fit better. If you need a heavy back-office app with role-based auth and a relational data model, a tool that ships with those primitives is a better bet. mk0r is the right pick when you want a real, openable, mobile-first app from a single sentence, with the agent constrained against the most common AI design tells, and you are willing to read the rulebook to understand what the agent will and will not do. That is a narrower bet, on purpose. The rest of this guide gives you the test you can apply to whichever tool you pick, including this one.
Open the rulebook before you open the editor. The output you get is the rulebook in effect.
Build with the rulebook