Is vibe coding bad? Only when the agent has no rulebook.
Every article answering this question lists the same four risks (leaked secrets, generic UI, unmaintainable code, you do not understand what you shipped). None of them open the file. The answer that survives contact with a real product is uncomfortable for both sides: vibe coding is genuinely bad in one specific shape, and that shape disappears the moment a written contract loads before your first prompt.
Direct answer (verified 2026-04-30)
Vibe coding is not inherently bad, and it is not categorically safe either. It becomes bad in a single, predictable shape: when the agent runs with no written rulebook to read first. Most cited risks (security holes, leaked keys, generic UI, code you cannot maintain) trace back to that missing contract, not to the practice of describing software in plain English. With a rulebook the agent loads before your first prompt, most named risks have a named, quotable countermeasure. mk0r ships 2,354 lines of one in src/core/vm-claude-md.ts, in the public github.com/m13v/appmaker repo. You can open it.
What the loudest critics actually argue
Strip the headlines back and the case against vibe coding always comes from one of four directions. They are worth restating faithfully, because the sloppy version of this debate is a defender saying "but it's a tool" and a critic saying "but the code is wrong." Both sides win that argument without learning anything.
The Veracode and academic studies (summarized by Trend Micro and others in early 2026) put the rate of security vulnerabilities in AI-generated code somewhere in the 40 to 62 percent band depending on what counts as a vulnerability. Trend Micro's own write-up cites a vibe-coded launch that leaked roughly 1.5 million API keys before anyone noticed. The Wits University op-ed in March 2026 pinned 35 new CVEs that month to AI-generated code. A January 2026 paper called "Vibe Coding Kills Open Source" argued that the practice quietly suppresses interaction with maintainers. And a frequently-cited Linus Torvalds remark from early 2026 sums up the cultural objection: fine for learning, horrible for maintenance.
Every one of these critiques is correct about something specific, and almost every one of them is measuring a base LLM with no wrapper, or a vibe coding tool whose contract is "do whatever the prompt says." That is the cold-prompt failure mode. It is real. It is also not what a vibe coding session looks like inside a product with guardrails.
“I think it's a fine tool to learn things, but I think it's a horrible idea to use it as a basis for stuff that you actually need to maintain.” — Linus Torvalds, early 2026
The pattern under every named complaint
Read the four critiques back-to-back and the same shape shows up. Each of them describes behavior the model exhibited because nothing told it not to. A leaked API key is not a property of LLMs; it is a property of an agent that was never told to trim env vars and never told to check for pre-provisioned services first. Inter on a purple gradient is not a property of LLMs; it is what falls out when no font rule and no color rule load before the prompt does.
This is not a defense of vibe coding. It is a sharper version of the criticism: the criticism is correct, and the answer is not "humans review every line." The answer is "the rulebook reaches the agent before the prompt does." A human can review every line and still ship a leaked key, because by the time it's in the diff it's already in the file. A rulebook can stop the agent from writing the bad pattern in the first place.
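To make the mechanism concrete, here is a minimal sketch (the function and message shape are illustrative, not mk0r's actual code) of what "the rulebook reaches the agent before the prompt does" means in practice:

```typescript
// Hypothetical sketch: the contract is injected as system context before
// the user's prompt. Names here are illustrative, not mk0r's API.
type Message = { role: "system" | "user"; content: string };

function buildConversation(rulebook: string | null, prompt: string): Message[] {
  const messages: Message[] = [];
  if (rulebook) {
    // The rulebook is the first thing the model reads, every session.
    messages.push({ role: "system", content: rulebook });
  }
  messages.push({ role: "user", content: prompt });
  return messages;
}

// Cold prompt: the model sees only the request.
const cold = buildConversation(null, "build me a waitlist site");
// Contract-first: the same request, preceded by the rules.
const guarded = buildConversation("Never default to Inter...", "build me a waitlist site");
console.log(cold.length, guarded.length); // 1 2
```

The model is identical in both calls; the only variable is whether anything reaches it before the prompt does.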
Six complaints, six rules they map to
Open src/core/vm-claude-md.ts on github.com/m13v/appmaker and search for any of these phrases. Every entry below is a verbatim or near-verbatim rule the agent loads at boot, sourced from globalClaudeMd, projectClaudeMd, or one of the six skill exports.
Complaint: leaked API keys
Rule: 'check /app/.env for pre-provisioned services before creating new accounts or asking the user for API keys.' PostHog, Neon, Resend, and a GitHub repo are already wired. Plus an explicit 'Common Pitfall: Trailing Newlines in Environment Variables' section that tells the agent to trim every value before writing it.
Complaint: AI-slop UI
Rule (frontend-design skill): 'Never default to Inter, Roboto, Arial, or system fonts. These are the hallmark of generic AI output.' Plus: no purple/indigo gradients on white, no uniform rounded card grids, no 'centered hero text and a gradient button.'
Complaint: marketing buzzwords
Rule (copywriting skill): no exclamation points ever, and a banned-word list that includes streamline, optimize, innovative, cutting-edge, revolutionary, seamless. Plus filler bans: very, really, just, actually, basically, simply.
Complaint: agent touches infra
Rule (VM Guardrails): do not modify /opt/*.js or /opt/startup.sh, do not kill or restart Chromium, Xvfb, x11vnc, or the proxy, do not change Vite's HMR config, do not write outside /app unless the user asks.
Complaint: speculative complexity
Rule (Code Quality): keep components small (extract at ~100 lines), no speculative abstractions or feature flags, handle errors at boundaries not internally. Prefer const, never var. Comments only on non-obvious logic.
Complaint: AI forgets decisions
Rule (Memory): save a memory the moment the user names themselves, expresses a preference, picks a color, or rejects a direction. Read memories at the start of each turn. Every session is a real git repo, every turn becomes a commit.
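Rules like the copywriting bans are concrete enough to check mechanically. A hypothetical sketch using the banned-word lists quoted above (the lint function itself is illustrative, not something the repo ships):

```typescript
// Hypothetical lint pass in the spirit of the copywriting skill's rules.
// The word lists are quoted from the rulebook; the function is illustrative.
const BANNED: string[] = [
  "streamline", "optimize", "innovative", "cutting-edge",
  "revolutionary", "seamless",                                  // buzzwords
  "very", "really", "just", "actually", "basically", "simply",  // filler
];

function lintCopy(text: string): string[] {
  const violations: string[] = [];
  if (text.includes("!")) violations.push("exclamation point");
  const lower = text.toLowerCase();
  for (const w of BANNED) {
    if (new RegExp("\\b" + w + "\\b").test(lower)) {
      violations.push("banned word: " + w);
    }
  }
  return violations;
}

// lintCopy("Seamlessly streamline your workflow!")
// returns ["exclamation point", "banned word: streamline"]
```

Note that "seamlessly" passes the word-boundary check while "streamline" does not; a production linter would also catch inflections, but the point stands: a written ban is testable, a vibe is not.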
Watch the same prompt produce two different sessions
A useful way to feel the difference is to imagine one prompt ("build me a waitlist site that captures emails") running against two identical models. One has the rulebook loaded. The other does not. The session traces below are not a benchmark; they are descriptions of what each contract pulls out of the same model on the same request.
Same prompt. Different contract.
Without the rulebook: the agent picks Inter and a purple gradient. It generates a Mongo connection string locally and pastes the password into a .env with a trailing newline. Centered hero, three rounded cards with emoji icons, an exclamation point in the CTA. The waitlist form posts to a stub /api/subscribe handler that logs the email to console. The agent reports 'all set' and stops.
- Inter on purple gradient
- Newline-corrupted secret in .env
- Stub backend, emails go to console
- Exclamation points in CTAs
- No memory of design decisions
With the rulebook: the agent checks /app/.env first and finds Resend and Neon already wired, so the form posts to a real handler instead of a console stub. Every env value is trimmed before it is written. The font rule forbids Inter and the color rule forbids the purple gradient, so the design starts somewhere else. The copywriting skill strips the exclamation point from the CTA. The turn lands as a git commit, and the design choices are saved as memories.
- Distinctive font, no default gradient
- Trimmed secrets, pre-provisioned services from /app/.env
- Real email capture through wired services
- No exclamation points in copy
- Decisions saved as memories, turns saved as commits
The contract you can count
People are reasonably suspicious of "we have guardrails." Most products that say it have a couple of paragraphs in a config file. Counting the actual size of the contract is a fair sniff test: 2,354 lines, 15 exported constants, two CLAUDE.md files, six SKILL.md skills. A contract big enough to address the named risks is big enough to read.
Where the criticism still holds
None of this should be read as a refutation of the people calling vibe coding bad. Their criticism stops applying inside a tool whose rulebook addresses the named risks. It does not stop applying everywhere else, and the "everywhere else" is larger than the rulebook is.
Production-grade auth flows that survive a real attacker. Real-time multi-user state with conflict resolution. Payments compliance. Native mobile with App Store review. Performance under realistic load. Anything where the failure mode is "a competent attacker spends a day on it." A vibe coding session, even with a careful rulebook, gets you a convincing first draft of these and then runs out of road. That is a real limit, not a marketing one.
Linus is also correct about the maintenance shape. If you accept every diff without reading it, you have not learned what the agent built. The rulebook will keep the code from being slop, but it cannot make you understand it. Reading the diff is still your job, and on a long-lived codebase that job is the difference between commissioning code and owning it.
The honest verdict
Is vibe coding bad? It is bad when nothing reaches the agent before the prompt does. It is fine when something does. The interesting question is not whether the practice is dangerous in the abstract; it is whether the specific tool you are using has a contract you can read, and whether that contract addresses the failure modes that worry you. If it does, vibe coding gets you to a working disposable prototype faster than any other workflow. If it does not, the criticism is exactly correct.
mk0r's file is open. Read it before you ship anything you would not throw away. Then decide.
Want to read the rulebook with someone who wrote it?
Book a 20-minute call. We open vm-claude-md.ts together, walk through the rules that match your use case, and run a session against the contract live.
Frequently asked questions
Is vibe coding bad?
Sometimes, in a specific way. The named risks (leaked secrets, unauthenticated backends, generic AI-slop UI, code nobody can maintain) almost always trace back to a missing rulebook, not to vibe coding itself. The same model that writes a working app will also leak API keys if nothing tells it not to. With a rulebook the agent loads first, most named risks have a quotable countermeasure. Without one, the criticism lands.
What is the rulebook on mk0r and where can I read it?
It is a single TypeScript module at src/core/vm-claude-md.ts in the public github.com/m13v/appmaker repo. The file is 2,354 lines, exports 15 constants, and gets copied into the sandbox before your first prompt. Two of the constants become CLAUDE.md files (one global, one per project) and six become SKILL.md files (frontend-design, copywriting, backend-services, algorithmic-art, website-builder, seo-page).
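For readers who want the shape before opening the repo, here is a structural sketch. globalClaudeMd, projectClaudeMd, and the six skill names come from the file; the placeholder bodies, the materialize helper, and the target paths are assumptions for illustration:

```typescript
// Structural sketch of a rulebook module like vm-claude-md.ts.
// globalClaudeMd, projectClaudeMd, and the six skill names come from the
// real file; the bodies, the materialize helper, and the target paths
// below are illustrative assumptions.
export const globalClaudeMd = "# Global rules\nPrefer const, never var. ...";
export const projectClaudeMd = "# Project rules\nCheck /app/.env for pre-provisioned services first. ...";

export const skills: Record<string, string> = {
  "frontend-design": "Never default to Inter, Roboto, Arial, or system fonts. ...",
  "copywriting": "No exclamation points. Banned: streamline, optimize, ...",
  "backend-services": "...",
  "algorithmic-art": "...",
  "website-builder": "...",
  "seo-page": "...",
};

// At sandbox boot, the constants become files the agent reads before your
// first prompt (exact destination paths assumed here):
export function materialize(write: (path: string, body: string) => void): void {
  write("/root/CLAUDE.md", globalClaudeMd); // global contract
  write("/app/CLAUDE.md", projectClaudeMd); // per-project contract
  for (const [name, body] of Object.entries(skills)) {
    write("/app/.claude/skills/" + name + "/SKILL.md", body);
  }
}
```

Two constants become CLAUDE.md files, six become SKILL.md files: eight files on disk before the model sees a word of your request.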
Why do half the AI-generated code studies report security vulnerabilities?
Because most of the studies measure cold prompts: ask a base LLM for code, run a scanner. That is not what a vibe coding session looks like in a tool with guardrails. The model is the same; the wrapper around it is what changes the failure rate. The fair test is 'agent plus rulebook plus runtime' against a code reviewer, not 'raw model' against a scanner.
What does mk0r's rulebook do about leaked API keys?
Two things. First, the project CLAUDE.md tells the agent to check /app/.env for pre-provisioned services (PostHog, Neon, Resend, GitHub) before asking for credentials, so most apps never need a user-supplied key. Second, the global rulebook has an explicit 'Common Pitfall: Trailing Newlines in Environment Variables' section that tells the agent to trim every value before writing it to .env (the same bug that once caused a 100% Stripe webhook failure on a different project of mine).
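The countermeasure is small enough to show. A hedged sketch (the function name is hypothetical; the trim-before-write rule is the one the file states):

```typescript
// Hypothetical sketch of the trailing-newline pitfall the rulebook names.
// A key pasted or echoed into .env often carries a trailing "\n"; an HTTP
// client then sends "Bearer sk_live_...\n" and the provider rejects it.
function writeEnvVar(env: Record<string, string>, key: string, raw: string): void {
  env[key] = raw.trim(); // the whole countermeasure: trim before writing
}

const env: Record<string, string> = {};
writeEnvVar(env, "STRIPE_SECRET_KEY", "sk_live_abc123\n");
console.log(JSON.stringify(env.STRIPE_SECRET_KEY)); // "sk_live_abc123"
```

One line of discipline, stated once in the contract, removes a whole class of silent auth failures.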
What does it do about generic AI-slop UI?
The frontend-design skill names the slop directly. It bans Inter, Roboto, Arial, and system fonts as the default font. It bans purple/indigo gradients on white backgrounds. It bans uniform rounded cards in a grid with icons. It bans 'centered hero text and a gradient button.' Every one of those is in the file as a literal 'do not' rule, not a vibe.
What does it do about unmaintainable code?
The global rulebook prefers small components (extract at ~100 lines), forbids speculative abstractions and feature flags, and requires that errors be handled at boundaries (user input, fetch calls), not internally. Every session is a real git repo inside the VM and every turn becomes a commit, so 'AI does not remember decisions' fails for a different reason here: history is on disk.
Where is the criticism still correct?
Anywhere the rulebook does not reach. Production-grade auth, multi-user real-time state, complex backend logic, native mobile, payments compliance: a vibe coding tool will get you a convincing prototype and then leave you to do the engineering. Disposable first drafts and weekend projects are the safe zone. Anything you would put a customer's credit card in is not.
Is vibe coding worse for learning to code?
Linus Torvalds made the cleanest version of this point in early 2026: vibe coding is fine for learning, horrible for maintenance. If you are reading the agent's diff and reasoning about each change, you are still learning. If you are accepting whatever the agent ships and never opening the file, you are not learning to code, you are commissioning code. Both are legitimate uses; only one of them produces an engineer.
Should I vibe-code a startup?
The first prototype, yes. The version a customer pays for, no, unless an engineer reviews the file you are about to ship. The rulebook is a floor, not a ceiling. It catches the named foot-guns. It does not catch the unnamed ones, and a real product has unnamed ones.
Does mk0r need an account?
No. mk0r stores a session key in localStorage and the agent runs against a pre-warmed E2B sandbox. There is no signup, no payment screen, and no email capture before you build. The 'no account' part is also why the rulebook has to be airtight: there is no hidden onboarding step where a human could intervene.
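A minimal sketch of what account-free persistence looks like, assuming a hypothetical storage key name and a pluggable store:

```typescript
// Minimal sketch of account-free session persistence. The storage key
// name "mk0r_session" and this helper are hypothetical, not mk0r's code.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function getOrCreateSessionKey(storage: KeyValueStore, newId: () => string): string {
  const existing = storage.getItem("mk0r_session");
  if (existing) return existing; // returning visitor: same session, no login
  const fresh = newId();         // first visit: mint an anonymous id
  storage.setItem("mk0r_session", fresh);
  return fresh;
}

// In a browser this would run as
// getOrCreateSessionKey(localStorage, () => crypto.randomUUID()).
```

Identity is just a random id on the visitor's machine, which is exactly why there is no onboarding step where a human could intervene: the rulebook is the only gate.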
Read the file. Then build something disposable against it. No signup.
Start a vibe coding session