Spec-Kit vs vibe coding workflow: the edge cases every guide skips.
The framing you keep reading is "vibe = fast and risky, spec = slow and safe." That is wrong about both ends. They produce different deliverables (markdown files vs a running app and a git SHA), so they answer different questions, so the real question is which question you are asking right now.
Direct answer, verified 2026-05-09
Spec-Kit (github.com/github/spec-kit) is a five-command pipeline (/speckit.constitution, /speckit.specify, /speckit.plan, /speckit.tasks, /speckit.implement) that produces seven or more markdown files (constitution.md, spec.md, plan.md, data-model.md, api-spec.json, research.md, quickstart.md, tasks.md) before any line of code is written. Use it when you are in the build phase of a known requirement and a multi-person team needs a written contract.
Vibe coding the way mk0r does it is one sentence per turn, one running app at a live URL, one git SHA per prompt (commitTurn() at src/core/e2b.ts:1759). Use it when the question is "is this idea even worth specifying" and you need a clickable artifact to find out.
They are sequential phases of the same project, not competing methodologies. The rest of this page is the edge cases that decide which one fits the turn you are on right now.
The frame everyone repeats, and what it gets wrong
Most articles on this topic say roughly the same thing. Vibe coding is fast but undisciplined; spec-driven is slow but durable; vibe is for prototypes, spec is for production; teams should "graduate" from one to the other. That is not wrong on average, but it hides three things that decide the actual answer for the turn you are on.
First, the deliverables are different categories of thing. A Spec-Kit phase produces text. An mk0r turn produces a running app at a URL plus a git SHA. Text and a URL are not interchangeable. If your next decision needs a clickable artifact, the markdown will not help; if your next decision needs a contract two humans can argue on, the SHA will not help.
Second, the relevant axis is not discipline, it is decision phase. Discovery ("is this worth building?") and build ("what exactly does this do?") are different questions. The right tool depends on which question you are asking, not on how serious you are.
Third, vibe coding is not non-spec. mk0r runs a 2,438-line spec on every prompt; you just did not write it. The disagreement is about who maintains the spec, not whether one exists.
What each workflow actually puts on disk
The cleanest way to see the difference is to count the artifacts each workflow produces for the same starter prompt: "build me a habit tracker."
Same prompt, two ledgers of output
# After /speckit.constitution
.specify/memory/constitution.md
# After /speckit.specify
.specify/specs/habit-tracker/spec.md
# After /speckit.plan
.specify/specs/habit-tracker/plan.md
.specify/specs/habit-tracker/data-model.md
.specify/specs/habit-tracker/contracts/api-spec.json
.specify/specs/habit-tracker/research.md
.specify/specs/habit-tracker/quickstart.md
# After /speckit.tasks
.specify/specs/habit-tracker/tasks.md
# Files of running code so far: 0

Neither column is the "right" answer in the abstract. The left is what you want before three people need to agree on what is being built. The right is what you want before you know whether the thing should exist.
Same starter prompt, two pipelines, two definitions of "done with turn one."
- Phase 1, /speckit.constitution -> writes .specify/memory/constitution.md (project rules)
- Phase 2, /speckit.specify -> writes spec.md (user stories, acceptance criteria) -> stop; review with stakeholders
- Phase 3, /speckit.plan -> writes plan.md, data-model.md, api-spec.json, research.md, quickstart.md -> stop; review the architecture
- Phase 4, /speckit.tasks -> writes tasks.md (ordered, dependency-aware)
- Phase 5, /speckit.implement -> the agent works through tasks.md and writes code

Total artifacts before the first line of code: 7+ markdown files. Total review gates: at least 2 (after spec, after plan).
- Zero running code at the end of phases 1-4
- Two explicit human review gates before /implement
- Wrong tool if you do not yet know what you are building
The anchor fact: one SHA per turn, on every prompt
mk0r's history stack is not metaphor. The function commitTurn() at src/core/e2b.ts:1759 runs the literal three-line shell script git add -A, git commit -q -m '…', git rev-parse HEAD inside the sandbox VM after every successful prompt. The SHA goes onto a per-session historyStack, with undoTurn(), redoTurn(), and jumpToSha() as siblings in the same file.
“Every vibe-coded prompt is a SHA you can revert to. The deliverable is git history, not a markdown spec.”
src/core/e2b.ts line 1759, commitTurn()
That is the structural difference, in one sentence. Spec-Kit's deliverable is markdown a human reads and reviews. mk0r's deliverable is a SHA a human (or another agent) can check out and run. Both are real artifacts; neither is a substitute for the other.
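The three-command loop described above can be sketched as a small helper. This is a hypothetical illustration of the shape of that per-turn commit, not the actual commitTurn() implementation; buildCommitScript is an invented name.

```typescript
// Hypothetical sketch of the per-turn commit described above.
// buildCommitScript is illustrative, not the real mk0r API.
function buildCommitScript(message: string): string {
  // Escape single quotes so the message embeds safely in the -m argument.
  const safe = message.replace(/'/g, `'\\''`);
  return [
    "git add -A",                 // stage everything the turn changed
    `git commit -q -m '${safe}'`, // one commit per prompt
    "git rev-parse HEAD",         // print the SHA to push onto the stack
  ].join(" && ");
}

// Example: the script for a prompt labeled "turn 1".
const script = buildCommitScript("turn 1");
// "git add -A && git commit -q -m 'turn 1' && git rev-parse HEAD"
```

The only output the platform needs back from the sandbox is the final line of stdout: the SHA that becomes this turn's entry in the history stack.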
The surprise: mk0r is also spec-driven
The vibe-vs-spec writeups talk as if the choice is between writing a spec and not writing one. That is not the choice on offer.
mk0r runs a 2,438-line spec on every prompt. It lives in the public repo at src/core/vm-claude-md.ts, and the platform writes it into /root/.claude/CLAUDE.md inside the sandbox VM. The agent reads it before it touches a file. The contents include code-quality rules, styling rules, design constraints, copywriting rules, browser-testing rules, file-organization rules, env-var pitfalls, and VM guardrails. None of that is invented per session; it is platform-maintained, versioned, applied to every project built on the product.
The spec exists either way; only the author changes.
The agent is going to read a spec on every turn. The only question is whether you wrote one for this project (Spec-Kit lane) or whether the platform wrote one for every project (mk0r lane). For prototypes, the platform spec is the right level. For production code your team has to maintain, the right level is a project-level spec on top of (or instead of) the platform one.
The edge cases each workflow misses
Below are six situations where the "vibe for prototypes, spec for production" rule of thumb is not enough. For each one the choice is not about discipline; it is about which deliverable the next person in the loop needs.
You are alone, idea unproven
You have a half-formed idea on a Sunday. Spec-Kit's first instruction is to write a constitution. mk0r's first instruction is to type a sentence. Spec-Kit assumes the idea is worth a spec; mk0r helps you find out if it is. Use mk0r here. Promote to Spec-Kit only if the prototype earns it.
You and three coworkers must agree on scope
PM wants checkout, eng wants observability, design wants the empty state. Spec-Kit's spec.md is an artifact those three can argue on without anyone writing code. mk0r's git history is not a contract; nobody reviews diffs to align on scope. Use Spec-Kit here.
QA needs acceptance criteria
If QA writes tests against intent, intent has to be written down. Spec-Kit's acceptance criteria live in spec.md and survive the next refactor. mk0r commits a SHA per turn but a SHA does not say what the turn was supposed to do. Use Spec-Kit, or layer a spec on top of mk0r.
Stakeholder demo on Friday
A non-engineer needs to click on something and react. Five markdown files are not clickable. mk0r's per-turn SHA produces a live URL the stakeholder can open on their phone. Use mk0r, then specify the parts they liked.
Regulated payment flow
Audit, compliance, multi-team review. The deliverable has to be readable by people who do not run the code. Spec-Kit's contracts/api-spec.json is exactly the artifact regulators want. mk0r alone will not get you through review.
You want to validate the visual feel
Typography, spacing, the way the streak ring animates. No spec captures this. mk0r's Playwright MCP screenshots the rendered page on every turn; you iterate on what you see. Spec-Kit's discipline does not help with vibe-level taste decisions.
How to actually sequence them on a real project
Treat them as phases, not alternatives. The honest sequence for most projects is:
- Vibe code the discovery phase. Open mk0r, type a sentence, see what comes back. Iterate by talking. The deliverable here is a per-turn SHA stack you can fork from, and a working URL you can show a friend or a customer to find out if the idea is real. Do not write a constitution.md for an idea you have not validated.
- Promote to a spec when more than one human is involved. The moment a PM, a designer, or a second engineer needs to agree on what to build, the deliverable shifts from "clickable artifact" to "reviewable contract." That is when Spec-Kit (or any spec workflow) earns its overhead. The cost of writing the spec is now amortized over multiple humans reviewing it.
- Use the vibe artifact as input to the spec. The mk0r prototype is not throwaway in this sequence. It is the most concrete possible input to /speckit.specify: an agent reading the prototype can extract user stories from working code, where it would otherwise have to invent them from a paragraph of text. Discovery feeds build.
- Run implementation under whichever loop has better verification. For UI work, mk0r's Playwright MCP screenshots the rendered page on every turn; for backend work, Spec-Kit's tasks.md gives the agent a checklist that maps to tests. Pick the loop whose verification step matches the work.
The article that frames this as a binary choice is selling either Spec-Kit or vibe coding, not advising you. Both tools survive the year because they answer different questions.
One-file tour, if you want to verify the mk0r side
- src/core/e2b.ts line 1759: commitTurn(). The shell script that writes a SHA per prompt. This is the "every turn is a commit" claim, in 50 lines.
- src/core/e2b.ts line 1855 onward: undoTurn(), redoTurn(), jumpToSha(). The history stack the SHAs go onto.
- src/core/vm-claude-md.ts: the 2,438-line platform spec the agent reads on every turn. This is the "mk0r is also spec-driven" claim, in one file.
- src/core/e2b.ts line 170: DEFAULT_APP_BUILDER_SYSTEM_PROMPT. The minimal system prompt that points the agent at the CLAUDE.md files for everything else.
- src/core/e2b.ts line 175: buildMcpServersConfig(). The Playwright MCP wiring that gives the agent a real browser to verify in. This is the verification layer Spec-Kit does not include out of the box.
- For the Spec-Kit side: the canonical reference is github.com/github/spec-kit and the docs at github.github.com/spec-kit. The five slash commands and the artifacts each one writes are documented there.
Try the discovery side. No account, one sentence, one running app per turn.
Open mk0r

Want to see the spec layer of mk0r live?
Book 20 minutes. We will start a session, open the in-VM CLAUDE.md, and show the agent reading it on every prompt while you watch the SHA stack grow.
Frequently asked questions
When should I actually reach for Spec-Kit instead of vibe coding?
When more than one person has to agree on what is being built before any code exists. Spec-Kit's pipeline produces a constitution.md (project rules), spec.md (user stories and acceptance criteria), plan.md (tech approach), and tasks.md (ordered work units), then runs implement. A product manager can review spec.md without reading any code, an architect can argue with plan.md, QA can pull acceptance criteria off it. That review-the-spec, not-the-code workflow is what Spec-Kit was designed for. If you are alone at midnight trying to find out whether an idea is interesting, you are not the audience.
When does Spec-Kit fail and vibe coding wins?
Discovery. If you do not know yet whether your idea is worth implementing, writing a spec for it is premature optimization. The spec assumes you already know what to build. With mk0r you type one sentence, watch a real app render, and the next decision is informed by something you can click rather than something you have to imagine. Five minutes in, you know whether the concept is even interesting. If it is not, you have not invested an afternoon in writing constitution.md and spec.md and tasks.md for an idea you will throw away.
Is mk0r doing 'no spec' coding then?
No, and this is the part the vibe-vs-spec articles miss. mk0r runs a 2,438-line spec on every prompt you send. It lives at src/core/vm-claude-md.ts in the public repo (github.com/m13v/appmaker). The model reads it before it touches a file. It covers code quality, styling, design constraints, copywriting, browser testing, file organization, env-var pitfalls, VM guardrails. The difference from Spec-Kit is who wrote the spec. With Spec-Kit, you write a spec for one project. With mk0r, the platform maintains a spec that applies to every project you build on it.
What is the actual deliverable difference per workflow turn?
A Spec-Kit turn produces text artifacts: a markdown file gets created or updated. You can read it, review it, version it. There is no running software at the end of a /speckit.specify call. An mk0r turn produces a running app and a git SHA. The commitTurn() function at src/core/e2b.ts line 1759 runs `git add -A && git commit && git rev-parse HEAD` after every prompt. The SHA goes on a history stack. You can undo, redo, jump to any earlier SHA, fork from any point. The deliverable is a clickable URL plus a git history, not a pile of markdown.
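The undo/redo/jump behavior described above can be sketched as a cursor over a SHA stack. This is a hypothetical illustration of the shape of that history, not the actual undoTurn/redoTurn/jumpToSha code; the TurnHistory class is invented for the example.

```typescript
// Hypothetical sketch of a per-session SHA history with undo/redo/jump.
// Names and structure are illustrative, not mk0r's implementation.
class TurnHistory {
  private stack: string[] = [];
  private cursor = -1; // index of the SHA currently checked out

  commit(sha: string): void {
    // Committing after an undo discards the redo branch, editor-style.
    this.stack = this.stack.slice(0, this.cursor + 1);
    this.stack.push(sha);
    this.cursor = this.stack.length - 1;
  }

  undo(): string | undefined {
    if (this.cursor <= 0) return undefined; // nothing earlier to return to
    return this.stack[--this.cursor];
  }

  redo(): string | undefined {
    if (this.cursor >= this.stack.length - 1) return undefined;
    return this.stack[++this.cursor];
  }

  jumpTo(sha: string): string | undefined {
    const i = this.stack.indexOf(sha);
    if (i === -1) return undefined; // unknown SHA, refuse to move
    this.cursor = i;
    return sha;
  }
}
```

Whatever SHA the cursor lands on, the sandbox would check it out and the live URL would serve that version of the app.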
Can I get the best of both? Spec-Kit-style discipline with mk0r's speed?
Sort of. Inside an mk0r session you can ask the agent to write a spec.md before changing code. The agent has a real Vite project on disk, a Playwright MCP for browser verification, and a 2,438-line CLAUDE.md that already tells it to plan before editing. Asking it 'write a markdown spec for the next feature, then implement' is one prompt away. What you do not get is Spec-Kit's separation of phases, where /speckit.specify and /speckit.implement are different commands that gate each other. mk0r runs the whole loop in one turn. If gated phases are the value you want, use Spec-Kit.
Does spec-driven actually take longer in real usage?
It depends on whether you would have written the spec anyway. For a feature you would have written a Jira ticket and a design doc for, Spec-Kit replaces those with markdown that an agent can also read, so the net cost is similar and the implementation is more accurate. For a one-screen prototype you would have hacked together in 15 minutes, Spec-Kit's five-phase pipeline is pure overhead. The honest answer is that the speed comparison depends on what you would have done in the absence of either tool.
Where does Spec-Kit hit its own wall?
The spec is not the implementation. Detailed acceptance criteria can create the illusion that the agent will hit them; in practice you still have to run the code, click around, and check. Spec-Kit's /speckit.implement step does the build, but it does not include a real browser-verification loop the way mk0r's Playwright MCP does. You can layer one on, but you have to bring it. Out of the box, Spec-Kit's last mile is 'agent says it implemented the tasks,' not 'agent took a screenshot of the working page.'
Where does mk0r-style vibe coding hit its own wall?
Anything that needs a written contract between two parties. A regulated payment flow, a shared API your other team consumes, a feature whose acceptance criteria QA needs to write tests against. mk0r commits a SHA per turn, but a SHA is not a spec. A future maintainer can read the diff but cannot read the intent. If the artifact has to outlive a single weekend's worth of attention, the absence of an explicit, human-reviewed spec eventually bites. That is the case for adding Spec-Kit, not for replacing your prototyping tool.
What is the one-line answer for my team?
If you do not yet know what to build, vibe code on mk0r until the idea is real enough to specify. Once you know what to build and more than one person needs to agree on it, switch to Spec-Kit (or a written spec produced however you like) before you scale the implementation. The two workflows are sequential phases of the same project, not competing methodologies you have to pick between.