AI CTO, vibe coding, and where spec clarity actually lives
The whole genre right now says: write a spec doc before you let the agent generate code. There is a second path: the CTO behavior can live inside the agent prompt itself.
Direct answer (verified 2026-05-06)
Two ways to get spec clarity before vibe coding. Either the human writes a spec doc the agent consumes (Kiro, GitHub spec-kit, Tessl all push this), or the agent's system prompt forces the agent to interview the human for the spec before it generates any code. mk0r ships the second pattern in src/core/vm-claude-md.ts, lines 27 through 77.
Verified against InfoWorld's spec-driven vs vibe-coding framing and against the file path cited above on 2026-05-06.
I keep seeing the phrase "AI CTO" show up in vibe coding threads. The implication is usually that vibe coding does not scale, that you need an adult in the room with a spec, and that the responsible move is to write that spec yourself before the agent does anything. That advice is not wrong, exactly. But it is framing one path as the only path, and there is a second one sitting under most people's noses.
You can put the CTO behavior inside the agent. Not as a separate tool, not as a workflow you opt into, but as a few paragraphs of system prompt that turn the agent into the one asking spec questions. The output is the same: the human and the agent share a spec before code starts streaming. The path is different.
What the "agent interviews you" pattern looks like
Concretely, this is what the first thirty seconds of a fresh mk0r session looks like when the user prompt is sparse. The agent does not write code yet. It asks short, opinionated questions, saves the answers as session memory, and then writes a build informed by the conversation.
A sparse prompt becomes a spec, in one short loop
The thing to notice is the order. The agent does not generate code and then accept corrections, which is the default mode of every other AI builder I have used. It generates the spec first, by asking, and only after the spec is in memory does it touch the file the user is going to see.
The four questions, verbatim
This is the anchor fact. If you open the mk0r repo and read src/core/vm-claude-md.ts at lines 52 through 57, you will find this exact block of instructions injected into the agent's environment as CLAUDE.md before any of your prompts arrive:
- **Ask questions** when context would improve your work:
- "What's the primary use case for this app?"
- "Do you have a color scheme in mind, or should I pick something clean?"
- "Is this for a technical audience or general consumers?"
- "Should I keep it simple or go all-out on the design?"Four questions. Plain English. They are not run as a chatbot wizard. They are part of the instruction the model reads at the start of every session, alongside the role description and the workflow. The model fires them only when the user prompt is underdetermined. A specific first prompt skips the interview; a vague one gets the questions.
What each question buys you
Primary use case
Asked verbatim in the prompt: "What's the primary use case for this app?" Forces the human to name an outcome, not a feature.
Color and visual direction
"Do you have a color scheme in mind, or should I pick something clean?" Cuts off the worst path: the model picking a generic purple gradient because nobody said otherwise.
Audience
"Is this for a technical audience or general consumers?" Decides density, jargon, and the default level of polish before the first <div> gets written.
Ambition level
"Should I keep it simple or go all-out on the design?" Locks the build budget. Without this, the model defaults to whatever it did last session, which is rarely what the user wanted.
Vague prompt vs interrogated spec
The difference between the two output paths is mostly invisible until you have one of each, side by side. The vague prompt is what most people type. The interrogated spec is what the agent actually has to work with after the four questions land.
Same idea, two starting points
The vague prompt:
build me a habit tracker
- agent guesses the audience
- agent guesses the visual direction
- agent guesses the ambition level
- you correct three things in the next ten turns
The interrogated spec (illustrative answers filled in):
build me a habit tracker: daily streak tracking as the primary use case, clean neutral palette, general consumers, keep it simple
- agent builds to the stated constraints
- the first render is close to what you meant
Both end up at the same place if the human is patient. The interview just collapses the path. Fewer turns, less back-and-forth, less of the failure mode where the agent generates an opinionated UI and then has to walk it back.
Why this is not the same as a spec doc
Spec-driven tools (Kiro, GitHub spec-kit, Tessl, the playbook in most CTO blog posts) push the human to write a structured artifact up front. Requirements, plan, tasks. The agent then executes the artifact in passes. That works. It is also a different audience.
The interview pattern is for the moment when the human has an idea in their head and does not want to translate it into a spec document before they see anything running. The agent does the translation, in conversation, and the conversation itself becomes the spec. The cost is that the spec is ephemeral. It lives in session memory; if you want it durable, you save it back into a CLAUDE.md file the agent reads next time. Both patterns are valid. They compose. The interview is fast and disposable, the doc is slow and durable.
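The save-back step the paragraph describes can be sketched like this. The file name, section header, and list format are assumptions; the only grounded part is the idea of writing the interview answers into a CLAUDE.md the agent reads next time.

```typescript
// Sketch: persist interview answers so the next session starts with them.
// Format and path are illustrative, not mk0r's implementation.
import { appendFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function persistSpec(path: string, answers: Record<string, string>): void {
  const lines = Object.entries(answers)
    .map(([question, answer]) => `- ${question}: ${answer}`)
    .join("\n");
  appendFileSync(path, `\n## Session spec\n${lines}\n`, "utf8");
}

const specPath = join(tmpdir(), `claude-spec-${Date.now()}.md`);
persistSpec(specPath, {
  "use case": "track daily habits",
  audience: "general consumers",
  ambition: "keep it simple",
});
const saved = readFileSync(specPath, "utf8");
```

Ten lines of plumbing is the entire gap between the disposable interview and the durable doc.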
The CTO framing genuinely fits both. A real CTO does some of each: catches the under-specified ideas with a quick chat, writes a doc when the idea is going to production. The interesting move is putting the "quick chat" layer inside the agent itself, so the human does not have to remember to do it.
What this pattern does not solve
Honest limit: the interview gets you spec clarity for the prototype layer. It does not give you spec clarity for production code, where the question is not "what does this do" but "will this scale, can it be reviewed, who owns it after the session ends". For that, you want a real spec doc, version control, code review, and the rest of the engineering apparatus that exists for good reason.
The Canva CTO line that gets quoted in every one of these threads, "you won't be vibe coding your way to production, not if you prioritize quality, safety, security and long-term maintainability at scale", is correct. The interview pattern does not contradict it. It just fixes the prototype layer, where the question is whether the idea is worth a real spec at all.
"Vibe coding and spec-driven development aren't competing approaches; they're complementary ones. Use vibe coding to explore and prototype, and spec-driven development with AI to harden and ship."
The bet
If you are the one shipping prototypes alone, you do not need a human CTO and you probably do not need a spec doc either. You need a tool that asks you four good questions before it generates the wrong UI. That is the bet of the interview pattern: most of the value of a CTO at the prototype stage is in the questions, not in the artifact, and questions are a thing you can put in a system prompt.
If you want to read the actual file, it is github.com/m13v/appmaker, under src/core/vm-claude-md.ts. The repo is open. The interview block is not hidden behind a paywall, an enterprise tier, or a settings toggle. It is the default behavior, and you can fork it and change the questions.
Want a second pair of eyes on what to ask the agent?
Twenty minutes to look at the prompt you would put in front of an AI builder, and figure out which spec questions are actually load-bearing for your idea.
Frequently asked questions
What does an AI CTO actually do for vibe coding spec clarity?
An AI CTO, in the practical sense people are using the phrase, is whatever process forces the spec out of the human's head before code starts streaming. That can be a doc you write (Kiro, GitHub spec-kit, Tessl all push this), or it can be the agent itself running a short interview before it edits a file. mk0r picks the second path. The four interview questions live at src/core/vm-claude-md.ts lines 53 through 57 and run on every fresh session. The output of the interview is the spec; the human just answers in plain words.
Why not just write a real spec doc upfront?
You can. Spec-driven tools work, especially for production handoffs. The tradeoff is that writing a spec is itself a skill, and the audience for vibe coding is often someone who picked up the tool because they did not want to write specs in the first place. Embedding the interview in the agent shifts the cost from "learn to write a spec" to "answer four short questions". The downside is the spec is conversational and ephemeral; if you want it durable, save it as a CLAUDE.md memory the agent reads next time. Both patterns are valid and the honest answer is they compose well.
Where exactly is the CTO behavior in the mk0r source?
Open src/core/vm-claude-md.ts. The exported globalClaudeMd string starts at line 19 and runs to about line 240. Two blocks matter for spec clarity. First, lines 27 through 50 are the "Learning About the User (Memory)" block: it tells the agent to save corrections and preferences as memories the moment they appear. Second, lines 52 through 64 are the "How to Learn" block, which contains the four literal questions to ask when context would improve the work. That file is injected into the agent's environment at /root/.claude/CLAUDE.md before any user prompt is processed.
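As a skeleton, the shape of that export looks roughly like this. Only the two section names and the one instruction sentence quoted elsewhere in this piece are grounded; the structure around them is an assumption, heavily elided.

```typescript
// Skeleton of the exported constant, elided almost entirely. The real string
// in src/core/vm-claude-md.ts runs a couple hundred lines; only the section
// names below come from the article's description of the file.
const globalClaudeMd = `
## Learning About the User (Memory)
Save a memory immediately when the user corrects you or expresses a preference.

## How to Learn
- **Ask questions** when context would improve your work:
  - "What's the primary use case for this app?"
`;
```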
Does this slow down the maker loop?
It adds two or three turns to the start of a session. For a one-screen prototype, that is maybe thirty seconds of typing in exchange for the agent not picking a purple gradient and a generic dashboard layout you did not ask for. For tight, throwaway prototypes the questions are skippable; the agent will only ask if the prompt is genuinely underdetermined. The system prompt at src/core/e2b.ts line 170 is one sentence on purpose, so the model still leans on the user prompt as the main input. The interview kicks in when the user prompt does not carry enough signal.
Is this the same thing as Kiro or GitHub spec-kit?
No, it is the inverse pattern. Kiro and spec-kit ask the human to produce a structured spec artifact (requirements, plan, tasks) which the agent then executes in passes. mk0r asks the agent to produce the spec by interviewing the human, and then writes code informed by the conversation. Spec-kit is great when you have a real product with stakeholders. The interview is great when you have an idea you want to see running in fifteen minutes. The honest pattern is: prototype with the interview, then if the prototype proves the idea, hand it to spec-kit for the production rebuild.
What if I do not want to be interviewed and just want it built?
Write a longer first prompt. The agent's instructions explicitly say "ask if unclear", which means the questions only fire when the prompt is ambiguous. "A pomodoro timer that turns the screen red after 25 minutes, single user, dark theme, minimal" is enough information that the agent will skip the interview and start coding. The interview is a fallback for sparse prompts, not a mandatory gate. If you give the spec, you do not get the questions.
Does the agent actually remember corrections across turns?
Yes. The CLAUDE.md block at src/core/vm-claude-md.ts lines 31 through 45 is explicit about it: "Save a memory immediately when the user corrects you or expresses a preference." Claude Code's memory system writes those to disk in the sandbox. So if you say "I do not want rounded corners on cards" once, the next time you ask for a new screen the agent should not produce rounded corners. This is the part that makes the CTO framing real: a junior engineer who needs the same correction four times is the failure mode this prompt is designed to prevent.
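Mechanically, the save-and-recall loop is simple enough to sketch. This is an assumption about the shape, not Claude Code's actual memory implementation; the function names and flat-list file format are made up for illustration.

```typescript
// Illustrative memory loop: corrections go to disk, later turns read them back.
// Not Claude Code's real memory system; names and format are invented here.
import { appendFileSync, existsSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function saveMemory(memoryFile: string, correction: string): void {
  // "Save a memory immediately when the user corrects you."
  appendFileSync(memoryFile, `- ${correction}\n`, "utf8");
}

function loadMemories(memoryFile: string): string[] {
  if (!existsSync(memoryFile)) return [];
  return readFileSync(memoryFile, "utf8")
    .split("\n")
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2));
}

const memFile = join(tmpdir(), `agent-memories-${Date.now()}.md`);
saveMemory(memFile, "no rounded corners on cards");
const memories = loadMemories(memFile);
```

The write happens at correction time, and every later turn that reads the file inherits the preference for free.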
How does this compare to the survey of CTOs who reported AI-code disasters?
The widely cited Final Round AI survey (August 2025) found that 16 of 18 CTOs reported production incidents from AI-generated code. Those incidents were almost all from agents producing too much, too fast, with no spec clarity. The interview pattern does not fix production-grade safety, that still wants real specs and review. What it fixes is the prototype layer, where the question is not "will this scale" but "is this idea worth a follow-up at all". For that question, an interview is sufficient and a spec doc is overkill.
Can I disable the interview if I want pure vibe coding?
There is no official toggle, but you can short-circuit it by writing a maximally specific first prompt or by editing the CLAUDE.md the agent reads inside your sandbox. The point of the design is that vague prompts get a short interview and specific prompts get a fast build. If you genuinely want zero questions even on a one-word prompt, that is a different tool. mk0r is opinionated that one-word prompts deserve a follow-up question before code starts streaming.
Related guides
Vibe coding: tight scope wins
The four constants in the source that make tight scope cheap.
Where vibe coding hits the iteration wall
The exact moment the loop stops compounding, with the file paths.
Vibe coding throwaway prototypes
Why the prototype is disposable by design, and how to use that.