AI vibe coding is only honest when the agent has to look at its own page
Most articles describe AI vibe coding as a one-way pipe: type a request, get code, ship it. That description is missing the loop that decides whether the code actually works. On mk0r the agent is forced to open the running app in a real Chromium, take a DOM snapshot, read the browser console, and refuse to call the task done until the page renders. Here is how the loop is wired and where to read it in the source.
Why the missing step is the whole point
A language model is a confident output generator. Asked to build a React page, it produces something that looks like a React page. It does not, on its own, know whether the page mounted, whether the import resolved, whether a missing prop crashed the render. The common failure mode of AI vibe coding is exactly this gap: the model says "I built the dashboard" while the actual screen is blank.
mk0r treats that gap as the most important problem in the workflow. The fix is not a smarter model. The fix is a sensor. The sensor is a real browser the agent can drive and observe, sitting on the same machine as the dev server, attached over the Chrome DevTools Protocol so the agent can take a DOM snapshot of the literal page the user is about to see.
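The sensor is not exotic. The DevTools endpoint answers plain HTTP, so you can probe it yourself from inside the VM; /json/version is the standard CDP discovery route, and the port is the one this post names:

```ts
// Probe the in-VM Chromium's DevTools endpoint. /json/version is the
// standard CDP discovery route; the port is the one wired up below.
const res = await fetch("http://127.0.0.1:9222/json/version");
const info = await res.json();
console.log(info["Browser"], info["webSocketDebuggerUrl"]);
```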
The two files that wire the loop
The loop is not magic. It is two pieces of source. One starts a Playwright MCP server pointed at the in-VM Chromium. The other is a written rule the agent reads at boot.
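First, the MCP wiring. The real function is buildMcpServersConfig in src/core/e2b.ts; what follows is a sketch of its shape, with only the command and the CDP endpoint taken from the post itself:

```ts
// Sketch of buildMcpServersConfig in src/core/e2b.ts. The command and
// the CDP endpoint are as documented; the exact config shape around
// them is an assumption.
function buildMcpServersConfig() {
  return {
    mcpServers: {
      playwright: {
        command: "npx",
        args: ["@playwright/mcp", "--cdp-endpoint", "http://127.0.0.1:9222"],
      },
    },
  };
}
```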
That little function is what makes AI vibe coding on mk0r different from just "generation." It hands the agent a real browser on a real port. The CDP endpoint http://127.0.0.1:9222 points at the same Chromium that gets screencast back to your tab. You and the agent are looking at the same window from different sides.
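Second, the rule. It lives in the "Browser Testing" subsection of the globalClaudeMd export in src/core/vm-claude-md.ts. Paraphrased here; the five steps are walked through one at a time below:

```ts
// Paraphrase of the "Browser Testing" rule shipped to the agent in
// src/core/vm-claude-md.ts; the repo's exact wording may differ.
const browserTesting = `
## Browser Testing
1. Navigate to http://localhost:5173
2. Take a snapshot (browser_snapshot)
3. Check browser_console_messages for runtime errors
4. If the page is blank, look at App.tsx imports first
5. Do not report completion until the browser shows the expected result
`;
```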
Step 5 is the spine of the entire product. "Do not report completion until the browser shows the expected result." The agent has been told, in the system prompt it loads before your first message, that the browser is the source of truth. Code that compiles is not done. Code that the agent thinks works is not done. Code that the browser actually renders is done.
The loop, drawn
The agent, the dev server, the browser, and you all live on the same VM. Code flows one way. Observation flows back. The loop closes when the snapshot matches the request.
The vibe coding feedback loop on one VM
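Roughly, with the ports this post names:

```
agent ──writes code──► project files ──HMR──► dev server (:5173)
  ▲                                                 │
  │ snapshot + console messages                     │ renders
  │ over CDP (:9222)                                ▼
  └───────────────────────────────────────────── Chromium ──screencast──► you
```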
What a single turn actually looks like
The interesting part is not the prompt or the code. It is what happens between "agent writes file" and "agent says done." That is the part most other tools skip. Here is the sequence as it runs inside the VM.
One turn of AI vibe coding on mk0r
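In outline, assembled from the tool names and rules quoted throughout this post:

1. You send a request; the agent edits files in the project.
2. The Vite dev server hot-reloads the change.
3. The agent calls browser_navigate on http://localhost:5173.
4. It calls browser_snapshot and reads the rendered accessibility tree.
5. It calls browser_console_messages and scans for runtime errors.
6. If the page or the console disagrees with the request, it edits and loops.
7. Only when the browser shows the expected result does it report done.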
What you see in the screencast
Open a session and watch the right pane. The cursor in the screencast is the agent. It moves on its own. You will see it navigate to the dev server, scroll, click, sometimes open DevTools-style overlays. That motion is the agent calling Playwright MCP. Below is a transcript of what the in-VM shell shows during a verification step, paraphrased from the tool-call log.
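(Element names and the error text are illustrative, not literal output.)

```
browser_navigate         http://localhost:5173   → ok
browser_snapshot                                 → heading "Dashboard", button "Refresh", ...
browser_console_messages                         → [error] Cannot read properties of undefined
  agent edits src/App.tsx, adds the missing import, HMR reloads
browser_navigate         http://localhost:5173   → ok
browser_console_messages                         → (none)
```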
The five-step rule, one line at a time
The Browser Testing section in the rulebook is short. Each line does work. Reading them in order is the simplest way to understand why AI vibe coding on mk0r feels different.
Step 1: Navigate to http://localhost:5173
The agent opens the same dev server URL the user would type. No staging build, no proxy. The thing it tests is the thing that ships.
Step 2: Take a snapshot
browser_snapshot returns the rendered accessibility tree. The agent reads element roles, labels, and hierarchy. A blank page is impossible to fake at this layer.
Step 3: Check browser_console_messages
Runtime errors that never touch the build log show up here. A missing prop, an undefined import, a thrown promise. The agent catches them before you do.
Step 4: If the page is blank, look at App.tsx imports
The rule names the most common cause of failure on this stack and tells the agent where to look first. Faster recovery, fewer guesses.
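The failure it names looks like this; a minimal sketch with a hypothetical component name:

```tsx
// Hypothetical App.tsx showing the blank-page failure this rule targets.
// Dashboard.tsx exists on disk, but the import below was forgotten.
// Vite's dev server does not typecheck, so the build log stays clean
// while the page renders blank with a ReferenceError in the console.
import Dashboard from "./components/Dashboard"; // ← the line that usually goes missing

export default function App() {
  return <Dashboard />;
}
```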
Step 5: Do not report completion until the browser agrees
The hard rule. Without it the loop is optional and the agent will skip it under pressure. With it, completion has a single, observable definition.
The same Chromium, two viewers
The detail that makes this work is that the agent and the user are not looking at parallel browsers. There is one Chromium process inside the VM. Its CDP endpoint is on 127.0.0.1:9222. The agent attaches over CDP. The user attaches via a screencast pulled from the same browser. When the agent clicks a button to verify a flow, you literally see the click happen in your tab.
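You can reproduce the agent's side of this with stock Playwright. connectOverCDP is the standard API for attaching to an already-running browser; a sketch, run from inside the VM:

```ts
import { chromium } from "playwright";

// Attach to the already-running in-VM Chromium over CDP, the same
// endpoint the agent's Playwright MCP server points at. Illustrative.
const browser = await chromium.connectOverCDP("http://127.0.0.1:9222");
const [context] = browser.contexts(); // the existing profile, not a fresh one
const [page] = context.pages();       // the very window the screencast shows
console.log(await page.title());
```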
Why one shared Chromium beats two
If the agent had its own headless browser and you watched a separate preview, divergence would be possible: the headless version passes, your version breaks. By piping the screencast and the agent's Playwright session through the same Chromium, the verification the agent does is on the exact pixels you will see. There is no second copy to drift.
What this changes about the writing experience
With the loop in place, AI vibe coding stops being a guessing game. You can describe vague things and trust that the agent will either deliver something coherent or surface a real failure, instead of claiming a success it cannot show. A few practical effects:
What you can stop worrying about
- Did it actually render or is it secretly blank (the agent already checked)
- Is there a console error I am missing (the agent reads them every turn)
- Did it import the new component (step 4 of the rule covers this)
- Is the dev server even up (the navigate fails fast if it is not)
- Will completion mean what I think (it means the browser shows the result)
How this compares to the common pattern
Most AI vibe coding tools generate code in a chat panel and rely on you to spot bugs. Some compile the output. Few load the running app and read the rendered DOM. Even fewer make verification a written rule the agent must obey.
| Feature | Generation-only vibe coding | mk0r vibe coding loop |
|---|---|---|
| After code is written | Agent stops, user opens preview | Agent navigates the running app |
| Verification surface | None, or a syntax check | Real Chromium DOM snapshot via CDP |
| Runtime errors | Surfaced when the user notices | Read by the agent from browser_console_messages |
| Browser the agent uses | A separate headless instance, if any | The same one the user is watching |
| Definition of done | Code generated and saved | Browser shows the expected result |
| Where the rule lives | Not codified | src/core/vm-claude-md.ts lines 273-283 |
| Recovery from a blank page | User pastes the error back into chat | Rule names App.tsx imports as first thing to check |
Reading it yourself
Both files are in the public repo. If you want to verify any of this, the two specific spots are easy to find. The MCP wiring is in src/core/e2b.ts inside buildMcpServersConfig. The written rule is the "Browser Testing" subsection of the globalClaudeMd export in src/core/vm-claude-md.ts. Open them and you have the entire loop in front of you.
That is the part that makes the page you are reading uncopyable. Anyone can write a paragraph about "the AI builds your app." Almost no one can point at the exact five-step rule and the exact CDP port the agent uses to obey it.
Want to watch the loop run live?
Book 20 minutes and we will open a session, give the agent a vague request, and walk through every snapshot it takes before declaring the page done.
Frequently asked questions
What is AI vibe coding in one sentence?
AI vibe coding is a workflow where you describe an app in natural language and an AI agent writes the code, runs it, and (on mk0r) loads the running app in a real browser to verify it works before telling you it is done.
What is the part most other guides leave out?
The verification step. Most descriptions of AI vibe coding stop at 'the AI generates the code.' On mk0r, the agent is required by its rulebook to navigate to http://localhost:5173 via Playwright MCP, take a DOM snapshot, check the browser console for runtime errors, and only then report the task complete. Generation without verification is the most common failure mode of vibe coding, and the loop is what closes it.
How does the agent see its own running app?
Inside the VM, Chromium runs with --remote-debugging-port=9222. The agent boots with a Playwright MCP server pointed at that exact port: `npx @playwright/mcp --cdp-endpoint http://127.0.0.1:9222`. So when the agent calls browser_navigate, it is driving the same Chromium instance that is being screencast back to your tab. The user and the agent are looking at the same window from different sides.
Where is this loop defined in the source?
Two files. The MCP wiring is in src/core/e2b.ts inside buildMcpServersConfig (lines 153-170). The loop itself is written into src/core/vm-claude-md.ts under the 'Browser Testing' section (lines 273-283). Both are in the appmaker repo. The verification rule is step 5: 'Do not report completion until the browser shows the expected result.'
What happens when the agent finds a problem on its own page?
It reads the snapshot or the console messages and edits the code. The dev server hot-reloads (Vite HMR is locked at clientPort 443, protocol wss, by guardrail). The agent navigates again. Loop until the snapshot matches the request. The user sees this play out in the screencast and only gets a 'done' when the browser actually agrees.
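The guardrail maps onto Vite's standard server.hmr options. A sketch, assuming it is injected into the generated project's vite.config.ts (where mk0r actually applies it is not shown in this post):

```ts
// Sketch of the HMR guardrail described above, expressed through Vite's
// standard server.hmr options. The clientPort and protocol values are
// the ones this post names.
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    hmr: {
      clientPort: 443, // locked by guardrail
      protocol: "wss", // locked by guardrail
    },
  },
});
```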
Do I have to know any of this to use it?
No. You type a sentence, you watch the browser update. The loop is the implementation detail that makes 'I described it and it worked' more than a coincidence. You can ignore the entire system and just use the product.
Why does this matter for AI vibe coding specifically?
Because language models are confident liars. An agent will happily say 'I built the page' while shipping a missing import or a runtime crash on mount. Every other layer of the workflow trusts the model. Forcing it to look at its own output in a real browser is the only honest check, and it is the difference between vibe coding that ships a working app on the first try and vibe coding that needs a human to QA every turn.
Does this work for backend code too?
Yes, with a different sensor. For backend behavior the agent runs commands in the VM shell and reads the output. For UI it uses Playwright. The principle is the same: the agent has to observe the real result in the running system before it is allowed to declare success. The browser is one sensor among several, but for vibe coding it is the most important one because UI is where the lying tends to happen.
Stop wondering if it actually rendered. Open a session and watch the agent verify its own work.
Start vibe coding