Guide

AI generated E2E tests for Next.js: put the browser inside the loop

The guides at the top of Google treat AI generated E2E tests as a spec file the model writes after shipping. That is useful, but it is not the strongest version of the idea. The stronger version, for Next.js specifically, is to give the agent live browser access during generation so the test is a gate on every turn, not an artifact at the end.

mk0r team
9 min read
Playwright MCP, not spec files
CDP on port 9222, every session
Agent refuses to finish on a blank render

What every top result gets half right

Search for this keyword and you get the same shape of article five times in a row. Install Playwright. Point it at your Next.js dev server. Ask Copilot or Cursor to write specs. Run npx playwright test. Commit the specs. That workflow is real and worth doing, but it treats tests as an artifact. The agent writes a file, a runner consumes the file, and the browser only shows up after everything is already written.

The more useful framing, once you have lived with AI coding for a while, is that the browser itself is the test. The model should be looking at a real render before it says it is done, and its tool responses should include what the page actually shows. If that loop closes inside a single turn, most of the bugs that get blamed on flaky models simply do not ship.

The shape of a loop that gates on the browser

Three moving parts, wired so the agent has the browser as a first class tool, not a post hoc reporter.

Browser as a first class tool

Agent turn → next dev → Chromium → Playwright MCP
Playwright MCP → DOM snapshot + console log → pass / fail → back to the agent turn

The key arrow is the one going right, from the MCP back to the model. The model does not predict what the page should contain and hope. It reads the snapshot, compares against intent, and either iterates or stops.

The exact command that starts this loop on mk0r

This is the startup sequence every mk0r session runs inside its sandbox. Lifted directly from src/core/vm-scripts.ts. The line you actually care about for this guide is the one that launches Playwright MCP against the running Chromium.

vm-scripts.ts STARTUP_SH

That --cdp-endpoint flag is what makes Playwright MCP attach to an already-running Chromium instead of spawning its own. The browser persists across turns, including cookies and localStorage, which matters once you are testing authenticated flows in Next.js.
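The script itself is not reproduced above. Based on the ports and flags the article cites elsewhere (9222 for CDP, 3001 for the MCP server), the relevant lines plausibly look like the following sketch; the `--headless=new` flag and the exact binary name are assumptions, the `--cdp-endpoint` and `--port` flags are real `@playwright/mcp` options:

```shell
# Sketch of the relevant STARTUP_SH lines, not the literal repo script.

# Launch Chromium with its DevTools protocol exposed on 9222
chromium --headless=new --remote-debugging-port=9222 &

# Attach Playwright MCP to that already-running browser instead of
# letting it spawn its own, and serve the MCP on port 3001
npx @playwright/mcp@latest \
  --cdp-endpoint http://127.0.0.1:9222 \
  --port 3001 &
```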

What the agent sees

With the MCP wired in, the agent is handed a small set of tools that replace the whole observed test runner abstraction. No spec file, no test runner, no reporter. Just these.

browser_navigate

Agent opens http://localhost:3000 inside the sandbox and waits for Next.js to serve the page. No separate playwright test invocation.

browser_snapshot

Returns the live DOM as structured content. The model reads actual rendered markup, not a spec pass or fail.

browser_console_messages

Runtime errors flow back to the model as readable log lines. Hydration mismatches, thrown promises, fetch failures all become explicit signal.

browser_click / browser_type

The model can interact before it asserts. Click the submit button, type into the form, then snapshot what happened.

No separate test file required

Saying click the primary CTA and confirm the success state renders is enough. The model decides its own assertions from what it sees.
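An illustrative exchange using the tools above, against a hypothetical dashboard page. The snapshot lines paraphrase Playwright MCP's accessibility-tree output; the `ref` values and the exact response wording are assumptions:

```
→ browser_navigate          { "url": "http://localhost:3000/dashboard" }
← navigated, load complete

→ browser_snapshot          {}
← - heading "Dashboard" [level=1]
  - button "Refresh" [ref=e12]

→ browser_click             { "element": "Refresh button", "ref": "e12" }
→ browser_console_messages  {}
← [error] Hydration failed because the initial UI does not match ...
```

The model reads that last line as a tool response, not a CI log, and its next action is an edit, not a report.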

MCP config the agent actually receives

The agent is launched with exactly one browser related MCP server. Not a test framework. Not a runner. Just a handle to a live Chromium. This is the config that lands in its environment:

mcp-servers.json

The prompt that turns the MCP into a gate

Tools alone do not force the behavior. The other half is the system prompt. This is the literal text in the agent's CLAUDE.md, shipped into every sandbox. It is five sentences, and every sentence earns its place.

/root/.claude/CLAUDE.md

The last line is the one most guides miss. The model is told not that it should verify, but that it cannot report completion without verifying. The difference is the difference between a checklist and a barrier.
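The literal file is not reproduced above. A five-sentence prompt in this spirit, reconstructed from the behavior the article describes, might read:

```markdown
A live browser is available through the Playwright MCP tools.
After any change to a page or component, open the affected route with
browser_navigate and take a browser_snapshot.
Read browser_console_messages and treat uncaught errors, including
hydration warnings, as failures.
If the snapshot is empty or the expected content is missing, fix the
code and verify again.
You may not report a task as complete until the snapshot and the
console are both clean.
```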

A turn that actually runs, step by step

One agent turn on a /dashboard route

Participants: Agent, next dev, Chromium, MCP

edit app/dashboard/page.tsx → HMR recompile → ready
browser_navigate /dashboard → CDP: Page.navigate → load complete
browser_snapshot → DOM tree
browser_console_messages → hydration warning
edit, retry → browser_snapshot (clean)

The interesting message is the hydration warning coming back from the console. That is a class of bug that a spec file almost never catches, because you rarely write assertions on console output. When the console is a direct tool response, the model reads it and fixes the mismatch without a human ever seeing the error.

Anchor numbers from the repo

9222  CDP debug port
3001  MCP server port
3000  Next.js dev port
5  steps before done

Grep 9222 and @playwright/mcp in this repo to see where each of these lives. The loop is not a marketing diagram; it is four wired-together services.

Five phases of a single turn

1

Boot the Next.js dev server

next dev runs on port 3000 inside the sandbox. The agent waits for the port to accept connections before moving on.
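Phase 1's wait can be implemented as a small poll loop. This is an illustrative Node/TypeScript sketch of the idea, not code from the repo:

```typescript
// Sketch: poll a TCP port until it accepts a connection or a deadline
// passes. This is what "wait for the port" means in phase 1.
import net from "node:net";

function waitForPort(
  port: number,
  host = "127.0.0.1",
  timeoutMs = 30_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const attempt = () => {
      const socket = net.connect({ port, host });
      socket.once("connect", () => {
        socket.destroy();
        resolve(); // next dev is accepting connections
      });
      socket.once("error", () => {
        socket.destroy();
        if (Date.now() > deadline) {
          reject(new Error(`port ${port} did not accept connections in time`));
        } else {
          setTimeout(attempt, 250); // retry until the dev server is up
        }
      });
    };
    attempt();
  });
}
```

Usage would be `await waitForPort(3000)` before the first `browser_navigate`.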

2

Open the page via Playwright MCP

The agent calls browser_navigate with the dev URL. Chromium loads the route. The first render is either there or it is not.

3

Read the DOM and the console

browser_snapshot returns structured DOM. browser_console_messages returns runtime errors. Both flow back as tool responses.

4

Interact if the scenario calls for it

For forms, clicks, async states, the agent types and clicks through the flow. Snapshot again after each meaningful transition.

5

Refuse to report done if the browser disagrees

The system prompt forbids reporting completion when the snapshot is empty, the console has uncaught errors, or the expected text is missing. The agent iterates on its own code until all three are clean.

Prompt it like this, not like a spec file

You do not ask for Playwright code. You describe a scenario in English and trust the MCP plus the system prompt to do the rest. The prompt below is the entire contract for one turn, including the browser verification step.

turn-prompt.md

The model writes the route, recompiles, opens the URL, clicks the refresh button, reads the new number, checks the console, and only then writes back that the task is done. No test file is ever created unless you ask for one.
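The prompt file is not reproduced above. A turn prompt matching that description might read as follows; the `/stats` route, the `/api/stats` endpoint, and the button label are hypothetical:

```markdown
Build a /stats route that fetches a count from /api/stats and renders
it, with a "Refresh" button that re-fetches on click.

Before reporting done:
1. Open http://localhost:3000/stats in the browser.
2. Click "Refresh" and confirm the number changes.
3. Confirm the console has no uncaught errors.
```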

Try the loop without wiring it yourself

Every mk0r session boots Chromium with CDP, @playwright/mcp, and the agent already pointed at both. Describe a Next.js page, watch it verify itself.

Open mk0r

How the browser-as-gate pattern compares to the SERP default

| Feature | Generated spec files | Browser inside the loop |
| --- | --- | --- |
| When tests run | After implementation, in CI or locally | Every turn, before the agent reports done |
| What the model sees | Pass or fail from a spec runner | Live DOM snapshot and console messages |
| Who writes assertions | Model writes them in advance, hopes they are right | Model observes actual render, decides what to check |
| Who starts the browser | CI pipeline or developer | Agent, via CDP on 127.0.0.1:9222 |
| Gate on broken first render | Only if a spec happens to cover it | Always, snapshot fails if DOM is empty |
| Where specs end up | In the repo, always | Optional, commit what you want regression protected |

Both patterns are real and both work. They answer different questions. Specs protect known flows over the life of the repo; the in-loop browser catches the obvious break before you see it.

Where this approach still needs help

  • Regression over time. The in-loop browser catches the break on the turn that caused it. It does not protect a route from being broken three weeks later by an unrelated change. For that, you still need committed Playwright or Vitest specs in the Next.js repo, run in CI.
  • Cross-browser coverage. The sandbox runs one Chromium. It will not tell you that a Safari-only CSS bug exists. For that, you want BrowserStack, Playwright's multi-browser matrix, or an observational tool on your deployed URL.
  • Visual polish. A route can render a number, pass every console check, and still look bad. Snapshot-based visual diffs are a separate workflow; the browser-as-gate pattern is behavioral, not aesthetic.
  • Cost per turn. A turn that navigates, snapshots, interacts, and re-checks spends more tokens and more wall time than a one-shot generation. For ideation and throwaway UI, turn it off. For anything headed toward a ship, leave it on.

A quieter stat that makes the case

The number that matters when you are evaluating this pattern is not how many tests the model writes. It is how many hydration warnings, blank renders, and wrong-prop-named components never reach your branch because the agent saw them first. The console is a brutal reviewer, and it runs on every turn for free once the wiring is in place.

This does not replace your CI. It replaces the "I sent you code that does not render" class of AI bugs, which until now has been the dominant failure mode of AI coding in Next.js projects.

Frequently asked questions

Will testRigor, QA Wolf, or similar tools work on my Next.js app?

Yes. They crawl the running app and generate test cases from observed behavior. That is a useful pattern once the app already exists. What it cannot do is prevent a broken render from reaching your branch in the first place, because the tool does not see the generation turn. The pattern on this page is complementary: a browser inside the loop that gates each turn, then an observational tool on top of a known good deploy.

Does this pattern work with next dev and app router?

Yes. Playwright MCP does not care which framework you run. It drives Chromium over the Chrome DevTools Protocol. The agent starts next dev on localhost:3000, opens that URL via the MCP browser_navigate tool, interacts with the page, reads the DOM, and checks browser_console_messages for runtime errors. App router, pages router, route handlers, server components, client components, all the same from Chromium's point of view.

Why Playwright MCP specifically instead of just running playwright test?

Both are fine. MCP is the tighter ergonomic for agents because the model calls browser_navigate, browser_click, browser_snapshot as tool calls. It reads the DOM directly as structured content, not as a pass or fail from a spec runner. That lets the model decide what to verify next based on what it just saw, instead of predicting assertions in advance. For finalized regression coverage, running playwright test on spec files is still the right move. For the generation loop, MCP wins.

What does mk0r actually do under the hood?

Every session boots a Chromium instance with remote debugging on port 9222, launches @playwright/mcp with --cdp-endpoint http://127.0.0.1:9222, and exposes the MCP server on port 3001. The agent connects to that MCP server and is instructed by its CLAUDE.md to navigate to the dev server, take a DOM snapshot, and read browser_console_messages before reporting completion. See src/core/vm-scripts.ts around line 1350 in this repo.

Can I replicate this on my laptop without mk0r?

Yes. Install @playwright/mcp, start a browser with --remote-debugging-port=9222, point the MCP server at it, and hand the MCP endpoint to your coding agent (Claude Code, Cursor, etc.). The tedious part is the wiring, not the idea. Boot a dev server, start Chromium with CDP, launch Playwright MCP, make sure your agent has permissions to call the MCP tools, handle cleanup. mk0r ships the wiring; you can copy it.
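The wiring described above, sketched as a shell session. The ports match the ones the article uses; the `--headless=new` flag, the server name `playwright`, and the `/sse` path are assumptions:

```shell
# DIY version of the mk0r wiring. Run each service in its own shell
# or background it; cleanup is left to you.

# 1. Next.js dev server on :3000
npx next dev &

# 2. Chromium with its CDP debug port open
chromium --headless=new --remote-debugging-port=9222 &

# 3. Playwright MCP attached to the running browser, served on :3001
npx @playwright/mcp@latest \
  --cdp-endpoint http://127.0.0.1:9222 \
  --port 3001 &

# 4. Point your agent at the MCP server, e.g. for Claude Code:
claude mcp add --transport sse playwright http://127.0.0.1:3001/sse
```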

Is a browser inside the loop slower than a one shot generation?

Yes, measurably. A full turn that includes navigate, snapshot, read console, fix adds seconds, not milliseconds. For a throwaway prototype, skip it. For anything you are going to ship, the speed penalty is smaller than the debug time saved on the first broken render. The tradeoff is well understood by teams that have lived on both sides.

Do I still need real Playwright spec files in my Next.js repo?

Yes, for regression coverage. Spec files are declarative, version controlled, and run in CI. The pattern on this page is about generation, not about long lived test suites. Use both. The agent drives Chromium via MCP during generation to catch the obvious breaks, and you commit spec files for the paths you want protected over the life of the repo.

Describe a Next.js feature. Watch the agent open Chromium, read the DOM, fix its own hydration warnings, and only then hand you the result.

Open mk0r