AI generated E2E tests for Next.js: put the browser inside the loop
The guides at the top of Google treat AI generated E2E tests as a spec file the model writes after shipping. That is useful, but it is not the strongest version of the idea. The stronger version, for Next.js specifically, is to give the agent live browser access during generation so the test is a gate on every turn, not an artifact at the end.
What every top result gets half right
Search for this keyword and you get the same shape of article five times in a row. Install Playwright. Point it at your Next.js dev server. Ask Copilot or Cursor to write specs. Run npx playwright test. Commit the specs. That workflow is real and worth doing, but it treats tests as an artifact. The agent writes a file, a runner consumes the file, and the browser only shows up after everything is already written.
The more useful framing, once you have lived with AI coding for a while, is that the browser itself is the test. The model should be looking at a real render before it says it is done, and its tool responses should include what the page actually shows. If that loop closes inside a single turn, most of the bugs that get blamed on flaky models simply do not ship.
The shape of a loop that gates on the browser
Three moving parts, wired so the agent has the browser as a first-class tool, not a post hoc reporter.
Browser as a first-class tool
The key arrow is the one going right, from the MCP back to the model. The model does not predict what the page should contain and hope. It reads the snapshot, compares against intent, and either iterates or stops.
The exact command that starts this loop on mk0r
This is the startup sequence every mk0r session runs inside its sandbox, taken from src/core/vm-scripts.ts. The line you actually care about for this guide is the one that launches Playwright MCP against the running Chromium.
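The full script lives in src/core/vm-scripts.ts and is not reproduced verbatim here. Reconstructed from the ports and flags described on this page (CDP on 9222, MCP on 3001, next dev on 3000), the relevant lines look roughly like this:

```bash
# Boot Chromium with the Chrome DevTools Protocol exposed on port 9222.
chromium --headless --remote-debugging-port=9222 &

# Attach Playwright MCP to the already-running Chromium instead of letting
# it spawn its own, and expose the MCP server on port 3001 for the agent.
npx @playwright/mcp@latest \
  --cdp-endpoint http://127.0.0.1:9222 \
  --port 3001 &

# The Next.js dev server the agent will point the browser at.
npm run dev   # next dev on http://localhost:3000
```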
That --cdp-endpoint flag is what makes Playwright MCP attach to an already-running Chromium instead of spawning its own. The browser persists across turns, including cookies and localStorage, which matters once you are testing authenticated flows in Next.js.
What the agent sees
With the MCP wired in, the agent is handed a small set of tools that replaces the whole spec-file-and-runner abstraction. No spec file, no test runner, no reporter. Just these.
browser_navigate
Agent opens http://localhost:3000 inside the sandbox and waits for Next.js to serve the page. No separate playwright test invocation.
browser_snapshot
Returns the live DOM as structured content. The model reads actual rendered markup, not a spec pass or fail.
browser_console_messages
Runtime errors flow back to the model as readable log lines. Hydration mismatches, thrown promises, fetch failures all become explicit signal.
browser_click / browser_type
The model can interact before it asserts. Click the submit button, type into the form, then snapshot what happened.
No separate test file required
Saying click the primary CTA and confirm the success state renders is enough. The model decides its own assertions from what it sees.
MCP config the agent actually receives
The agent is launched with exactly one browser related MCP server. Not a test framework. Not a runner. Just a handle to a live Chromium. This is the config that lands in its environment:
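The exact file is not shown here; assuming the MCP server from the startup sequence is serving an SSE transport on port 3001 (an assumption about mk0r's wiring, not a quoted config), it would look something like:

```json
{
  "mcpServers": {
    "playwright": {
      "type": "sse",
      "url": "http://127.0.0.1:3001/sse"
    }
  }
}
```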
The prompt that turns the MCP into a gate
Tools alone do not force the behavior. The other half is the system prompt, which ships in the agent's CLAUDE.md inside every sandbox. It is five sentences, and every sentence earns its place.
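The verbatim file is not reproduced here; a representative five-sentence version, written to match the behavior this page describes, reads like this:

```markdown
You have browser access through the Playwright MCP tools.
After every change, navigate to the running dev server and take a snapshot.
Read browser_console_messages and treat any uncaught error as a failure.
If the snapshot is missing expected content, fix the code and check again.
You may not report a task as complete until the browser confirms it works.
```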
The last line is the one most guides miss. The model is told not that it should verify, but that it cannot report completion without verifying. The difference is the difference between a checklist and a barrier.
A turn that actually runs, step by step
One agent turn on a /dashboard route
The interesting message is the hydration warning coming back from the console. That is a class of bug that a spec file almost never catches, because you rarely write assertions on console output. When the console is a direct tool response, the model reads it and fixes the mismatch without a human ever seeing the error.
Anchor numbers from the repo
Grep for 9222 and @playwright/mcp in this repo to see where each of these lives. The loop is not a marketing diagram; it is four wired-together services.
Five phases of a single turn
Boot the Next.js dev server
next dev runs on port 3000 inside the sandbox. The agent waits for the port to accept connections before moving on.
Open the page via Playwright MCP
The agent calls browser_navigate with the dev URL. Chromium loads the route. The first render is either there or it is not.
Read the DOM and the console
browser_snapshot returns structured DOM. browser_console_messages returns runtime errors. Both flow back as tool responses.
Interact if the scenario calls for it
For forms, clicks, async states, the agent types and clicks through the flow. Snapshot again after each meaningful transition.
Refuse to report done if the browser disagrees
The system prompt forbids reporting completion when the snapshot is empty, the console has uncaught errors, or the expected text is missing. The agent iterates on its own code until all three are clean.
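Of the five phases, only the first is plain code rather than a model decision. A minimal port wait, sketched here as a hypothetical Node helper (not mk0r's actual implementation), looks like this:

```typescript
import net from "node:net";

// Poll a TCP port until it accepts connections, or give up after timeoutMs.
// This mirrors phase one: the agent must not navigate until `next dev`
// is actually listening (port 3000 in the sandbox).
async function waitForPort(
  port: number,
  host = "127.0.0.1",
  timeoutMs = 30_000,
  intervalMs = 250,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const open = await new Promise<boolean>((resolve) => {
      const sock = net.connect({ port, host });
      sock.once("connect", () => { sock.destroy(); resolve(true); });
      sock.once("error", () => { sock.destroy(); resolve(false); });
    });
    if (open) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false;
}
```

Only after this resolves does the agent call browser_navigate; everything downstream of it is tool calls, not code.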
Prompt it like this, not like a spec file
You do not ask for Playwright code. You describe a scenario in English and trust the MCP plus the system prompt to do the rest. The prompt below is the entire contract for one turn, including the browser verification step.
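A prompt in that shape, written here as an illustrative example (the route name and copy are invented, not taken from mk0r):

```text
Add a /stats page to this Next.js app that shows a visitor count fetched
from /api/stats, with a Refresh button that refetches the number.
When you think you are done: open http://localhost:3000/stats in the
browser, click Refresh, confirm the number changes, and check the console
for errors before you report completion.
```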
The model writes the route, recompiles, opens the URL, clicks the refresh button, reads the new number, checks the console, and only then writes back that the task is done. No test file is ever created unless you ask for one.
Try the loop without wiring it yourself
Every mk0r session boots Chromium with CDP, @playwright/mcp, and the agent already pointed at both. Describe a Next.js page, watch it verify itself.
Open mk0r →

How the browser-as-gate pattern compares to the SERP default
| Feature | Generated spec files | Browser inside the loop |
|---|---|---|
| When tests run | After implementation, in CI or locally | Every turn, before the agent reports done |
| What the model sees | Pass or fail from a spec runner | Live DOM snapshot and console messages |
| Who writes assertions | Model writes them in advance, hopes they are right | Model observes actual render, decides what to check |
| Who starts the browser | CI pipeline or developer | Agent, via CDP on 127.0.0.1:9222 |
| Gate on broken first render | Only if a spec happens to cover it | Always, snapshot fails if DOM is empty |
| Where specs end up | In the repo, always | Optional, commit what you want regression protected |
Both patterns are real and both work. They answer different questions. Specs protect known flows over the life of the repo; the in-loop browser catches the obvious break before you see it.
Where this approach still needs help
- Regression over time. The in-loop browser catches the break on the turn that caused it. It does not protect a route from being broken three weeks later by an unrelated change. For that, you still need committed Playwright or Vitest specs in the Next.js repo, run in CI.
- Cross-browser coverage. The sandbox runs one Chromium. It will not tell you that a Safari-only CSS bug exists. For that, you want BrowserStack, Playwright's multi-browser matrix, or an observational tool on your deployed URL.
- Visual polish. A route can render a number, pass every console check, and still look bad. Snapshot-based visual diffs are a separate workflow; the browser-as-gate pattern is behavioral, not aesthetic.
- Cost per turn. A turn that navigates, snapshots, interacts, and re-checks spends more tokens and more wall time than a one-shot generation. For ideation and throwaway UI, turn it off. For anything headed toward a ship, leave it on.
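For the first gap, regression over time, the fix is ordinary committed specs. A minimal example of the kind of file worth keeping in the repo, assuming @playwright/test is installed and a /dashboard route exists:

```typescript
import { test, expect } from "@playwright/test";

// Committed regression spec: protects /dashboard long after the turn
// that built it. Run with `npx playwright test` locally or in CI.
test("dashboard renders without console errors", async ({ page }) => {
  const errors: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") errors.push(msg.text());
  });

  await page.goto("http://localhost:3000/dashboard");
  await expect(page.getByRole("heading").first()).toBeVisible();
  expect(errors).toEqual([]);
});
```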
A quieter stat that makes the case
The number that matters when you are evaluating this pattern is not how many tests the model writes. It is how many hydration warnings, blank renders, and wrong-prop-named components never reach your branch because the agent saw them first. The console is a brutal reviewer, and it runs on every turn for free once the wiring is in place.
This does not replace your CI. It replaces the "I sent you code that does not render" class of AI bugs, which until now has been the dominant failure mode of AI coding in Next.js projects.
Frequently asked questions
Will testRigor, QA Wolf, or similar tools work on my Next.js app?
Yes. They crawl the running app and generate test cases from observed behavior. That is a useful pattern once the app already exists. What it cannot do is prevent a broken render from reaching your branch in the first place, because the tool does not see the generation turn. The pattern on this page is complementary: a browser inside the loop that gates each turn, then an observational tool on top of a known good deploy.
Does this pattern work with next dev and app router?
Yes. Playwright MCP does not care which framework you run. It drives Chromium over the Chrome DevTools Protocol. The agent starts next dev on localhost:3000, opens that URL via the MCP browser_navigate tool, interacts with the page, reads the DOM, and checks browser_console_messages for runtime errors. App router, pages router, route handlers, server components, client components, all the same from Chromium's point of view.
Why Playwright MCP specifically instead of just running playwright test?
Both are fine. MCP is the tighter ergonomic for agents because the model calls browser_navigate, browser_click, browser_snapshot as tool calls. It reads the DOM directly as structured content, not as a pass or fail from a spec runner. That lets the model decide what to verify next based on what it just saw, instead of predicting assertions in advance. For finalized regression coverage, running playwright test on spec files is still the right move. For the generation loop, MCP wins.
What does mk0r actually do under the hood?
Every session boots a Chromium instance with remote debugging on port 9222, launches @playwright/mcp with --cdp-endpoint http://127.0.0.1:9222, and exposes the MCP server on port 3001. The agent connects to that MCP server and is instructed by its CLAUDE.md to navigate to the dev server, take a DOM snapshot, and read browser_console_messages before reporting completion. See src/core/vm-scripts.ts around line 1350 in this repo.
Can I replicate this on my laptop without mk0r?
Yes. Install @playwright/mcp, start a browser with --remote-debugging-port=9222, point the MCP server at it, and hand the MCP endpoint to your coding agent (Claude Code, Cursor, etc.). The tedious part is the wiring, not the idea. Boot a dev server, start Chromium with CDP, launch Playwright MCP, make sure your agent has permissions to call the MCP tools, handle cleanup. mk0r ships the wiring; you can copy it.
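Under those assumptions, the laptop version of the wiring condenses to a few commands (how you register the MCP endpoint varies by agent, so treat the last step as tool-specific):

```bash
npm i -D @playwright/mcp                   # the MCP server
chromium --remote-debugging-port=9222 &    # any Chromium-based browser works
npx @playwright/mcp --cdp-endpoint http://127.0.0.1:9222 --port 3001 &
npm run dev &                              # next dev on localhost:3000
# finally, register http://127.0.0.1:3001 as an MCP server in your agent's config
```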
Is a browser inside the loop slower than a one shot generation?
Yes, measurably. A full turn that includes navigate, snapshot, read console, fix adds seconds, not milliseconds. For a throwaway prototype, skip it. For anything you are going to ship, the speed penalty is smaller than the debug time saved on the first broken render. The tradeoff is well understood by teams that have lived on both sides.
Do I still need real Playwright spec files in my Next.js repo?
Yes, for regression coverage. Spec files are declarative, version controlled, and run in CI. The pattern on this page is about generation, not about long lived test suites. Use both. The agent drives Chromium via MCP during generation to catch the obvious breaks, and you commit spec files for the paths you want protected over the life of the repo.
Describe a Next.js feature. Watch the agent open Chromium, read the DOM, fix its own hydration warnings, and only then hand you the result.
Open mk0r