
Reading AI code before merging: the unit of merge is the bug

The yes-you-should and no-trust-the-tests crowds are arguing past each other because they both assume one fixed chunk size: a PR bundling 8 to 15 prompts' worth of changes. Nobody reads that. The honest answer is structural. One prompt should produce one diff, and the read should fit in the 30 seconds between you and the next prompt.

Matthew Diakonov
6 min read

Direct answer (verified)

Yes, read AI code before merging. But the readable unit is one prompt's diff, not a PR. Most setups bundle 5 to 15 prompts into one merge unit and call it unreviewable. Shrink the unit (one prompt, one commit, one optional undo) and the read takes 30 seconds. The source you can verify this against is github.com/m13v/appmaker, specifically src/core/e2b.ts line 1772, where git diff --cached --quiet short-circuits empty turns so every entry in git log corresponds to one prompt and one real diff.

The argument is about the wrong thing

Walk through the standard versions of this debate. One side says you have to read every line, AI hallucinates subtle bugs, tests cannot catch logic errors that look right. The other side says we trust compiler output, we trust linters, the cost of reading every diff line by line is the whole productivity gain. Both are correct given the chunk size they assume. Both miss the chunk size itself.

Human attention scales with the size of the thing. A 12-line diff with a clear commit message reads in ten seconds. A 1,400-line diff that bundles eight prompts into one merge is functionally unreadable. The reviewer skims and approves, or rage-rejects; neither is a real review. The chunk size is doing the work, not the discipline.

Once you accept that, the right question is not should you read, it is can you arrange your loop so the diff that lands in front of you is one prompt big. If yes, the read costs nothing and you do it. If no, no amount of discipline saves you.

One prompt is one commit, in source

Here is the path a single turn takes through mk0r. Watch where the commit lands. It is not at the end of a session, it is at the end of every successful turn.

What happens on one prompt in mk0r

Participants: You, Chat API, Agent (VM), git (in VM)

  1. POST /api/chat { prompt }
  2. stream prompt to in-VM agent
  3. tokens stream, preview iframe updates live
  4. (optional) POST /api/chat/cancel
  5. git add -A
  6. git diff --cached --quiet? if yes echo NOCHANGE
  7. git commit -q -m '<first 120 chars of prompt>'
  8. new SHA
  9. commitTurn returns sha; historyStack appends
  10. turn done, undo button now points at prior SHA

The commit message comes from src/app/api/chat/route.ts line 1008: prompt.split("\n")[0].trim().slice(0, 120). The 120-char slice is deliberate; commit messages stay skimmable in git log --oneline and your prompt history doubles as your review TOC.
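As a minimal sketch, the derivation quoted above can be wrapped in a standalone function. The function name commitMessageFromPrompt is hypothetical, and the "Agent turn" fallback is taken from the handler line quoted in the FAQ further down; the rest of the route is not reproduced here.

```typescript
// Sketch of the message derivation quoted from route.ts.
// The function name is an assumption, not taken from the repository.
function commitMessageFromPrompt(prompt: string): string {
  // First line only, whitespace trimmed, capped at 120 characters.
  const firstLine = prompt.split("\n")[0].trim().slice(0, 120);
  // Fall back to a fixed label when the first line is empty.
  return firstLine || "Agent turn";
}

console.log(
  commitMessageFromPrompt("Add a dark mode toggle\nUse the system preference as default"),
);
```

Only the first line survives, so a long multi-paragraph prompt still yields a one-line entry in git log --oneline.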

What the review surface looks like

Inside the VM after seven prompts of building a todo app, the log looks like this. The leftmost column is a SHA, the rest is the first line of the prompt that produced the diff. Pick any SHA and git show it to read exactly what that prompt changed.

Inside the VM at /app

That is the pre-merge review surface. It already exists. Nobody had to set up a CI pipeline, write commit conventions, or train a team. The agent does it on every turn. The reader picks the turn they care about and reads one diff at a time.

120 characters: the first line of the prompt, trimmed to 120 chars, is the commit message. git log reads like the chat panel because that is literally how it is built.

src/app/api/chat/route.ts line 1008, then commitTurn at src/core/e2b.ts line 1759

The cancel button is your pre-merge gate

The cheapest read happens while the diff is streaming. The preview iframe updates as the agent writes, so you usually know within a few seconds whether the turn is going the right way. If the answer is no, cancel. POST /api/chat/cancel posts to the in-VM ACP cancel endpoint. The agent stops. commitTurn short-circuits because git diff --cached --quiet exits 0 (nothing was staged). No commit lands. You stay on the same SHA you had before the turn started.

Compare that to the usual flow, where the agent finishes, commits, and the diff lands in a PR queue. Now you have to read the diff and decide to revert or approve, and reverting carries the weight of an explicit rollback. The structural fix is to move the gate earlier: read the streaming diff, kill the turn if it looks wrong, and the commit never happens.

Undo handles the other case. If the turn finished, the commit landed, and only then did you notice it was wrong, undoTurn at line 1855 of e2b.ts walks activeIndex back through the history stack and checks out the prior SHA's tree into a fresh commit. One SHA undone, the rest of the session untouched.

The honest counterargument

There is one case where reading every prompt-sized diff is the wrong move: throwaway prototypes where the failure mode is "demo does not work" and the blast radius is your own laptop. For a weekend build, the live preview is the read. Trusting the agent and iterating fast beats stopping to read every diff, because nothing here will outlive Sunday afternoon.

The pre-merge read earns its keep when the code touches auth, payments, or user data; when the diff will become a load-bearing piece of something that ships; or when you are stuck and need to understand what the agent actually did before issuing the next prompt. For everything else, the preview iframe is enough signal. mk0r leans into this with the live preview that updates as the agent writes; the visual is the dominant channel, the source code is there for when you need it.

So the resolution is not "always read," it is "arrange the loop so the read is cheap when you need it and skippable when you do not." The chunk size is what makes that choice possible. A multi-prompt PR forecloses both options; one prompt, one commit keeps them open.

How to recreate this in Cursor or Claude Code

You do not need mk0r to apply the pattern; you need to remove the friction that keeps you from committing after every accepted suggestion. A shell alias, a pre-tool hook, or a tiny wrapper script all work. The pattern is:

  1. After every accepted AI change, run git add -A && git commit -m "<first line of prompt>". Make it a one-key alias.
  2. When the next suggestion arrives, look at the streaming diff before accepting. If it looks wrong, reject before it touches your tree.
  3. When a committed turn turns out bad, undo with git reset --hard HEAD~1 before issuing the next prompt. Do not let three more prompts land on top of the bad one.
  4. Treat git log --oneline as your session memory. The prompt history is right there, with diffs attached.

None of this is novel git usage; the trick is doing it on every turn instead of every fifth turn. The discipline is much smaller than "read a 1,400-line PR carefully," which is the discipline the alternative asks for.
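A wrapper for step 1 might look like the sketch below. It is an illustration under stated assumptions, not mk0r's code: commitAcceptedChange and the GitRunner type are hypothetical names, and the git runner is injected so the logic can be exercised without a repository. The demo uses a fake runner that pretends one file is staged.

```typescript
// Result of one git invocation: exit status plus captured stdout.
type GitResult = { status: number; stdout: string };
// Hypothetical runner: receives git arguments, returns the result.
type GitRunner = (args: string[]) => GitResult;

// Commit everything the last accepted change touched.
// Returns the new SHA, or null when nothing was staged.
function commitAcceptedChange(prompt: string, git: GitRunner): string | null {
  const message = prompt.split("\n")[0].trim().slice(0, 120) || "Agent turn";
  git(["add", "-A"]);
  // `git diff --cached --quiet` exits 0 when the index matches HEAD.
  if (git(["diff", "--cached", "--quiet"]).status === 0) return null;
  const commit = git(["commit", "-q", "-m", message]);
  if (commit.status !== 0) throw new Error("git commit failed");
  return git(["rev-parse", "HEAD"]).stdout.trim();
}

// Demo with a fake runner that reports staged changes and a made-up SHA.
const fakeGit: GitRunner = (args) => {
  if (args[0] === "diff") return { status: 1, stdout: "" };
  if (args[0] === "rev-parse") return { status: 0, stdout: "abc1234\n" };
  return { status: 0, stdout: "" };
};
console.log(commitAcceptedChange("Add delete button to todo rows", fakeGit));
```

In Node, the runner could wrap child_process.spawnSync("git", args, { encoding: "utf8" }), which reports both the exit status and stdout. Binding the wrapper to a single key or a post-accept hook is what removes the friction.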

Frequently asked questions

Should I read AI-generated code before merging it?

Yes, but be honest about what 'merging' means in your setup. If your AI coding loop produces a single PR that bundles 8 to 15 prompts' worth of changes, you are not going to read it. Nobody reads a 1,400-line AI-generated diff line by line, and the tests passing tells you the obvious bugs are gone, not the subtle ones. The actionable answer is structural: shrink the merge unit so the read fits in 30 seconds. The pattern that works is one prompt produces one commit, one diff, and you decide to keep that single diff or undo it before issuing the next prompt. mk0r is built around this shape (commitTurn at src/core/e2b.ts line 1759, runs after every successful turn), but you can recreate it in Cursor or Claude Code by committing after every accepted change and using `git reset --hard HEAD~1` to revert turns you do not like.

What does mk0r commit on a turn, literally?

The handler in src/app/api/chat/route.ts line 1008 reads `const msg = prompt.split("\n")[0].trim().slice(0, 120) || "Agent turn"`. Then it calls commitTurn(sessionKey, msg). commitTurn runs a six-step script inside the VM; the steps include cd /app, git add -A, git diff --cached --quiet (short-circuit to NOCHANGE if nothing staged), git commit -q -m '<msg>', and git rev-parse HEAD. The whole tree is committed every turn. The commit message is the first line of your prompt. So `git log --oneline` reads like the chat panel, in chronological order, with real diffs attached.

What if the agent ran but did not write any files?

Line 1772 of src/core/e2b.ts handles this: `if git diff --cached --quiet; then echo NOCHANGE; exit 0; fi`. The script echoes NOCHANGE and exits zero. commitTurn returns null. No commit lands. The history stack does not grow. So chatty turns where the agent only fetched something or thought out loud do not pollute the log. Every entry in `git log` corresponds to an actual change, which is exactly what you want a review surface to be.
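One way to picture the mapping from script output to commitTurn's return value is the sketch below. The helper name shaFromCommitOutput is hypothetical, and it assumes the script prints either NOCHANGE or, as its last line, the SHA from git rev-parse HEAD; the SHA in the demo is made up.

```typescript
// Hypothetical helper: maps the script's stdout to commitTurn's return value.
// Assumes the last non-empty line is either NOCHANGE or the new SHA.
function shaFromCommitOutput(stdout: string): string | null {
  const lines = stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const last = lines[lines.length - 1];
  if (last === undefined || last === "NOCHANGE") return null;
  return last;
}

// A history stack that only grows when a real commit landed.
const historyStack: string[] = [];
for (const out of ["NOCHANGE\n", "3f2a9c1\n"]) {
  const sha = shaFromCommitOutput(out);
  if (sha !== null) historyStack.push(sha);
}
console.log(historyStack);
```

The empty turn contributes nothing, so the stack ends up holding only the turn that changed files.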

Can I cancel a turn before it commits?

Yes. /api/chat/cancel posts to the in-VM ACP cancel endpoint. The agent stops. commitTurn short-circuits because git diff --cached --quiet exits 0 (nothing was staged). No commit lands. You stay on the same active SHA. This is the cheapest pre-merge gate there is: if the diff streaming into the preview looks wrong, cancel before the turn lands. Zero commits, zero pollution, full reset of the current prompt. The undo button is for the case where the turn landed and you only realized after.

Why is the chunk size the actual problem, not whether you read?

Because human attention scales with the size of the thing. A 12-line diff with a clear commit message reads in 10 seconds. A 1,400-line diff bundling 8 prompts is functionally unreadable; the reviewer either skims and approves or rage-rejects, neither of which is a real review. Every existing piece on this topic argues yes-read or no-trust-the-tests assuming a fixed chunk size. Shrinking the chunk turns the argument into a non-question. You can read a 30-second diff between prompts without it slowing the loop, and you cannot read a multi-prompt PR without breaking the loop entirely.

How do I undo a single turn if I read it and do not like what I see?

Click the undo button in the chat panel, or call POST /api/chat/undo. Internally, undoTurn at src/core/e2b.ts line 1855 walks activeIndex back by one in historyStack and calls revertToSha(sessionKey, prevSha, 'Undo to <sha>'). revertToSha runs `git checkout <sha> -- .` followed by `git add -A` and `git commit --allow-empty -m '...'`. You end up on a fresh commit whose tree matches the prior turn. Redo is the symmetric operation; activeIndex moves forward. The history stack never loses entries, so you can flip back and forth between turns to compare without re-prompting.
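The bookkeeping can be modeled with a small class. This is a sketch, not the code in e2b.ts: only the names historyStack and activeIndex come from the description above, while the class, its methods, and the SHAs are hypothetical. What happens to a new turn issued after an undo is not described, so this sketch simply appends; it also returns null when there is no earlier entry to go back to.

```typescript
// Minimal model of the undo/redo bookkeeping; names other than
// historyStack and activeIndex are assumptions for illustration.
class TurnHistory {
  private historyStack: string[] = [];
  private activeIndex = -1;

  // Record the SHA of a turn that produced a real commit.
  append(sha: string): void {
    this.historyStack.push(sha);
    this.activeIndex = this.historyStack.length - 1;
  }

  // Step back one turn; returns the SHA to restore, or null at the start.
  undo(): string | null {
    if (this.activeIndex <= 0) return null;
    this.activeIndex -= 1;
    return this.historyStack[this.activeIndex];
  }

  // Step forward one turn; returns the SHA to restore, or null at the end.
  redo(): string | null {
    if (this.activeIndex >= this.historyStack.length - 1) return null;
    this.activeIndex += 1;
    return this.historyStack[this.activeIndex];
  }

  // Entries are never removed, so the size only grows.
  size(): number {
    return this.historyStack.length;
  }
}

const turns = new TurnHistory();
turns.append("a1b2c3d");
turns.append("e4f5a6b");
console.log(turns.undo(), turns.redo(), turns.size());
```

The SHA returned by undo or redo is what a caller would hand to the checkout-and-commit step, so moving through history never rewrites it.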

Does the per-prompt commit pattern work outside mk0r?

Yes, if you wire it yourself. In Claude Code or Cursor, after every accepted change run `git add -A && git commit -m "<first line of your prompt>"` (a shell alias makes this one keystroke). To undo a single turn use `git reset --hard HEAD~1`. The unit becomes the same: one prompt, one diff, one optional undo. The reason it usually does not happen is friction: nobody types the commit command after every accepted suggestion, so 6 prompts smear into one diff, and then you are back to the unreadable chunk. The mk0r approach is to remove the friction by committing automatically server-side; you can recreate it locally with a 5-line pre-tool hook or a tiny script.

What about reviewing AI code after merging, in production?

The same per-turn log carries forward. There is a deeper write-up on the post-deployment review surface at /t/vibe-coding-production-review-tail. The short version: `git log --oneline` is the prompt history, `git show <sha>` is one prompt's diff, and the unit of review is the same whether you are reading before merge or auditing after. The structural fix to chunk size pays off twice.

Is reading AI code before merging worth the time on throwaway prototypes?

Often no. For a weekend prototype where the failure mode is 'demo does not work' and the blast radius is your own laptop, treat the AI like a fast first draft and trust the preview. The pre-merge read earns its keep when (a) the code touches auth, payments, or user data, (b) the diff will become a load-bearing piece of something that ships, or (c) you are stuck and need to understand what the agent actually did before issuing the next prompt. For everything else, the preview is the read. mk0r leans into this with a live mobile preview iframe that updates as the agent writes, so the visual is the dominant signal and the source is there for when you need it.


mk0r.AI app builder
© 2026 mk0r. All rights reserved.