The Vibe Coding Tool You Can Talk To and Co-Drive
Every other tool in the top results for this keyword is a text box and a wait spinner. mk0r is the one where you speak the idea and then, while the agent is still typing, take the mouse and click around inside its browser. Here is how that is wired and why nothing else mentions it.
What everyone else means by "vibe coding"
Cursor, Lovable, Bolt, v0, Base44, Replit Agent. The pitch is the same across all of them: type a prompt, get code. The interaction model is a chat window. The only senses you use are eyes and fingers. If the agent goes off the rails, you watch it happen and then type a correction after the fact.
That is fine, but it is also exactly what the SERP is full of. If you searched for "vibe coding tool" you have already read four versions of that article. The interesting question is what a vibe coding tool looks like when you stop assuming the user is sitting at a keyboard ready to type.
mk0r treats voice as a first-class input
The home page opens with a microphone, not a text field. You hit record, describe the app you want out loud, and the audio buffer is POSTed straight to /api/transcribe. That route forwards the bytes to Deepgram's REST endpoint with model=nova-2, smart_format=true, and punctuate=true. No intermediate transcoding, no waiting for a final stop word. The transcript drops into the prompt and the build kicks off.
The reason this matters for vibe coding specifically: a large chunk of the appeal is the speed of the loop between thought and result. Typing is the slowest step. Removing it changes the rhythm of the session. You think out loud the way you would describe the idea to a friend, and the tool keeps up.
You can verify this for yourself: the route lives at src/app/api/transcribe/route.ts in the open source repo, the model name is right at the top, and the input hook is at src/hooks/useVoiceInput.ts.
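As a rough sketch of what that forwarding step involves (not the repo's actual code; the function and helper names here are invented for illustration), the whole thing reduces to a single fetch against Deepgram's documented /v1/listen endpoint:

```typescript
// Hypothetical sketch of the forwarding step in /api/transcribe.
// Names are invented for illustration; the real route is at
// src/app/api/transcribe/route.ts.
const DEEPGRAM_URL = "https://api.deepgram.com/v1/listen";

// Build the query string for the three options the article names.
export function deepgramQuery(): string {
  return new URLSearchParams({
    model: "nova-2",
    smart_format: "true",
    punctuate: "true",
  }).toString();
}

// Forward the raw audio bytes with whatever content type the browser
// captured (usually audio/webm) -- no intermediate transcoding step.
export async function transcribe(
  audio: ArrayBuffer,
  contentType: string,
  apiKey: string
): Promise<string> {
  const res = await fetch(`${DEEPGRAM_URL}?${deepgramQuery()}`, {
    method: "POST",
    headers: { Authorization: `Token ${apiKey}`, "Content-Type": contentType },
    body: audio,
  });
  const data = await res.json();
  // Deepgram nests the transcript under results.channels[].alternatives[].
  return data.results.channels[0].alternatives[0].transcript;
}
```

The absence of a transcoding step is the point: whatever container the browser's MediaRecorder produces goes straight through, which keeps the loop tight.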
The /input WebSocket: take the wheel mid-build
This is the part nothing else does. When you start a VM session, mk0r boots a Freestyle sandbox that runs Vite, Chromium, Playwright MCP, and a tiny HTTP/WS proxy. The proxy lives at /opt/proxy.js inside the VM and exposes everything on a single public URL.
Two of the WebSocket routes are the interesting ones:
/screencast
Streams CDP frames out of the Chromium instance the Claude agent is controlling, so you see exactly what the agent sees: the same tab, the same scroll position, the same hover state.
/input
Takes mouse and keyboard events from your browser and replays them into the same tab through Chrome DevTools Protocol. While the agent is in the middle of clicking around to test its own build, you can click a different button to show it where you actually want it to look.
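A minimal sketch of what that replay involves, assuming the standard CDP method names (the `toCdpMouseEvent` helper and message shape are invented here for illustration; the real wiring lives in the proxy inside the VM):

```typescript
// Hypothetical sketch: translate a browser pointer event into the
// parameters CDP's Input.dispatchMouseEvent expects.
type PointerMsg = {
  kind: "down" | "up" | "move";
  x: number;
  y: number;
  button?: "left" | "middle" | "right";
};

export function toCdpMouseEvent(msg: PointerMsg) {
  const type = {
    down: "mousePressed",
    up: "mouseReleased",
    move: "mouseMoved",
  }[msg.kind];
  return {
    type,
    x: msg.x,
    y: msg.y,
    button: msg.button ?? "left",
    clickCount: msg.kind === "move" ? 0 : 1,
  };
}

// On the proxy side, each /input frame would be forwarded roughly as:
//   cdpSocket.send(JSON.stringify({
//     id: nextId++,
//     method: "Input.dispatchMouseEvent",
//     params: toCdpMouseEvent(msg),
//   }));
```

Because the events land in the same tab the agent is driving, your click and the agent's next action compose naturally instead of fighting over separate sessions.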
You can read the upgrade handler that wires this up at src/core/freestyle.ts, around line 348. If your network blocks CDP entirely, the same proxy also exposes /vnc, which is websockify in front of a real VNC server on port 5901. That gives you the full Linux desktop the agent is sitting on, in case you want to open the file manager and check what it just wrote.
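For the /screencast side, the standard CDP recipe looks roughly like this (a sketch against the documented Page.startScreencast API, not the proxy's actual code; the socket names are assumptions):

```typescript
// Hypothetical sketch of a CDP screencast relay. Chromium pushes
// base64 JPEG frames as Page.screencastFrame events; each frame must
// be acked or the stream stalls.
export function cdpCommand(id: number, method: string, params: object = {}): string {
  return JSON.stringify({ id, method, params });
}

// Relay loop, roughly (cdpSocket / viewerSocket assumed):
//
//   cdpSocket.send(cdpCommand(1, "Page.startScreencast", {
//     format: "jpeg",
//     quality: 60,
//     everyNthFrame: 1,
//   }));
//
//   cdpSocket.on("message", (raw) => {
//     const msg = JSON.parse(String(raw));
//     if (msg.method === "Page.screencastFrame") {
//       viewerSocket.send(msg.params.data); // base64 JPEG for the client
//       cdpSocket.send(cdpCommand(2, "Page.screencastFrameAck", {
//         sessionId: msg.params.sessionId,
//       }));
//     }
//   });
```

The ack requirement is why a naive relay that only forwards frames appears to freeze after a few seconds: Chromium stops sending until the previous frame is acknowledged.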
Why this changes the vibe of vibe coding
Chat-only AI builders force a turn-taking dynamic: agent works, you read, you type, agent works. Vibe coding sells itself on "describe and build" but in practice you spend a lot of time describing things in words that you could have shown in two seconds with a click.
With voice on the input side and /input on the output side, the session feels less like dictation and more like pair work. You talk, you point, the agent keeps building. If it heads down the wrong path, you click the right element and the agent picks up from there.
The two modes and when to pick which
Quick mode
Claude Haiku streams a single self-contained HTML file with inline CSS and JS into a live preview. Right for calculators, random generators, study cards, and single-page utilities. Sub-30-second feedback loop.
VM mode
Boots a Freestyle VM with a real Vite plus React plus TypeScript project, Chromium, and Playwright MCP. The agent can open and test the app it just built. You get the /screencast and /input WebSockets, plus /vnc as a fallback. Right for anything you want to keep iterating on.
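The quick-mode preview loop can be sketched like this (a sketch under assumptions: the streaming source, the iframe, and both helper names are invented here, not mk0r's actual code):

```typescript
// Hypothetical sketch of streaming generated HTML into a live preview.
// Append each chunk as it arrives, and only re-render once the document
// is complete enough to avoid flashing half-open tags.
export function appendChunk(html: string, chunk: string): string {
  return html + chunk;
}

// Crude readiness check before updating the preview.
export function looksRenderable(html: string): boolean {
  return html.includes("</body>") || html.includes("</html>");
}

// Browser side, roughly:
//   let html = "";
//   for await (const chunk of stream) {
//     html = appendChunk(html, chunk);
//     if (looksRenderable(html)) iframe.srcdoc = html;
//   }
```

Because the output is one self-contained file, the preview needs no bundler or server round trip, which is where the sub-30-second loop comes from.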
Frequently asked questions
What does 'vibe coding' actually mean here?
Vibe coding is the practice Andrej Karpathy named: describing what you want in natural language and letting an LLM produce the code, without micromanaging syntax. mk0r leans into the natural-language part by accepting voice as a first-class input, not just text.
How is mk0r different from Cursor, Lovable, Bolt, v0, or Replit Agent?
Those tools assume you sit at a keyboard and type prompts into a chat box. mk0r accepts voice through Deepgram's nova-2 model and, in VM mode, exposes a /input WebSocket that injects mouse and keyboard events into the same Chromium tab the Claude agent is driving. You can literally grab the cursor and demonstrate what you want while the agent is still working.
Do I need an account or a credit card?
No. Visit mk0r.com, hit the mic, talk, and the build starts. There is no email gate, no trial credit card, no onboarding. The whole point is zero friction from idea to running app.
What handles the speech to text?
The /api/transcribe route forwards your audio to Deepgram's REST endpoint with model=nova-2, smart_format=true, and punctuate=true. The audio buffer is posted directly with whatever content type the browser captured (usually audio/webm), so there is no intermediate transcoding step. You can read the route at src/app/api/transcribe/route.ts.
What does 'co-driving the agent's browser' actually look like?
When you start a VM session, the proxy at /opt/proxy.js exposes two WebSockets next to the Vite app: /screencast streams CDP frames out of Chromium so you see what the agent sees, and /input takes mouse and keyboard events from your browser and replays them into the same tab via Chrome DevTools Protocol. You watch the agent build, and if it heads in the wrong direction you can click a button yourself to nudge it.
What if my network blocks Chrome DevTools Protocol?
The same proxy exposes /vnc, which is websockify in front of a real VNC server on port 5901. You get the full Linux desktop the agent is running on, including the file manager and terminal, in case you want to inspect files directly.
Is there a 'fast path' that skips the VM entirely?
Yes. Quick mode generates a single self-contained HTML/CSS/JS file with Claude Haiku and streams the result into a live preview. Use it for small tools, calculators, and one-page utilities. VM mode boots a full Vite plus React plus TypeScript project with Playwright MCP wired in, which is the right choice for anything you want to keep iterating on.
Do I get real code, or am I locked in?
Real code, both modes. Quick mode hands you a downloadable HTML file. VM mode produces a normal Vite plus React plus TypeScript project tree you can pull out and host anywhere.
Stop typing prompts at a chat box. Talk to the tool and grab the cursor when it matters.
Try mk0r