Guide

Prototype a desktop automation app, operator UI first.

The honest version of building a desktop automation tool with AI: mk0r ships the part a human clicks (workflow builder, run history, schedule, credentials, console) at an unconstrained desktop viewport in roughly two minutes. The OS-level engine you write later in a native runtime, because a browser sandbox cannot reach your laptop's mouse, file system, or installed apps.

Matthew Diakonov
9 min read
Direct answer, verified 2026-05-01

Open mk0r.com, describe the operator UI in one sentence (dashboard, workflow builder, run log), and the preview opens at an unconstrained desktop viewport. DEVICE_SIZE.desktop is null on line 8 of src/components/phone-preview.tsx, and the AppPreview component defaults deviceMode to "desktop" on line 42. The OS-level engine (mouse hooks, file access, native scripting) is a separate problem you solve later in Tauri, Electron, or a platform script.

The split nobody draws

A desktop automation tool is two products glued together. One is an operator console: dashboard, workflow builder, run history, schedule editor, credentials, action library, console output. The other is an engine: code that opens apps, clicks buttons, reads files, watches events on whatever operating system you target. Most teams blend them and disappear into engine work for a month before showing anyone a UI. By then the operator workflow is wrong, but the engine is half-built around the wrong workflow, and you start over.

The cheap move is to build the operator console as a clickable mock first, watch a real operator try to use it, and only commit to engine work once the workflow shape stops shifting. mk0r is good at this because the prototype is real React running in a real Vite project, not a Figma stack. You can wire mock data to mock state, click around like a real app, and the agent will iterate the surface in plain English while the operator watches over your shoulder.

What goes in the operator console

Six surfaces show up on every desktop automation tool worth shipping. None of them need an engine to render. All of them need to be argued about with real users.

Workflow builder

The thing the operator clicks on. Node graph, list of steps, drag-and-drop sequence, or a YAML editor with live validation. Pick one. mk0r will draft any of them.

Run history

Last N runs with status, duration, and a button to inspect the trace. The data is fake until the engine ships, but the shape is what stakeholders argue about.

Schedule editor

Cron picker or a friendly weekly grid. The first thing real users ask for once they see a workflow they like.

Credentials vault

List of stored secrets, masked values, last-used dates. The UI is the part you can ship today; encryption-at-rest comes later.

Live console

Streaming log lines tailing the current run. Mocked from a JSON array in the prototype, wired to a real socket once the engine speaks.

Action library

The catalog of building blocks: open app, click element, read file, send email, wait. Browseable, searchable, and with placeholder docs the engine team can fill in.
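All six surfaces can render from mock data long before the engine exists. A minimal sketch in TypeScript of what that mock layer might look like for the run history and live console (the `Run` shape, the field names, and the log source are assumptions for illustration, not mk0r's actual types):

```typescript
// Hypothetical shapes for the operator console's mock data.
type RunStatus = "success" | "failed" | "running";

interface Run {
  id: string;
  status: RunStatus;
  durationMs: number;
  startedAt: string; // ISO timestamp
}

// Fake run history: the shape stakeholders argue about, not real data.
const mockRuns: Run[] = [
  { id: "run-42", status: "success", durationMs: 1840, startedAt: "2026-04-30T09:12:00Z" },
  { id: "run-43", status: "failed",  durationMs: 310,  startedAt: "2026-04-30T09:15:00Z" },
  { id: "run-44", status: "running", durationMs: 0,    startedAt: "2026-04-30T09:20:00Z" },
];

// Live console mocked from a JSON array; swap this generator for a real
// socket subscription once the engine speaks.
async function* tailMockLog(lines: string[], delayMs = 50): AsyncGenerator<string> {
  for (const line of lines) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield line;
  }
}
```

The point of keeping this layer separate is that the UI components consume `Run[]` and an async iterator either way; only the source changes when the engine arrives.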

The desktop default, in one line of source

Most AI app makers default to a phone frame because they were built around mobile prototyping. That makes a desktop automation tool look like a toy. mk0r is the inverse. The preview iframe defaults to unconstrained desktop viewport. Here is the file that decides it.

// src/components/phone-preview.tsx (excerpt)
export type DeviceMode = "desktop" | "mobile";

const DEVICE_SIZE: Record<DeviceMode, { w: number; h: number } | null> = {
  desktop: null,                  // <-- unconstrained viewport
  mobile: { w: 390, h: 844 },     // opt-in via DeviceSwitcher
};

export function AppPreview({
  /* ... */
  deviceMode = "desktop",         // <-- desktop is the default
}: AppPreviewProps) { /* ... */ }

A null device size means the iframe is not boxed; it spans the panel. So the workflow builder has room. The run history table has columns that fit. The schedule editor opens at the size a real user will see it. The phone frame is one click away if you want a mobile companion view, but it is not the default. That single null is one of the few places in the codebase where a one-character change would turn the product into a different one.
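What that null implies can be sketched as a style helper (hypothetical; the excerpt above does not show the component's actual styling logic, so `iframeStyle` is an illustration of the consequence, not the real code):

```typescript
type DeviceMode = "desktop" | "mobile";

const DEVICE_SIZE: Record<DeviceMode, { w: number; h: number } | null> = {
  desktop: null,                  // unconstrained viewport
  mobile: { w: 390, h: 844 },     // opt-in via DeviceSwitcher
};

// Hypothetical: map the device entry to iframe dimensions.
// A null entry means no fixed box, so the iframe inherits the full panel.
function iframeStyle(mode: DeviceMode): { width: string; height: string } {
  const size = DEVICE_SIZE[mode];
  return size
    ? { width: `${size.w}px`, height: `${size.h}px` } // boxed phone frame
    : { width: "100%", height: "100%" };              // unconstrained desktop
}
```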

Four steps from sentence to clickable console

  1. Describe the surface

     One sentence. Name the operator role, the panels, the primary action. Skip the engine entirely.

  2. Watch the desktop preview build

     Unconstrained viewport, real Vite HMR, the agent verifying its own work in headed Chromium before the iframe repaints.

  3. Iterate by talking

     Move the credentials tab to the sidebar. Replace the node graph with a YAML editor. Add a kill-switch button. State survives because HMR holds.

  4. Hand the URL to a real operator

     Shareable preview link. They click around for ten minutes and tell you which panel is missing. That is the validation the engine work hinges on.

Total wall-clock from the first sentence to a stakeholder-clickable URL is usually under ten minutes for the first draft, plus however long the iteration loop runs. The agent verifies its own work in a headed Chromium session inside the sandbox before the preview repaints, so you rarely see a broken intermediate state.

  • ~2 min to a first-draft operator UI
  • 0 accounts required to start
  • 0 px of phone frame, opt-in only

Be specific about what you can and cannot ship

The honest list. The first one is what the prototype delivers. The second one is what stays out of reach until you wrap it in a native runtime.

What ships from the browser sandbox

  • The operator dashboard, full desktop layout
  • Workflow builder UI (node graph, list, or YAML)
  • Run history table with mocked status pills
  • Schedule editor with cron picker or grid view
  • Credentials vault list (UI only, no real storage)
  • Configurable action library catalog
  • A streaming-style live console (mocked)
  • A shareable HTTPS preview URL for stakeholder review

What stays out until the native shell

  • Real mouse and keyboard automation on the user's machine
  • Reading or writing files on the user's local file system
  • Driving native apps (Photoshop, Excel, Finder, Outlook)
  • Operating system APIs (accessibility, notifications, clipboard)
  • Running PyAutoGUI, AppleScript, AutoHotkey, .NET UIAutomation
  • Persistent encrypted credential storage (real KMS)
  • Background daemon that runs when the browser is closed
  • Code-signing, notarization, or any installer pipeline

The strange recursion: mk0r runs desktop automation to verify your prototype

The sandbox the agent works inside is itself a desktop automation environment. The Dockerfile at docker/e2b/e2b.Dockerfile installs Xvfb (virtual frame buffer for headless X11), x11vnc, websockify, headed Chromium, and @playwright/mcp@0.0.70 as an MCP server. After every file write, the agent can drive that headed browser through Playwright, take a real DOM snapshot, capture console messages, and read it all back. If a required panel is missing or an error renders, it patches the code and runs the check again before your preview repaints.

That is real desktop automation, just at a layer most builders never expose. You are using a tool that does desktop GUI automation against a virtual frame buffer to design a tool that will eventually do desktop GUI automation against your operating system. Different abstractions, same primitive. Useful to know when you start writing the engine, because the mental model carries over.
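The shape of that verify-then-patch loop can be sketched in a few lines. This is a hypothetical check, not mk0r's actual verifier: given the text of a DOM snapshot (the kind Playwright returns), report which required operator panels are missing so the agent knows what to patch.

```typescript
// Hypothetical post-edit check over a DOM snapshot's text content.
function missingPanels(snapshotText: string, required: string[]): string[] {
  const haystack = snapshotText.toLowerCase();
  return required.filter((panel) => !haystack.includes(panel.toLowerCase()));
}

// Panels the operator console is expected to render (assumed list).
const REQUIRED_PANELS = [
  "Workflow builder",
  "Run history",
  "Schedule",
  "Credentials",
  "Console",
];
```

An empty result means the snapshot contains every expected panel; anything else becomes the next patch target.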

The handoff: prototype to engine

When the operator UI stops being the unknown, the prototype has done its job. Five steps move you from a sandbox preview to a desktop binary an engine team can ship.

From browser prototype to native binary

  1. Prototype in mk0r

     Operator UI lives in a Vite + React + Tailwind v4 project inside the sandbox VM at /app. Real code, exportable, no proprietary format.

  2. Validate with users

     Share the HTTPS preview URL. Watch three operators try to run a fake workflow. Note where they stall.

  3. Export the source

     Download the Vite project from the VM. It is a normal Node app you can push to GitHub and open in any editor.

  4. Wrap in a native shell

     Tauri (Rust + system webview) or Electron (Node + Chromium) gives you the desktop binary. Both can host the same React code.

  5. Plug in the engine

     Write the OS-level automation in the language that fits the platform. Bridge to the UI through IPC, a local HTTP server, or the framework's command channel.

The React code does not care. Tauri ships a roughly 600 KB binary that wraps a system webview. Electron is heavier (about 80 MB) but ships with a known Chromium. Both let you reuse the prototype code unchanged and bridge to native automation through IPC. Pick on size, license, and platform support, not on whether the prototype survives the move (it does, in either case).
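One way to keep the React code genuinely indifferent is to code the UI against a bridge interface and swap implementations at the shell boundary. A sketch, with assumed names (`EngineBridge`, `runWorkflow`, `killRun` are illustrations, not a published API):

```typescript
// Hypothetical engine bridge: the UI calls this interface, so the
// prototype's mock and the native runtime are interchangeable.
interface EngineBridge {
  runWorkflow(workflowId: string): Promise<{ runId: string }>;
  killRun(runId: string): Promise<void>;
}

// Prototype implementation: fake results, no OS access needed.
const mockBridge: EngineBridge = {
  async runWorkflow(workflowId) {
    return { runId: `mock-${workflowId}-${Date.now()}` };
  },
  async killRun() {
    // no-op in the browser sandbox
  },
};

// In the native shell, the same interface would wrap the real channel,
// e.g. Tauri's invoke() or Electron's ipcRenderer.invoke() (not shown).
```

Because the interface is the only thing the components import, moving from sandbox to binary is a one-file swap rather than a rewrite.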

One concrete first prompt

If you want a starting point that consistently produces a useful first draft, paste this verbatim into mk0r and iterate from there:

A desktop control panel for a workflow automation tool.
Left sidebar: searchable list of saved workflows, plus a "New workflow" button.
Main area: node-based workflow builder with trigger / action / condition / output blocks, drag to connect, edit a node by clicking it.
Right sidebar: last 10 runs with status pill (success, failed, running), duration, and timestamp; clicking a run opens a slide-over with the streaming log.
Top bar: workflow name, run-now button, schedule editor (modal with cron picker), and a kill-switch.
Style: clean, generous whitespace, one teal accent, no emojis, no rounded card grid; every panel takes up the full available height.

That prompt is opinionated on purpose. It names the surfaces, names the interaction style, and forbids the generic-AI default of every panel being a uniform rounded card with an emoji. Iterate from there: rip out the node graph and ask for a YAML editor instead, replace the sidebar with a tab bar, swap the modal schedule for an inline editor in the top bar. mk0r will hold the rest of the layout still while it changes the one thing.

Want a hand sketching the operator UI for your automation tool?

Book a 20-minute call. Bring the workflow you have in your head; leave with a clickable prototype URL and a list of the engine work that actually matters.

Frequently asked questions

Can mk0r actually run desktop automation, like clicking buttons in Photoshop or moving files on my Mac?

No, and any tool that says yes from a web app is misleading you. mk0r generates HTML, CSS, JavaScript and a Vite + React project inside an E2B sandbox. The sandbox is a Linux container, not your laptop. It has no access to your mouse, your keyboard outside the browser tab, your file system, or any native app on your machine. What it ships is the part of the product that is the same regardless of language: the operator-facing UI a human uses to design, run, and monitor automations.

If mk0r cannot run the automation, what is the point of prototyping in mk0r?

Most desktop automation tools die before anyone sees the UI, because the team disappears into platform-specific scripting (PyAutoGUI on macOS, AutoHotkey on Windows, AppleScript, .NET UIAutomation) and forgets there is a human who has to operate the thing. The prototype is the cheapest way to validate the workflow shape with real users before committing to weeks of engine work. Build the dashboard, run history, schedule editor, and credentials form in mk0r in under an hour. Show it to three people. If they cannot describe what it does, the engine work is premature.

Why does mk0r render at desktop viewport instead of a phone frame like other AI app makers?

Because the file says so. src/components/phone-preview.tsx line 5 declares DeviceMode as desktop or mobile, line 8 sets DEVICE_SIZE.desktop to null, and line 42 of the AppPreview component defaults deviceMode to desktop. A null device size means the iframe is unconstrained, so the prototype fills the available width like a real desktop app. The mobile flip is opt-in via the DeviceSwitcher. This is the only thing about mk0r that is desktop-first by default; the agent still writes mobile-responsive Tailwind, but the preview shows you the desktop layout first.

What does an honest first prompt look like for a desktop automation tool?

Describe the operator UI, not the engine. Something like: a desktop control panel for a workflow automation tool, left sidebar with a list of saved workflows, main area with a node-based workflow builder showing trigger, action, condition, and output blocks, right sidebar showing the most recent 10 runs with status pill (success, failed, running) and timestamp, top bar with a run-now button and a schedule editor opened in a modal. mk0r will draft that in one streaming pass. Iterate from there with plain English: change the run log to a table, add a credentials vault tab, swap the workflow builder for a YAML editor.

How does mk0r itself relate to desktop automation, beyond hosting the prototype?

Inside the sandbox VM, the agent uses real desktop automation to verify what it just wrote. The Dockerfile installs Xvfb (virtual frame buffer), x11vnc, websockify, and headed Chromium, plus @playwright/mcp@0.0.70 as an MCP server (docker/e2b/e2b.Dockerfile lines 11 to 53). After every file edit, the agent can drive that headed browser through Playwright, take a real DOM snapshot, and read it back. So the prototype you are building rides on a sandbox that is itself running desktop automation against a graphical Chromium session. Different problem, same primitive.

When should I stop using the mk0r prototype and move to native code?

When the operator UI stops being the unknown. The moment your blocker is the engine (does this script find the right window on Windows 11, does the macOS accessibility API expose this menu item, can I read the Chrome bookmarks file without prompting), the prototype has done its job. Export the source from the VM (Vite + React + Tailwind, no proprietary format), wrap it in Tauri or Electron for the native runtime, and start writing the platform-specific bindings. Do not try to push the prototype into doing native work it cannot do.

Do I need to sign up to try this?

No. mk0r generates a session key in localStorage on first visit (crypto.randomUUID, src/app/(landing)/page.tsx line 47) and uses it to claim a sandbox. You can prototype the operator UI for a desktop automation tool, iterate, and get a shareable preview URL without an email address. Signup only matters if you want the project to persist across devices.

Can I export the source and continue in Cursor or VS Code?

Yes. Every file the agent writes lives in a real Vite + React + TypeScript + Tailwind v4 project inside the VM at /app. There is no proprietary format. Download the source, push to your own repo, open it in your editor, and keep going. The prototype is meant to be a starting point, not a lock-in.

mk0r.AI app builder
© 2026 mk0r. All rights reserved.