A desktop automation demo is a playback, not a real automation.
Most pages on this topic point you at Power Automate, UiPath, or AutoHotkey. Those build the real engine. If what you actually need is a three-minute artifact to show a hackathon judge, a customer, or a meetup audience, you do not need an engine. You need a believable playback. Build that as a single-page web app whose UI plays itself, ship it as a URL, and walk away with a frame-by-frame screen recording for free.
Build the demo in mk0r.com as a one-page app whose UI plays a scripted run of cursor moves, line-by-line terminal output, and simulated file changes, all driven by chained setTimeout calls in the React tree. Describe the script in one paragraph; mk0r drafts it in roughly two minutes. Every preview iteration is captured frame-by-frame to /tmp/video/frames/frameNNNNNN.jpg at about 15 fps inside the sandbox (66 ms throttle on line 287 of docker/e2b/files/opt/proxy.js), so the screen recording is a side effect, not a separate step.
The frame the question gets wrong
Type the topic into a search bar and you get RPA tool listicles. Power Automate, UiPath, Leapwork, Ui.Vision, TestGrid. Every one of those is a real production scripting engine for clicking buttons inside native apps on a real operating system. They are the right answer when you are shipping an automation for daily use against a known computer.
They are the wrong answer when the artifact you actually need is a three-minute video, a clickable URL, or a hackathon submission that has to run on a stranger's laptop without setup. A real RPA flow that depends on local Excel installs and accessibility permissions does not survive being shipped to ten different judges. A web-based playback does. The two problems look identical from the outside (the visual is the same: cursor moving, files moving, output appearing) and they are completely different to build.
A demo is a story you can rehearse. An automation is a contract with the operating system. Pick one for the job at hand. This page is about the first one.
One sentence in, one playback out
The shape of a useful first prompt names three things: the surface (what the viewer sees at rest), the trigger (what starts the playback), and the punch line (the visible change at the end). Everything else is iteration.
A demo of an "auto-organize my desktop" tool.
Surface: a Mac-style desktop with about 30 messy icons (random
filenames, jpeg / pdf / docx / mp4 mix), a thin top bar with a
"Scan and tidy" button, and a hidden Trash icon at the bottom-right.
Trigger: the user clicks "Scan and tidy". A cursor sprite glides
from the icon grid up to the button and clicks it.
Playback: a terminal panel slides in from the bottom and streams
about 12 lines of plausible scan output over 3 seconds. As scanning
finishes, files of the same type group themselves into folders
(Images, Docs, Videos), and 3 duplicate jpegs flash red and slide
into the Trash. A counter at the top animates from 0 to "27 files
organized, 3 duplicates removed" with a confetti burst.
Style: warm desktop wallpaper, soft shadows, generous spacing, a
single teal accent for the button and the counter. Mobile-aware
but the desktop layout is the default.

That paragraph is opinionated by design. It commits to one specific automation (organizing desktop icons), one specific playback (cursor + terminal + file moves + counter), and one specific punchline (confetti and a number). mk0r writes the React for it in one streaming pass. From there, iteration is conversational: slow the cursor down, replace the confetti with a checkmark, swap the terminal for a progress bar, change the counter to a percentage. The first draft is the slowest part; everything after is talking.
Four beats every playback hits
Every demo of automation, regardless of the underlying problem, follows the same rhythm. If any beat is missing or rushed, the audience does not read it as automation; they read it as a confusing animation.
1. Stage. Open on a believable starting state: file browser at rest, terminal idle, status bar saying "ready". The viewer needs a baseline to notice change against.
2. Trigger. A cursor sprite glides to the Run button and clicks. Or a presenter clicks it live. Either way, the audience sees the action that starts the work.
3. Show work. Terminal output streams line by line at human reading speed. Files glow as they get "touched". Progress bar fills. The screen is busy without being chaotic.
4. Reveal result. Files moved, a count animates up, a green checkmark lands. The viewer leaves knowing exactly what the automation did and what changed.
Total wall-clock time for the playback should be 6 to 12 seconds. Shorter and the viewer cannot read the work. Longer and you lose them. If the underlying automation in real life would take an hour, the demo still has to finish in 12 seconds; that is the point of a demo.
What the playback actually is, in one timeline
Underneath the visuals, every playback is a chain of setTimeout calls posting React state updates against a fake data model. Here is the timeline for a duplicate-file scan, with the viewer's click as the entry event and React state transitions feeding the rendered surface.
One viewer click, scripted state updates, rendered playback
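In code, that chain is short. Here is a minimal sketch under assumed names (DuplicateScanDemo, SCAN_LINES are illustrative, not mk0r's actual output); the draft mk0r writes is larger but has the same shape, one scheduled state update per visual beat:

```tsx
// A minimal sketch of a setTimeout-driven playback timeline.
// Hypothetical names throughout; not mk0r's generated code.
import { useRef, useState } from "react";

const SCAN_LINES = [
  "scanning ~/Desktop ...",
  "hashing 30 files ...",
  "found 3 duplicates",
];

export function DuplicateScanDemo() {
  const [status, setStatus] = useState<"idle" | "running" | "done">("idle");
  const [lines, setLines] = useState<string[]>([]);
  const [removed, setRemoved] = useState(0);
  const timers = useRef<number[]>([]); // kept so a Replay button can clear and reset

  // Schedule one state update at a fixed offset from the click.
  const at = (ms: number, fn: () => void) =>
    timers.current.push(window.setTimeout(fn, ms));

  const run = () => {
    setStatus("running");
    // Show work: stream the fake terminal output at reading speed.
    SCAN_LINES.forEach((line, i) =>
      at(500 + i * 400, () => setLines((prev) => [...prev, line]))
    );
    // Reveal result: the counter, then the done state.
    at(2400, () => setRemoved(3));
    at(3000, () => setStatus("done"));
  };

  return (
    <div>
      <button onClick={run} disabled={status !== "idle"}>
        Scan and tidy
      </button>
      <pre>{lines.join("\n")}</pre>
      {status === "done" && <p>{removed} duplicates removed ✓</p>}
    </div>
  );
}
```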
Notice nothing in this sequence touches the operating system, the file system, or any external API. The mock store is a plain JavaScript object held in component state. The terminal lines come from a hard-coded array. The duplicate file paths are made up. The viewer reads the run as automation because the visual choreography is right, not because anything got automated.
Believable inside the iframe, dishonest outside it
The honest list. Anything in the first list is fair game and will read as automation. Anything in the second list the demo cannot do; do not build a script that pretends to.
Things the playback can credibly fake
- Cursor sprite gliding from one element to another
- Terminal output streaming line by line at reading speed
- File rows highlighting, then sliding to a target panel
- Progress bars and counters animating to a final value
- Status pills toggling: idle, running, done, failed
- Modal dialogs popping in mid-run with summary numbers
- Mock file system tree updating in place, with diff badges
- A toggle that lets the viewer rewind and replay the run
Things the playback cannot do, do not pretend
- Real mouse and keyboard events on the viewer's machine
- Reading or writing files on the viewer's local file system
- Driving native apps (Photoshop, Excel, Finder, Outlook)
- Operating system APIs (accessibility, notifications, clipboard)
- Running PyAutoGUI, AppleScript, AutoHotkey, .NET UIAutomation
- Background daemon that runs when the browser tab closes
- Screen recording of an unrelated app on the viewer's desktop
- Anything outside the iframe sandbox the demo runs in
The free screen recording you did not ask for
Most demo authoring tools charge for the screen recording. Loom, Tella, Arcade, Navattic, all of them are essentially capture-and-edit pipelines with hosting on top. mk0r ships the capture pipeline as part of how the live preview already works, because the agent uses it to verify its own writes.
Inside the sandbox VM, a small Node proxy at docker/e2b/files/opt/proxy.js opens a Chrome DevTools Protocol WebSocket against the headed Chromium, sends Page.startScreencast, and listens for Page.screencastFrame events. Each frame arrives as base64-encoded JPEG bytes. The proxy throttles to one frame every 66 milliseconds (line 287), decodes the buffer, and writes it to disk as /tmp/video/frames/frame followed by a six-digit zero-padded index, ending in .jpg. The same buffer also goes out over the WebSocket to the parent window, which is what makes the live preview visible.
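For the curious, a minimal sketch of that capture loop, assuming Node 18+, the ws package, and a Chromium started with --remote-debugging-port=9222. This is not the actual proxy.js, but it uses the same CDP calls (Page.startScreencast, Page.screencastFrame, Page.screencastFrameAck); names like FRAME_DIR are illustrative:

```ts
// Sketch: capture a CDP screencast to numbered JPEG frames on disk.
import WebSocket from "ws";
import { writeFileSync, mkdirSync } from "fs";

const FRAME_DIR = "/tmp/video/frames";
const THROTTLE_MS = 66; // ~15 fps, the same budget the proxy uses

async function main() {
  mkdirSync(FRAME_DIR, { recursive: true });
  // Ask the DevTools HTTP endpoint for the page's WebSocket URL.
  const targets = await fetch("http://127.0.0.1:9222/json").then((r) => r.json());
  const ws = new WebSocket(targets[0].webSocketDebuggerUrl);

  let id = 0;
  const send = (method: string, params: object = {}) =>
    ws.send(JSON.stringify({ id: ++id, method, params }));

  let frameIndex = 0;
  let lastWrite = 0;

  ws.on("open", () => send("Page.startScreencast", { format: "jpeg", quality: 60 }));
  ws.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.method !== "Page.screencastFrame") return;
    // Every frame must be acked or Chromium stops sending more.
    send("Page.screencastFrameAck", { sessionId: msg.params.sessionId });
    const now = Date.now();
    if (now - lastWrite < THROTTLE_MS) return; // drop frames above ~15 fps
    lastWrite = now;
    const name = `frame${String(frameIndex++).padStart(6, "0")}.jpg`;
    writeFileSync(`${FRAME_DIR}/${name}`, Buffer.from(msg.params.data, "base64"));
  });
}

main();
```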
Two things follow from this. First, every iteration of your demo is silently captured to disk; the directory grows the entire session. Second, ~15 fps is enough for a desktop UI playback (no fast camera pans, no high-FPS gameplay), so the recording is usable as-is. Stitch the frames into an MP4 with ffmpeg -framerate 15 -i frame%06d.jpg -c:v libx264 -pix_fmt yuv420p out.mp4, or into a GIF with one extra palette pass. You did not have to install OBS, you did not have to fight Loom permissions, and you did not have to pay a per-seat fee.
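The "one extra palette pass" for the GIF is two ffmpeg invocations over the same frames; first generate a palette, then apply it:

```sh
cd /tmp/video/frames
ffmpeg -framerate 15 -i frame%06d.jpg -vf palettegen palette.png
ffmpeg -framerate 15 -i frame%06d.jpg -i palette.png \
  -lavfi paletteuse out.gif
```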
A demo that responds to the presenter, live
A pure auto-play demo is fine for an embedded landing page. For a stage talk or a sales call you want the demo to wait for a human click, then run. The mk0r preview pipeline already pipes real input events from the parent window into the sandboxed Chromium via CDP.
Around line 401 of the same proxy file, a second WebSocket relay accepts JSON messages from the parent: {type:"mouse",x,y,action,button}, {type:"key",key,code,text,action}, {type:"scroll",x,y,deltaX,deltaY}, {type:"insertText",text}. The relay forwards each one to the headed Chromium as the matching CDP method (Input.dispatchMouseEvent, Input.dispatchKeyEvent, Input.insertText). The browser inside the sandbox cannot tell the difference between these synthetic events and a real user mouse, so the React app inside reacts the same way.
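A sketch of that forwarding step, using the message shapes above and a hypothetical send helper that posts CDP commands over the page's DevTools WebSocket (the real relay differs in detail):

```ts
// Sketch: map parent-window input messages onto CDP input methods.
type ParentMsg =
  | { type: "mouse"; x: number; y: number; action: "down" | "up" | "move"; button: "left" | "right" }
  | { type: "key"; key: string; code: string; text: string; action: "down" | "up" }
  | { type: "scroll"; x: number; y: number; deltaX: number; deltaY: number }
  | { type: "insertText"; text: string };

function forward(msg: ParentMsg, send: (method: string, params: object) => void) {
  switch (msg.type) {
    case "mouse":
      send("Input.dispatchMouseEvent", {
        type: { down: "mousePressed", up: "mouseReleased", move: "mouseMoved" }[msg.action],
        x: msg.x, y: msg.y, button: msg.button, clickCount: 1,
      });
      break;
    case "key":
      send("Input.dispatchKeyEvent", {
        type: msg.action === "down" ? "keyDown" : "keyUp",
        key: msg.key, code: msg.code, text: msg.text,
      });
      break;
    case "scroll":
      // CDP models scrolling as a mouseWheel event with deltas.
      send("Input.dispatchMouseEvent", {
        type: "mouseWheel", x: msg.x, y: msg.y,
        deltaX: msg.deltaX, deltaY: msg.deltaY,
      });
      break;
    case "insertText":
      send("Input.insertText", { text: msg.text });
      break;
  }
}
```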
For your demo, this means the Run button works whether the audience is looking at a presenter clicking it on stage or at a recorded URL on Twitter. You build the same React, you wire one onClick to start the timeline, and the input pathway is already there. No additional plumbing.
A useful side effect: the playback is also drivable programmatically. If you wrap the demo in Playwright, you can replay the click trace from a script, capture frames, and produce a deterministic recording. That is how the agent verifies its own output without you watching, and it is the same primitive your eventual real automation engine will use against a real OS.
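A minimal Playwright sketch of that replay, with a hypothetical preview URL and button label standing in for yours:

```ts
// Sketch: drive the demo deterministically and record it.
import { chromium } from "playwright";

async function record() {
  const browser = await chromium.launch();
  const context = await browser.newContext({
    recordVideo: { dir: "recordings/", size: { width: 1280, height: 720 } },
  });
  const page = await context.newPage();
  await page.goto("https://example-preview-url.mk0r.com"); // hypothetical URL
  await page.getByRole("button", { name: "Scan and tidy" }).click();
  await page.waitForTimeout(12_000); // let the 6-12 s playback finish
  await context.close(); // flushes the .webm video to recordings/
  await browser.close();
}

record();
```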
Eight first prompts that produce useful demos
If you are stuck on what to demo, pick one of these and paste it into the prompt box. Each one is a complete enough surface that the first draft is shareable.
The pattern is consistent: name the surface, name the trigger, name the change. Save the iteration for the second turn. Trying to specify the entire demo in one paragraph produces worse first drafts than starting plain and adding detail by talking.
What the demo plays well alongside
The demo is one artifact in a chain. Here are the tools it tends to share a workflow with, both for shipping the demo and for the day you graduate to real automation.
- ffmpeg: stitch the captured /tmp/video/frames/*.jpg into an MP4 or GIF.
- Loom: if you want narration over the playback for asynchronous sharing.
- Tella: talking-head plus screen recording when the demo needs a face.
- Arcade: hotspot tooltips overlaid on a recording, for guided self-serve walkthroughs.
- Tauri: when the demo graduates to a real desktop binary, wrap the same React.
- Electron: heavier shell that ships a known Chromium; same React inside, IPC to native.
- Playwright: if your eventual automation target is a web app, this is your engine.
- AutoHotkey: Windows-side native scripting once the demo wins the room.
mk0r is not trying to replace any of these. It is the front end where the demo is born and the place where the captured frames live. After that, you pick the right shipping format (raw URL, MP4, GIF, hosted walkthrough) and, if the idea wins, the right native runtime (Tauri, Electron) and engine (PyAutoGUI, AppleScript, AutoHotkey, .NET UIAutomation, Playwright).
When to stop demoing and start building
The demo has a clear end-of-life signal. The first time someone asks "what about [specific OS, specific app version, specific edge case]?" with a real budget behind the question, the demo has done its job. Move on. Export the source from the sandbox (Vite + React + Tailwind, no proprietary format), wrap it in a desktop runtime, and start the boring work: probing accessibility APIs, handling locale differences, code-signing the binary, paying for notarization, building an installer.
The React you sketched in the demo carries over verbatim, because it is normal React. The setTimeout-driven timeline gets ripped out and replaced with real event handlers wired to your engine over IPC. The cursor sprite goes away because the real OS cursor takes over. The mock file list becomes a real directory listing. The terminal panel becomes a tail of real subprocess output. The visual hierarchy survives intact, because nothing about it depended on the playback being fake.
Most of the demo is a permanent contribution to the eventual product. The bit you throw away is small. That is what makes it a useful first move and not a tax.
Want a hand staging the playback for your automation idea?
Book 20 minutes. Bring the screen you want to fake; leave with a clickable preview URL and a folder of JPEG frames already on disk.
Frequently asked questions
Will the demo actually click around inside Photoshop or move files on my Mac?
No, and any browser-based tool that promises this is misleading you. The demo runs in a Vite + React project inside a Linux sandbox container; it has no access to your mouse or keyboard outside the browser tab, your file system, or your installed apps. What it ships is a playback of a fake automation: simulated cursor sprite, mocked file list updating in place, fake terminal output streaming line by line. The viewer reads it as automation because the visuals are right; nothing is actually scripted at the OS level.
Why is a fake demo more useful than a real automation script for showing my idea?
A real script depends on whoever watches it having the same OS, the same app versions, the same file paths, and the same accessibility permissions you do. A demo runs in any browser, on any machine, without setup. For a hackathon judge, a sales call, a portfolio reviewer, or a meetup audience, the demo is the artifact they actually consume. The real script comes after they say yes.
How do I script the playback timing without writing code?
Describe the script in the prompt. Something like: at 0 seconds the cursor is in the top-left, at 1 second it moves to the Scan button and clicks, between 1.5 and 4 seconds the terminal panel streams ten lines of fake output, at 4 seconds three duplicate files appear in the right panel highlighted red, at 5 seconds they slide into a Trash bin in the corner. mk0r turns that into setTimeout-driven state transitions in the React tree, and you iterate by talking. Move the cursor a half-second slower. Add a confetti burst at the end. Make the terminal output green.
Can the demo react to a presenter clicking buttons live, instead of only auto-playing?
Yes. The mk0r preview pipeline pipes real CDP Input.dispatchMouseEvent and Input.dispatchKeyEvent messages from the parent window into the headed Chromium running in the sandbox. So when you click inside the iframe (or a viewer clicks on a shared preview URL), those events hit the running app the same way a real user click would. Build a demo with a Run button, click it during the talk, watch the playback start. The interaction wiring is already there, you just have to add the button.
What do I get out of the sandbox besides a live URL?
A folder of JPEG frames. Every preview iteration is captured frame-by-frame from the headed Chromium via the Chrome DevTools Protocol Page.startScreencast call, throttled to roughly 15 frames per second, and written to /tmp/video/frames/frameNNNNNN.jpg. The frame index is zero-padded to six digits and increments across the whole session. You can stitch them into an MP4 with ffmpeg, post the GIF on social, or scrub through them to find the exact moment to embed in your deck. Most demo tools charge for this; here it is a side effect of how the preview already works.
Where does this stop being enough, and what do I move to next?
When the conversation moves from "does this look believable?" to "does the script work on Windows 11 Build 26100 with Excel 2024 in Russian locale?", the demo has done its job. Export the source from the sandbox; it is a normal Vite + React + Tailwind project, no proprietary format. Wrap the UI in Tauri or Electron for the native runtime, and replace the setTimeout-driven playback with a real engine: PyAutoGUI, AppleScript, AutoHotkey, .NET UIAutomation, Playwright if your target is a web app. The UI you sketched in the demo carries over verbatim because it was real React the whole time.
Do I need an account to share the demo with someone?
No. mk0r generates a session key in localStorage on first visit (crypto.randomUUID) and uses it to claim a sandbox. You can build the demo, get a shareable preview URL, and send the link to anyone. They open it, the demo runs in their browser inside the sandbox you already provisioned. Sign-up only matters if you want the project to persist across devices or share editing access.
Is mk0r open source so I can verify the screencast pipeline myself?
Yes, the repo is github.com/m13v/appmaker. The screencast capture lives at docker/e2b/files/opt/proxy.js around line 280, in a function that opens a CDP connection to the headed Chromium, sends Page.startScreencast, throttles incoming frames to one every 66 milliseconds, and writes them to /tmp/video/frames. The CDP input relay starts around line 401 in the same file and forwards mouse, key, scroll, and insertText messages from the parent into the same browser session. Fork it if you want to tweak the throttle or change the output format.