A desktop automation demo is a playback, not a real automation.
Most pages on this topic point you at Power Automate, UiPath, or AutoHotkey. Those build the real engine. If what you actually need is a three-minute artifact to show a hackathon judge, a customer, or a meetup audience, you do not need an engine. You need a believable playback. Build that as a single-page web app whose UI plays itself, ship it as a URL, and walk away with a frame-by-frame screen recording for free.
Build the demo in mk0r.com as a one-page app whose UI plays a scripted run of cursor moves, line-by-line terminal output, and simulated file changes, all driven by chained setTimeout calls in the React tree. Describe the script in one paragraph; mk0r drafts it in roughly two minutes. Every preview iteration is captured frame-by-frame to /tmp/video/frames/frameNNNNNN.jpg at about 15 fps inside the sandbox (66 ms throttle on line 287 of docker/e2b/files/opt/proxy.js), so the screen recording is a side effect, not a separate step.
The frame the question gets wrong
Type the topic into a search bar and you get RPA tool listicles. Power Automate, UiPath, Leapwork, Ui.Vision, TestGrid. Every one of those is a real production scripting engine for clicking buttons inside native apps on a real operating system. They are the right answer when you are shipping an automation for daily use against a known computer.
They are the wrong answer when the artifact you actually need is a three-minute video, a clickable URL, or a hackathon submission that has to run on a stranger's laptop without setup. A real RPA flow that depends on local Excel installs and accessibility permissions does not survive being shipped to ten different judges. A web-based playback does. The two problems look identical from the outside (the visual is the same: cursor moving, files moving, output appearing) and they are completely different to build.
A demo is a story you can rehearse. An automation is a contract with the operating system. Pick one for the job at hand. This page is about the first one.
One sentence in, one playback out
The shape of a useful first prompt names three things: the surface (what the viewer sees at rest), the trigger (what starts the playback), and the punch line (the visible change at the end). Everything else is iteration.
A demo of an "auto-organize my desktop" tool.
Surface: a Mac-style desktop with about 30 messy icons (random
filenames, jpeg / pdf / docx / mp4 mix), a thin top bar with a
"Scan and tidy" button, and a hidden Trash icon at the bottom-right.
Trigger: the user clicks "Scan and tidy". A cursor sprite glides
from the icon grid up to the button and clicks it.
Playback: a terminal panel slides in from the bottom and streams
about 12 lines of plausible scan output over 3 seconds. As scanning
finishes, files of the same type group themselves into folders
(Images, Docs, Videos), and 3 duplicate jpegs flash red and slide
into the Trash. A counter at the top animates from 0 to "27 files
organized, 3 duplicates removed" with a confetti burst.
Style: warm desktop wallpaper, soft shadows, generous spacing, a
single teal accent for the button and the counter. Mobile-aware
but the desktop layout is the default.

That paragraph is opinionated by design. It commits to one specific automation (organizing desktop icons), one specific playback (cursor + terminal + file moves + counter), and one specific punchline (confetti and a number). mk0r writes the React for it in one streaming pass. From there, iteration is conversational: slow the cursor down, replace the confetti with a checkmark, swap the terminal for a progress bar, change the counter to a percentage. The first draft is the slowest part; everything after is talking.
Four beats every playback hits
Every demo of automation, regardless of the underlying problem, follows the same rhythm. If any beat is missing or rushed, the audience does not read it as automation; they read it as a confusing animation.
1. Stage. Open on a believable starting state: file browser at rest, terminal idle, status bar saying "ready". The viewer needs a baseline to notice change against.
2. Trigger. A cursor sprite glides to the Run button and clicks. Or a presenter clicks it live. Either way, the audience sees the action that starts the work.
3. Show work. Terminal output streams line by line at human reading speed. Files glow as they get "touched". Progress bar fills. The screen is busy without being chaotic.
4. Reveal result. Files moved, a count animates up, a green checkmark lands. The viewer leaves knowing exactly what the automation did and what changed.
Total wall-clock time for the playback should be 6 to 12 seconds. Shorter and the viewer cannot read the work. Longer and you lose them. If the underlying automation in real life would take an hour, the demo still has to finish in 12 seconds; that is the point of a demo.
What the playback actually is, in one timeline
Underneath the visuals, every playback is a chain of setTimeout calls posting React state updates against a fake data model. Here is the timeline for a duplicate-file scan, with the viewer's click as the entry event and React state transitions feeding the rendered surface.
One viewer click, scripted state updates, rendered playback
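In code, that chain is short. Here is a minimal sketch under assumed names (DuplicateScanDemo, SCAN_LINES are illustrative, not mk0r's actual output); the draft mk0r writes is larger but has the same shape, one scheduled state update per visual beat:

```tsx
// A minimal sketch of a setTimeout-driven playback timeline.
// Hypothetical names throughout; not mk0r's generated code.
import { useRef, useState } from "react";

const SCAN_LINES = [
  "scanning ~/Desktop ...",
  "hashing 30 files ...",
  "found 3 duplicates",
];

export function DuplicateScanDemo() {
  const [status, setStatus] = useState<"idle" | "running" | "done">("idle");
  const [lines, setLines] = useState<string[]>([]);
  const [removed, setRemoved] = useState(0);
  const timers = useRef<number[]>([]); // kept so a Replay button can clear and reset

  // Schedule one state update at a fixed offset from the click.
  const at = (ms: number, fn: () => void) =>
    timers.current.push(window.setTimeout(fn, ms));

  const run = () => {
    setStatus("running");
    // Show work: stream the fake terminal output at reading speed.
    SCAN_LINES.forEach((line, i) =>
      at(500 + i * 400, () => setLines((prev) => [...prev, line]))
    );
    // Reveal result: the counter, then the done state.
    at(2400, () => setRemoved(3));
    at(3000, () => setStatus("done"));
  };

  return (
    <div>
      <button onClick={run} disabled={status !== "idle"}>
        Scan and tidy
      </button>
      <pre>{lines.join("\n")}</pre>
      {status === "done" && <p>{removed} duplicates removed ✓</p>}
    </div>
  );
}
```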
Notice nothing in this sequence touches the operating system, the file system, or any external API. The mock store is a plain JavaScript object held in component state. The terminal lines come from a hard-coded array. The duplicate file paths are made up. The viewer reads the run as automation because the visual choreography is right, not because anything got automated.
Believable inside the iframe, dishonest outside it
The honest list. Anything in the first list is fair game and will read as automation. Anything in the second list the demo cannot do; do not build a script that pretends to.
Things the playback can credibly fake
- Cursor sprite gliding from one element to another
- Terminal output streaming line by line at reading speed
- File rows highlighting, then sliding to a target panel
- Progress bars and counters animating to a final value
- Status pills toggling: idle, running, done, failed
- Modal dialogs popping in mid-run with summary numbers
- Mock file system tree updating in place, with diff badges
- A toggle that lets the viewer rewind and replay the run
Things the playback cannot do, do not pretend
- Real mouse and keyboard events on the viewer's machine
- Reading or writing files on the viewer's local file system
- Driving native apps (Photoshop, Excel, Finder, Outlook)
- Operating system APIs (accessibility, notifications, clipboard)
- Running PyAutoGUI, AppleScript, AutoHotkey, .NET UIAutomation
- Background daemon that runs when the browser tab closes
- Screen recording of an unrelated app on the viewer's desktop
- Anything outside the iframe sandbox the demo runs in
The free screen recording you did not ask for
Most demo authoring tools charge for the screen recording. Loom, Tella, Arcade, Navattic, all of them are essentially capture-and-edit pipelines with hosting on top. mk0r ships the capture pipeline as part of how the live preview already works, because the agent uses it to verify its own writes.
Inside the sandbox VM, a small Node proxy at docker/e2b/files/opt/proxy.js opens a Chrome DevTools Protocol WebSocket against the headed Chromium, sends Page.startScreencast, and listens for Page.screencastFrame events. Each frame arrives as base64-encoded JPEG bytes. The proxy throttles to one frame every 66 milliseconds (line 287), decodes the buffer, and writes it to disk as /tmp/video/frames/frame followed by a six-digit zero-padded index, ending in .jpg. The same buffer also goes out over the WebSocket to the parent window, which is what makes the live preview visible.
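For the curious, a minimal sketch of that capture loop, assuming Node 18+, the ws package, and a Chromium started with --remote-debugging-port=9222. This is not the actual proxy.js, but it uses the same CDP calls (Page.startScreencast, Page.screencastFrame, Page.screencastFrameAck); names like FRAME_DIR are illustrative:

```ts
// Sketch: capture a CDP screencast to numbered JPEG frames on disk.
import WebSocket from "ws";
import { writeFileSync, mkdirSync } from "fs";

const FRAME_DIR = "/tmp/video/frames";
const THROTTLE_MS = 66; // ~15 fps, the same budget the proxy uses

async function main() {
  mkdirSync(FRAME_DIR, { recursive: true });
  // Ask the DevTools HTTP endpoint for the page's WebSocket URL.
  const targets = await fetch("http://127.0.0.1:9222/json").then((r) => r.json());
  const ws = new WebSocket(targets[0].webSocketDebuggerUrl);

  let id = 0;
  const send = (method: string, params: object = {}) =>
    ws.send(JSON.stringify({ id: ++id, method, params }));

  let frameIndex = 0;
  let lastWrite = 0;

  ws.on("open", () => send("Page.startScreencast", { format: "jpeg", quality: 60 }));
  ws.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.method !== "Page.screencastFrame") return;
    // Every frame must be acked or Chromium stops sending more.
    send("Page.screencastFrameAck", { sessionId: msg.params.sessionId });
    const now = Date.now();
    if (now - lastWrite < THROTTLE_MS) return; // drop frames above ~15 fps
    lastWrite = now;
    const name = `frame${String(frameIndex++).padStart(6, "0")}.jpg`;
    writeFileSync(`${FRAME_DIR}/${name}`, Buffer.from(msg.params.data, "base64"));
  });
}

main();
```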
Two things follow from this. First, every iteration of your demo is silently captured to disk; the directory grows the entire session. Second, ~15 fps is enough for a desktop UI playback (no fast camera pans, no high-FPS gameplay), so the recording is usable as-is. Stitch the frames into an MP4 with ffmpeg -framerate 15 -i frame%06d.jpg -c:v libx264 -pix_fmt yuv420p out.mp4, or into a GIF with one extra palette pass. You did not have to install OBS, you did not have to fight Loom permissions, and you did not have to pay a per-seat fee.
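The "one extra palette pass" for the GIF is two ffmpeg invocations over the same frames; first generate a palette, then apply it:

```sh
cd /tmp/video/frames
ffmpeg -framerate 15 -i frame%06d.jpg -vf palettegen palette.png
ffmpeg -framerate 15 -i frame%06d.jpg -i palette.png \
  -lavfi paletteuse out.gif
```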
A demo that responds to the presenter, live
A pure auto-play demo is fine for an embedded landing page. For a stage talk or a sales call you want the demo to wait for a human click, then run. The mk0r preview pipeline already pipes real input events from the parent window into the sandboxed Chromium via CDP.
Around line 401 of the same proxy file, a second WebSocket relay accepts JSON messages from the parent: {type:"mouse",x,y,action,button}, {type:"key",key,code,text,action}, {type:"scroll",x,y,deltaX,deltaY}, {type:"insertText",text}. The relay forwards each one to the headed Chromium as the matching CDP method (Input.dispatchMouseEvent, Input.dispatchKeyEvent, Input.insertText). The browser inside the sandbox cannot tell the difference between these synthetic events and a real user mouse, so the React app inside reacts the same way.
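A sketch of that forwarding step, using the message shapes above and a hypothetical send helper that posts CDP commands over the page's DevTools WebSocket (the real relay differs in detail):

```ts
// Sketch: map parent-window input messages onto CDP input methods.
type ParentMsg =
  | { type: "mouse"; x: number; y: number; action: "down" | "up" | "move"; button: "left" | "right" }
  | { type: "key"; key: string; code: string; text: string; action: "down" | "up" }
  | { type: "scroll"; x: number; y: number; deltaX: number; deltaY: number }
  | { type: "insertText"; text: string };

function forward(msg: ParentMsg, send: (method: string, params: object) => void) {
  switch (msg.type) {
    case "mouse":
      send("Input.dispatchMouseEvent", {
        type: { down: "mousePressed", up: "mouseReleased", move: "mouseMoved" }[msg.action],
        x: msg.x, y: msg.y, button: msg.button, clickCount: 1,
      });
      break;
    case "key":
      send("Input.dispatchKeyEvent", {
        type: msg.action === "down" ? "keyDown" : "keyUp",
        key: msg.key, code: msg.code, text: msg.text,
      });
      break;
    case "scroll":
      // CDP models scrolling as a mouseWheel event with deltas.
      send("Input.dispatchMouseEvent", {
        type: "mouseWheel", x: msg.x, y: msg.y,
        deltaX: msg.deltaX, deltaY: msg.deltaY,
      });
      break;
    case "insertText":
      send("Input.insertText", { text: msg.text });
      break;
  }
}
```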
For your demo, this means the Run button works whether the audience is looking at a presenter clicking it on stage or at a recorded URL on Twitter. You build the same React, you wire one onClick to start the timeline, and the input pathway is already there. No additional plumbing.
A useful side effect: the playback is also drivable programmatically. If you wrap the demo in Playwright, you can replay the click trace from a script, capture frames, and produce a deterministic recording. That is how the agent verifies its own output without you watching, and it is the same primitive your eventual real automation engine will use against a real OS.
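A minimal Playwright sketch of that replay, with a hypothetical preview URL and button label standing in for yours:

```ts
// Sketch: drive the demo deterministically and record it.
import { chromium } from "playwright";

async function record() {
  const browser = await chromium.launch();
  const context = await browser.newContext({
    recordVideo: { dir: "recordings/", size: { width: 1280, height: 720 } },
  });
  const page = await context.newPage();
  await page.goto("https://example-preview-url.mk0r.com"); // hypothetical URL
  await page.getByRole("button", { name: "Scan and tidy" }).click();
  await page.waitForTimeout(12_000); // let the 6-12 s playback finish
  await context.close(); // flushes the .webm video to recordings/
  await browser.close();
}

record();
```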
Eight first prompts that produce useful demos
If you are stuck on what to demo, pick one of these and paste it into the prompt box. Each one is a complete enough surface that the first draft is shareable.
The pattern is consistent: name the surface, name the trigger, name the change. Save the iteration for the second turn. Trying to specify the entire demo in one paragraph produces worse first drafts than starting plain and adding detail by talking.
What the demo plays well alongside
The demo is one artifact in a chain. Here are the tools it tends to share a workflow with, both for shipping the demo and for the day you graduate to real automation.
- ffmpeg: stitch the captured /tmp/video/frames/*.jpg into an MP4 or GIF.
- Loom: if you want narration over the playback for asynchronous sharing.
- Tella: talking-head plus screen recording when the demo needs a face.
- Arcade: hotspot tooltips overlaid on a recording, for guided self-serve walkthroughs.
- Tauri: when the demo graduates to a real desktop binary, wrap the same React.
- Electron: heavier shell that ships a known Chromium; same React inside, IPC to native.
- Playwright: if your eventual automation target is a web app, this is your engine.
- AutoHotkey: Windows-side native scripting once the demo wins the room.
mk0r is not trying to replace any of these. It is the front end where the demo is born and the place where the captured frames live. After that, you pick the right shipping format (raw URL, MP4, GIF, hosted walkthrough) and, if the idea wins, the right native runtime (Tauri, Electron) and engine (PyAutoGUI, AppleScript, AutoHotkey, .NET UIAutomation, Playwright).
When to stop demoing and start building
The demo has a clear end-of-life signal. The first time someone asks "what about [specific OS, specific app version, specific edge case]?" with a real budget behind the question, the demo has done its job. Move on. Export the source from the sandbox (Vite + React + Tailwind, no proprietary format), wrap it in a desktop runtime, and start the boring work: probing accessibility APIs, handling locale differences, code-signing the binary, paying for notarization, building an installer.
The React you sketched in the demo carries over verbatim, because it is normal React. The setTimeout-driven timeline gets ripped out and replaced with real event handlers wired to your engine over IPC. The cursor sprite goes away because the real OS cursor takes over. The mock file list becomes a real directory listing. The terminal panel becomes a tail of real subprocess output. The visual hierarchy survives intact, because nothing about it depended on the playback being fake.
Most of the demo is a permanent contribution to the eventual product. The bit you throw away is small. That is what makes it a useful first move and not a tax.
Want a hand staging the playback for your automation idea?
Book 20 minutes. Bring the screen you want to fake; leave with a clickable preview URL and a folder of JPEG frames already on disk.
Frequently asked questions
Will the demo actually click around inside Photoshop or move files on my Mac?
No, and any browser-based tool that promises this is misleading you. The demo runs in a Vite + React project inside a Linux sandbox container; it has no access to your mouse or keyboard outside the browser tab, your file system, or your installed apps. What it ships is a playback of a fake automation: simulated cursor sprite, mocked file list updating in place, fake terminal output streaming line by line. The viewer reads it as automation because the visuals are right; nothing is actually scripted at the OS level.
Why is a fake demo more useful than a real automation script for showing my idea?
A real script depends on whoever watches it having the same OS, the same app versions, the same file paths, and the same accessibility permissions you do. A demo runs in any browser, on any machine, without setup. For a hackathon judge, a sales call, a portfolio reviewer, or a meetup audience, the demo is the artifact they actually consume. The real script comes after they say yes.
How do I script the playback timing without writing code?
Describe the script in the prompt. Something like: at 0 seconds the cursor is in the top-left, at 1 second it moves to the Scan button and clicks, between 1.5 and 4 seconds the terminal panel streams ten lines of fake output, at 4 seconds three duplicate files appear in the right panel highlighted red, at 5 seconds they slide into a Trash bin in the corner. mk0r turns that into setTimeout-driven state transitions in the React tree, and you iterate by talking. Move the cursor a half-second slower. Add a confetti burst at the end. Make the terminal output green.
Can the demo react to a presenter clicking buttons live, instead of only auto-playing?
Yes. The mk0r preview pipeline pipes real CDP Input.dispatchMouseEvent and Input.dispatchKeyEvent messages from the parent window into the headed Chromium running in the sandbox. So when you click inside the iframe (or a viewer clicks on a shared preview URL), those events hit the running app the same way a real user click would. Build a demo with a Run button, click it during the talk, watch the playback start. The interaction wiring is already there, you just have to add the button.
What do I get out of the sandbox besides a live URL?
A folder of JPEG frames. Every preview iteration is captured frame-by-frame from the headed Chromium via the Chrome DevTools Protocol Page.startScreencast call, throttled to roughly 15 frames per second, and written to /tmp/video/frames/frameNNNNNN.jpg. The frame index is zero-padded to six digits and increments across the whole session. You can stitch them into an MP4 with ffmpeg, post the GIF on social, or scrub through them to find the exact moment to embed in your deck. Most demo tools charge for this; here it is a side effect of how the preview already works.
Where does this stop being enough, and what do I move to next?
When the conversation moves from "does this look believable?" to "does the script work on Windows 11 Build 26100 with Excel 2024 in Russian locale?", the demo has done its job. Export the source from the sandbox; it is a normal Vite + React + Tailwind project, no proprietary format. Wrap the UI in Tauri or Electron for the native runtime, and replace the setTimeout-driven playback with a real engine: PyAutoGUI, AppleScript, AutoHotkey, .NET UIAutomation, Playwright if your target is a web app. The UI you sketched in the demo carries over verbatim because it was real React the whole time.
Do I need an account to share the demo with someone?
No. mk0r generates a session key in localStorage on first visit (crypto.randomUUID) and uses it to claim a sandbox. You can build the demo, get a shareable preview URL, and send the link to anyone. They open it, the demo runs in their browser inside the sandbox you already provisioned. Sign-up only matters if you want the project to persist across devices or share editing access.
Is mk0r open source so I can verify the screencast pipeline myself?
Yes, the repo is github.com/m13v/appmaker. The screencast capture lives at docker/e2b/files/opt/proxy.js around line 280, in a function that opens a CDP connection to the headed Chromium, sends Page.startScreencast, throttles incoming frames to one every 66 milliseconds, and writes them to /tmp/video/frames. The CDP input relay starts around line 401 in the same file and forwards mouse, key, scroll, and insertText messages from the parent into the same browser session. Fork it if you want to tweak the throttle or change the output format.