The HTML Forms Generator That Reads Your Picture
Drop a Figma export, a phone photo of a paper form, or a screenshot of a competitor's signup page. mk0r turns the pixels into markup in the same turn: no dragging fields, no visual builder to learn, no account.
Drop an Image of a Form
The part every other guide about this topic skips
Look up any listicle of form generators online. The ones that rank will walk you through a drag-and-drop canvas, a field palette, a style panel, a publish button. Jotform. BeautifyTools. Paperform. Basin. FormBuilder.js. They all have the same shape: you build the form by hand, field by field, inside their UI.
Nobody in that list accepts a picture. Nobody lets you upload the form you already have, the one sitting in a Figma file, the one printed on a piece of paper on your desk, the one your competitor shipped last quarter. You still have to look at the design, then click through a menu fifty times to reproduce it.
That is the gap. A generator that can see is a generator you can hand a picture to and come back to a working form.
What the attachments pipeline actually does
When you drop a PNG into the chat box on mk0r.com, the file takes two parallel paths in the same HTTP request. One copy lands inside the sandbox filesystem; a second copy rides along as an inline vision block on the prompt. Both happen before Claude writes a single tag.
One image, two destinations, one turn
The anchor fact: 20 MB and a vision block
If you read one file in the repo to verify this page, read src/app/api/chat/route.ts lines 266 to 341. The constant MAX_INLINE_IMAGE sits at 20 MB, the decode size is estimated with Math.ceil((att.data.length * 3) / 4), and any image/* attachment under that ceiling is pushed onto the prompt blocks as a real image payload. This is the concrete bit nobody else has.
Two things to notice. The picture and the text travel in the same prompt, not two separate turns. And the picture is saved to disk too, so the agent can use any of its tools against the file afterwards, not just its eyes.
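The ceiling check is small enough to sketch in full. The constant value and the `Math.ceil` formula below are the ones quoted from the route file above; the helper names `estimatedDecodedSize` and `shouldInlineImage` are illustrative, not the repo's actual exports.

```typescript
// 20 MB ceiling for inlining an image as a vision block (value from route.ts).
const MAX_INLINE_IMAGE = 20 * 1024 * 1024;

// Estimate the decoded byte size of a base64 string without decoding it:
// every 4 base64 characters encode 3 bytes.
function estimatedDecodedSize(base64: string): number {
  return Math.ceil((base64.length * 3) / 4);
}

// Illustrative helper: only image/* payloads under the ceiling get inlined.
function shouldInlineImage(mimeType: string, base64Data: string): boolean {
  return (
    mimeType.startsWith("image/") &&
    estimatedDecodedSize(base64Data) < MAX_INLINE_IMAGE
  );
}
```

A 4-character base64 string estimates to 3 decoded bytes, so a typical 2 MB screenshot sails well under the ceiling.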
What you can drop in and what comes back
The attachment handler is mime-type driven. Each bucket is routed a little differently. For form work, the first row is the one that matters: image/* under 20 MB goes straight into the vision-enabled turn.
image/* under 20 MB
Inlined as a vision block. PNG, JPG, WebP, HEIC, GIF all qualify. The agent sees the picture and writes matching markup.
text/*, JSON, XML under 10 MB
Decoded from base64 and inlined as raw text. Useful for dropping in an existing HTML file or a form spec in JSON.
application/pdf
Saved to /app/uploads and the agent is told to open it with Read. Good for scanned paper forms.
Any other binary
Saved to /app/uploads with a path reference. No inlining. The agent can still run tools against it.
Over the size ceiling
File lands in /app/uploads, agent is told it is too large to inline, Read is recommended. Nothing is lost; only the inline shortcut is skipped.
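The buckets above collapse into one routing decision. This is a sketch of the behavior the list describes, not the repo's code: the `Route` labels and the `route()` helper are invented for illustration, while the two size constants match the 20 MB and 10 MB ceilings named above.

```typescript
const MAX_INLINE_IMAGE = 20 * 1024 * 1024; // vision-block ceiling
const MAX_INLINE_TEXT = 10 * 1024 * 1024;  // inline-text ceiling

type Route = "vision-block" | "inline-text" | "saved-read-hint" | "saved-path-only";

function route(mimeType: string, decodedSize: number): Route {
  const isTextLike =
    mimeType.startsWith("text/") ||
    mimeType === "application/json" ||
    mimeType === "application/xml";

  if (mimeType.startsWith("image/") && decodedSize < MAX_INLINE_IMAGE)
    return "vision-block";    // inlined on the prompt as a real image payload
  if (isTextLike && decodedSize < MAX_INLINE_TEXT)
    return "inline-text";     // decoded from base64 and inlined as raw text
  if (mimeType === "application/pdf")
    return "saved-read-hint"; // saved to /app/uploads, agent told to Read it
  // Any other binary, or anything over its ceiling: saved with a path reference.
  return "saved-path-only";
}
```

In every branch the file also lands on disk; the return value only decides whether the bytes additionally ride along on the prompt.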
The old way vs. this way
Recreating an existing form in a drag-and-drop builder is a task measured in clicks. The image-to-markup path is one drop plus one send.
Recreating a 14-field application form from a design
Open the builder. Create a new form. Drag a text input. Set its label. Set its name. Set its validation. Drag the next input. Repeat fourteen times. Pick a theme. Tweak spacing. Export HTML. Paste into your site. Discover a label is wrong. Go back. Fix.
- 14 manual drag-drops for 14 fields
- Another pass for validation and required flags
- Style panel to approximate the design
- Round-trip between builder and your site to test
Six things to drop in that people actually have
These are the inputs that come up in real sessions. Each one is a picture you already own; none of them fit into a visual builder.
- A Figma export (PNG or JPG) of a form screen
- A screenshot of a competitor's signup or checkout form
- A phone photo of a paper or printed form
- A cropped section of a PDF
- A whiteboard sketch of the fields you need
- A screenshot from a design system
Numbers from the route file
Every number below comes directly from reading src/app/api/chat/route.ts. None are marketing estimates.
The 20 MB ceiling is MAX_INLINE_IMAGE on line 269. The 10 MB is MAX_INLINE_TEXT on line 270. The four routes are the four branches of the mime-type switch inside the for-loop. The one-turn behavior is the order of pushes onto promptBlocks: attachments first, user text last.
What happens from click to rendered form
This is the concrete path a single image takes through the system. Each step is something you could watch happen in DevTools Network tab or in the live screencast pane.
You drag the image onto the chat box
The client reads the file, base64-encodes it, and attaches it to the next outgoing request as { name, mimeType, data }.
POST /api/chat with the attachment
The server route extracts attachments from the JSON body, then walks the array once. Each item gets a destPath of /app/uploads/<name>.
writeFileToVm streams the bytes into the sandbox
Inside src/core/e2b.ts, writeFileToVm hits the ACP /write-file endpoint on the running VM. The file lands in the filesystem where the agent lives.
Vision block pushed onto promptBlocks
For image/* under 20 MB, a { type: 'image', data, mimeType } block is appended. A small [Uploaded image: <name>] marker text block follows so the agent knows the file is also on disk.
User's written prompt pushed last
promptBlocks.push({ type: 'text', text: prompt }). Claude receives all blocks together via /session/prompt, so the picture and the instructions arrive on the same turn.
Agent writes the form
Claude reads the image, inspects the saved file if needed, writes the HTML or the React component, opens the VM's Chromium at localhost:5173 via Playwright to confirm it renders. Every turn is also committed to the session's local Git repo.
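The push order in the walkthrough above can be sketched as a single function: image blocks first, a disk marker after each one, the user's text last. `buildPromptBlocks` is a hypothetical name; the block shapes mirror the { type: 'image', data, mimeType } payload and the [Uploaded image: ...] marker described in the steps.

```typescript
type PromptBlock =
  | { type: "image"; data: string; mimeType: string }
  | { type: "text"; text: string };

interface Attachment {
  name: string;
  mimeType: string;
  data: string; // base64-encoded bytes
}

// Illustrative sketch of the assembly order: attachments first, prompt last,
// so the picture and the instructions arrive on the same turn.
function buildPromptBlocks(attachments: Attachment[], prompt: string): PromptBlock[] {
  const blocks: PromptBlock[] = [];
  for (const att of attachments) {
    if (att.mimeType.startsWith("image/")) {
      blocks.push({ type: "image", data: att.data, mimeType: att.mimeType });
      // Marker so the agent knows the same file is also saved on disk.
      blocks.push({ type: "text", text: `[Uploaded image: ${att.name}]` });
    }
  }
  blocks.push({ type: "text", text: prompt }); // user's written prompt goes last
  return blocks;
}
```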
Bring a form picture; we will build it with you
Book 15 minutes. Share your screen, drop the image into the chat, and we will walk through the generated markup together.
Book a call →
How mk0r compares to the visual builders
A direct comparison against the generators that dominate this category. The differences are structural, not cosmetic.
| Feature | Drag-and-drop builders | mk0r |
|---|---|---|
| Accepts an image of an existing form | No | Yes, up to 20 MB per image |
| Accepts a PDF scan | No | Yes, saved to /app/uploads for Read |
| Text plus image on the same turn | n/a | Yes, image blocks pushed before text |
| Output is real source code | HTML snippet or embedded widget | Plain HTML or typed React component |
| Requires signup | Yes | No |
| Iterate with plain English | No, drag fields again | Yes, follow-up prompts edit the markup |
| File also lives on disk in the sandbox | No | Yes, /app/uploads/<name> |
Quick mode vs. VM mode for image-to-form
Both modes accept the same attachments payload. The difference is what the agent does with it afterwards.
Quick mode
Streams a single self-contained HTML file with inline CSS and a tiny JS handler. No sandbox, no Git, no services. Uses Claude Haiku for latency.
- Under 30 seconds to first render
- Paste the output into any existing site
- Best for a static form you will wire yourself
VM mode
Boots an E2B sandbox with Vite, React, TypeScript, and Playwright. The agent writes a typed component, renders it at localhost:5173, confirms it looks right.
- Typed source you can continue to iterate on
- Per-turn Git history, undo and redo available
- Best when the form has real behavior
One last thing the other pages forget
The image you drop in is not thrown away after the first turn. Because writeFileToVm saves it to /app/uploads/<name>, every follow-up prompt can reference it again. Ask the agent “look back at the design and add the missing checkbox from the third row” and it can open the file with its Read tool, see the checkbox, and add it.
2 copies of your image, one on disk, one in the prompt
0 fields you have to drag onto a canvas
1 turn that mixes design input with written instructions
Frequently asked questions
What kinds of images can I drop into mk0r to get an HTML form?
Anything up to 20 MB in an image/* mime type. In practice that covers Figma exports (PNG/JPG), screenshots of a competitor's form, phone photos of a paper or printed form, a cropped section of a PDF, sketches on a whiteboard, or screenshots from a design system. The limit lives in src/app/api/chat/route.ts as MAX_INLINE_IMAGE = 20 * 1024 * 1024.
How does the image reach the model?
Two places at once. First, the base64 payload is streamed to the VM via writeFileToVm and saved at /app/uploads/<name>, so the agent can also use Read or grep on it inside the sandbox. Second, the same base64 is pushed onto the prompt blocks as { type: 'image', data, mimeType } immediately before the user text block, so Claude sees the picture in the same prompt that generates the form.
What if my image is bigger than 20 MB?
It still uploads to /app/uploads/<name> but it is not inlined. Anything over the 20 MB ceiling is left for the agent to open via Read. For most form designs this does not matter: a 1920x1080 PNG of a detailed multi-page form rarely exceeds 3 MB. If yours does, flatten layers or export at a lower DPI before uploading.
Can I upload a PDF of an existing paper form?
Yes. PDFs route through the 'application/pdf' branch of the attachment handler. They are saved to /app/uploads/<name> and the agent is told to use the Read tool to parse them. Claude can open the PDF in-VM, read structured text, and rebuild the fields as HTML. For a scan that is mostly raster, convert the pages to PNG first and use the image path so the picture reaches the model as a vision block.
Do I get plain HTML or a React component?
Your choice. Quick mode produces a single self-contained HTML file with inline CSS and JavaScript; it streams in under 30 seconds and is the right output when you want to paste markup into an existing site. VM mode runs a full Vite + React + TypeScript project inside an E2B sandbox; you get a typed component you can iterate on. Both modes accept the same image inputs.
Does the form match the image pixel-for-pixel?
It aims for structure, not pixels. Claude reads the image and reconstructs the fields, layout order, labels, grouping, and control types. Fonts, exact spacing, and brand colors come out approximately; you refine with a follow-up prompt (e.g. 'make the buttons teal, use a serif display font'). If the form has unusual controls like a split zipcode field or a currency input with a prefix, it picks that up from the picture.
How do I upload the image in practice?
Drag it onto the chat box on mk0r.com, or paste it from the clipboard, or click the paperclip icon. The request body sent to /api/chat includes an attachments array where each item is { name, mimeType, data }. Data is base64-encoded; the server decodes to estimate size with Math.ceil((length * 3) / 4) before deciding whether to inline.
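A minimal sketch of that request, assuming only what the answer above states: an attachments array of { name, mimeType, data } posted to /api/chat with base64 data. The helper names `encodeAttachment` and `sendPrompt` are illustrative, not mk0r's actual client code.

```typescript
interface Attachment {
  name: string;
  mimeType: string;
  data: string; // base64-encoded bytes
}

// In the browser the bytes would come from a File/FileReader; Buffer stands in here.
function encodeAttachment(name: string, mimeType: string, bytes: Uint8Array): Attachment {
  return { name, mimeType, data: Buffer.from(bytes).toString("base64") };
}

// One request carries both the written prompt and every attachment.
async function sendPrompt(prompt: string, attachments: Attachment[]): Promise<Response> {
  return fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, attachments }),
  });
}
```

The server then runs the Math.ceil((length * 3) / 4) estimate on each item's data field before deciding whether to inline it.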
Do I have to sign up to try it?
No. There is no account. Drop your image into the chat box on mk0r.com and send. The E2B sandbox boots on first prompt and the generated code lives in a per-session Git repo you can pull from the VM at any time.
What happens if the write to the VM fails?
The route logs a chat_attachment_failed event to PostHog with the filename, mime, size, and error message, then streams a user-visible 'Failed to upload <name>' error to the chat. The session keeps going; subsequent attachments are still tried. If the VM itself died, the chat route auto-retries the whole bootAndPrompt cycle once before surfacing a hard error.
Can I combine an image with written instructions in the same prompt?
Yes, that is the default. The attachment blocks are appended to promptBlocks first, then the user's text block is pushed last. So 'here is the design, but make the email field optional and add a phone number' lands as { [image], 'here is the design, but...' } in one vision-enabled turn. Claude sees both signals together.
Stop rebuilding forms by hand
Drop the picture of the form you want. mk0r reads it, writes the markup, and lets you refine with words.
Generate an HTML Form From an Image