
The HTML Forms Generator That Reads Your Picture

Drop a Figma export, a phone photo of a paper form, or a screenshot of a competitor’s signup page. mk0r turns the pixels into markup in the same turn: no dragging fields, no visual builder to learn, no account.

Drop an Image of a Form

mk0r · 7 min read · 4.8 from 10K+ builders

  • Accepts PNG, JPG, HEIC, PDF, up to 20 MB
  • Same-turn image plus prompt
  • No signup

The part every other guide about this topic skips

Look up any listicle of form generators online. The ones that rank will walk you through a drag-and-drop canvas, a field palette, a style panel, a publish button. Jotform. BeautifyTools. Paperform. Basin. FormBuilder.js. They all have the same shape: you build the form by hand, field by field, inside their UI.

Nobody in that list accepts a picture. Nobody lets you upload the form you already have, the one sitting in a Figma file, the one printed on a piece of paper on your desk, the one your competitor shipped last quarter. You still have to look at the design, then click through a menu fifty times to reproduce it.

That is the gap: a generator that can see is one you can hand a picture and come back to find a working form.

What the attachments pipeline actually does

When you drop a PNG into the chat box on mk0r.com, the file takes two parallel paths in the same HTTP request. One copy lands inside the sandbox filesystem; a second copy rides along as an inline vision block on the prompt. Both happen before Claude writes a single tag.

One image, two destinations, one turn:

  PNG / JPG, Figma export, PDF page, or phone photo
    -> POST /api/chat
         -> /app/uploads/<name>     (copy saved to disk)
         -> prompt vision block     (Claude sees the image)
    -> agent writes HTML

The anchor fact: 20 MB and a vision block

If you read one file in the repo to verify this page, read src/app/api/chat/route.ts lines 266 to 341. The constant MAX_INLINE_IMAGE sits at 20 MB, the decode size is estimated with Math.ceil((att.data.length * 3) / 4), and any image/* attachment under that ceiling is pushed onto the prompt blocks as a real image payload. This is the concrete bit nobody else has.

src/app/api/chat/route.ts
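The branch those lines describe can be sketched as follows. This is a hedged reconstruction from the description above, not the repo's verbatim code; the `Attachment` and `PromptBlock` shapes are assumptions.

```typescript
// Hedged sketch of the image-inlining branch described above; types are assumed, not copied.
type Attachment = { name: string; mimeType: string; data: string }; // data is base64
type PromptBlock =
  | { type: 'image'; data: string; mimeType: string }
  | { type: 'text'; text: string };

const MAX_INLINE_IMAGE = 20 * 1024 * 1024; // the 20 MB ceiling

// Estimate decoded bytes from the base64 length without actually decoding.
const decodedSize = (att: Attachment) => Math.ceil((att.data.length * 3) / 4);

function maybeInlineImage(att: Attachment, promptBlocks: PromptBlock[]): boolean {
  if (!att.mimeType.startsWith('image/') || decodedSize(att) > MAX_INLINE_IMAGE) {
    return false; // not an image, or over the ceiling: left on disk for Read
  }
  promptBlocks.push({ type: 'image', data: att.data, mimeType: att.mimeType });
  // Marker so the agent knows the same file also exists at /app/uploads/<name>
  promptBlocks.push({ type: 'text', text: `[Uploaded image: ${att.name}]` });
  return true;
}
```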

Two things to notice. The picture and the text travel in the same prompt, not two separate turns. And the picture is saved to disk too, so the agent can use any of its tools against the file afterwards, not just its eyes.

What you can drop in and what comes back

The attachment handler is mime-type driven. Each bucket is routed a little differently. For form work, the first row is the one that matters: image/* under 20 MB goes straight into the vision-enabled turn.

image/* under 20 MB

Inlined as a vision block. PNG, JPG, WebP, HEIC, GIF all qualify. The agent sees the picture and writes matching markup.

text/*, JSON, XML under 10 MB

Decoded from base64 and inlined as raw text. Useful for dropping in an existing HTML file or a form spec in JSON.

application/pdf

Saved to /app/uploads and the agent is told to open it with Read. Good for scanned paper forms.

Any other binary

Saved to /app/uploads with a path reference. No inlining. The agent can still run tools against it.

Over the size ceiling

File lands in /app/uploads, agent is told it is too large to inline, Read is recommended. Nothing is lost; only the inline shortcut is skipped.
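Put together, the five buckets amount to a small router. A minimal sketch, assuming the routing described above; the real branch order and names in route.ts may differ:

```typescript
// Minimal sketch of the attachment routing described above; names are assumptions.
const MAX_INLINE_IMAGE = 20 * 1024 * 1024; // 20 MB image ceiling
const MAX_INLINE_TEXT = 10 * 1024 * 1024;  // 10 MB text ceiling

type Route = 'vision-inline' | 'text-inline' | 'pdf-read' | 'binary-ref' | 'too-large';

function routeAttachment(mimeType: string, decodedBytes: number): Route {
  if (mimeType.startsWith('image/')) {
    return decodedBytes <= MAX_INLINE_IMAGE ? 'vision-inline' : 'too-large';
  }
  const textLike =
    mimeType.startsWith('text/') ||
    mimeType === 'application/json' ||
    mimeType.endsWith('xml');
  if (textLike) {
    return decodedBytes <= MAX_INLINE_TEXT ? 'text-inline' : 'too-large';
  }
  if (mimeType === 'application/pdf') return 'pdf-read';
  return 'binary-ref'; // any other binary: saved with a path reference only
}
```

Every route still writes the file to /app/uploads; the only question the router answers is whether the bytes also ride inline on the prompt.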

The old way vs. this way

Recreating an existing form in a drag-and-drop builder is a task measured in clicks. The image-to-markup path is one drop plus one send.

Recreating a 14-field application form from a design

Open the builder. Create a new form. Drag a text input. Set its label. Set its name. Set its validation. Drag the next input. Repeat fourteen times. Pick a theme. Tweak spacing. Export HTML. Paste into your site. Discover a label is wrong. Go back. Fix.

  • 14 manual drag-drops for 14 fields
  • Another pass for validation and required flags
  • Style panel to approximate the design
  • Round-trip between builder and your site to test

Ten things to drop in that people actually have

These are the inputs that come up in real sessions. Each one is a picture you already own; none of them fit into a visual builder.

  • A Figma form component export
  • A screenshot of a competitor signup
  • A photo of a paper tax form
  • A PDF page of a medical intake
  • A whiteboard sketch on a napkin
  • A Sketch file exported as PNG
  • A design system modal from Dribbble
  • A scan of a paper job application
  • A cropped Notion page screenshot
  • A photo of a printed event RSVP card

Numbers from the route file

Every number below comes directly from reading src/app/api/chat/route.ts. None are marketing estimates.

20 MB limit per image
10 MB limit per text file
4 attachment routes (image, text, PDF, binary)
1 turn that mixes vision plus prompt

The 20 MB ceiling is MAX_INLINE_IMAGE on line 269. The 10 MB is MAX_INLINE_TEXT on line 270. The four routes are the four branches of the mime-type switch inside the for-loop. The one-turn behavior is the order of pushes onto promptBlocks: attachments first, user text last.

What happens from click to rendered form

This is the concrete path a single image takes through the system. Each step is something you could watch happen in DevTools Network tab or in the live screencast pane.

1. You drag the image onto the chat box

The client reads the file, base64-encodes it, and attaches it to the next outgoing request as { name, mimeType, data }.

2. POST /api/chat with the attachment

The server route extracts attachments from the JSON body, then walks the array once. Each item gets a destPath of /app/uploads/<name>.

3. writeFileToVm streams the bytes into the sandbox

Inside src/core/e2b.ts, writeFileToVm hits the ACP /write-file endpoint on the running VM. The file lands in the filesystem where the agent lives.

4. Vision block pushed onto promptBlocks

For image/* under 20 MB, a { type: 'image', data, mimeType } block is appended. A small [Uploaded image: <name>] marker text block follows so the agent knows the file is also on disk.

5. User's written prompt pushed last

promptBlocks.push({ type: 'text', text: prompt }). Claude receives all blocks together via /session/prompt, so the picture and the instructions arrive on the same turn.

6. Agent writes the form

Claude reads the image, inspects the saved file if needed, writes the HTML or the React component, opens the VM's Chromium at localhost:5173 via Playwright to confirm it renders. Every turn is also committed to the session's local Git repo.
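The middle of that path (steps 2 through 5) condenses into one loop. A hedged sketch, with writeFileToVm stubbed out in place of the real function in src/core/e2b.ts:

```typescript
type PromptBlock =
  | { type: 'image'; data: string; mimeType: string }
  | { type: 'text'; text: string };
type Attachment = { name: string; mimeType: string; data: string };

// Stub standing in for the real writeFileToVm in src/core/e2b.ts.
const vmFiles = new Map<string, string>();
function writeFileToVm(destPath: string, base64: string): void {
  vmFiles.set(destPath, base64);
}

// Walk the attachments array once, then push the user's text last.
function buildPromptBlocks(attachments: Attachment[], prompt: string): PromptBlock[] {
  const promptBlocks: PromptBlock[] = [];
  for (const att of attachments) {
    const destPath = `/app/uploads/${att.name}`;
    writeFileToVm(destPath, att.data); // step 3: bytes land in the sandbox
    // step 4: vision block plus a marker pointing at the on-disk copy
    promptBlocks.push({ type: 'image', data: att.data, mimeType: att.mimeType });
    promptBlocks.push({ type: 'text', text: `[Uploaded image: ${att.name}]` });
  }
  promptBlocks.push({ type: 'text', text: prompt }); // step 5: user text last
  return promptBlocks;
}
```

The ordering is the point: because the image blocks precede the text block in one array, the picture and the instructions reach the model in a single turn.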

Bring a form picture; we will build it with you

Book 15 minutes. Share your screen, drop the image into the chat, and we will walk through the generated markup together.

Book a call

How mk0r compares to the visual builders

A direct comparison against the generators that dominate this category. The differences are structural, not cosmetic.

Feature                                  | Drag-and-drop builders           | mk0r
Accepts an image of an existing form     | No                               | Yes, up to 20 MB per image
Accepts a PDF scan                       | No                               | Yes, saved to /app/uploads for Read
Text plus image on the same turn         | n/a                              | Yes, image blocks pushed before text
Output is real source code               | HTML snippet or embedded widget  | Plain HTML or typed React component
Requires signup                          | Yes                              | No
Iterate with plain English               | No, drag fields again            | Yes, follow-up prompts edit the markup
File also lives on disk in the sandbox   | No                               | Yes, /app/uploads/<name>

Quick mode vs. VM mode for image-to-form

Both modes accept the same attachments payload. The difference is what the agent does with it afterwards.

Quick mode

Streams a single self-contained HTML file with inline CSS and a tiny JS handler. No sandbox, no Git, no services. Uses Claude Haiku for latency.

  • Under 30 seconds to first render
  • Paste the output into any existing site
  • Best for a static form you will wire yourself

VM mode

Boots an E2B sandbox with Vite, React, TypeScript, and Playwright. The agent writes a typed component, renders it at localhost:5173, confirms it looks right.

  • Typed source you can continue to iterate on
  • Per-turn Git history, undo and redo available
  • Best when the form has real behavior

One last thing the other pages forget

The image you drop in is not thrown away after the first turn. Because writeFileToVm saves it to /app/uploads/<name>, every follow-up prompt can reference it again. Ask the agent “look back at the design and add the missing checkbox from the third row” and it can open the file with its Read tool, see the checkbox, and add it.

2 copies of your image, one on disk, one in the prompt

0 fields you have to drag onto a canvas

1 turn that mixes design input with written instructions

Frequently asked questions

What kinds of images can I drop into mk0r to get an HTML form?

Anything up to 20 MB in an image/* mime type. In practice that covers Figma exports (PNG/JPG), screenshots of a competitor's form, phone photos of a paper or printed form, a cropped section of a PDF, sketches on a whiteboard, or screenshots from a design system. The limit lives in src/app/api/chat/route.ts as MAX_INLINE_IMAGE = 20 * 1024 * 1024.

How does the image reach the model?

Two places at once. First, the base64 payload is streamed to the VM via writeFileToVm and saved at /app/uploads/<name>, so the agent can also use Read or grep on it inside the sandbox. Second, the same base64 is pushed onto the prompt blocks as { type: 'image', data, mimeType } immediately before the user text block, so Claude sees the picture in the same prompt that generates the form.

What if my image is bigger than 20 MB?

It still uploads to /app/uploads/<name> but it is not inlined. Anything over the 20 MB ceiling is left for the agent to open via Read. For most form designs this does not matter: a 1920x1080 PNG of a detailed multi-page form rarely exceeds 3 MB. If yours does, flatten layers or export at a lower DPI before uploading.

Can I upload a PDF of an existing paper form?

Yes. PDFs route through the 'application/pdf' branch of the attachment handler. They are saved to /app/uploads/<name> and the agent is told to use the Read tool to parse them. Claude can open the PDF in-VM, read structured text, and rebuild the fields as HTML. For a scan that is mostly raster, convert the pages to PNG first and use the image path so the picture reaches the model as a vision block.

Do I get plain HTML or a React component?

Your choice. Quick mode produces a single self-contained HTML file with inline CSS and JavaScript; it streams in under 30 seconds and is the right output when you want to paste markup into an existing site. VM mode runs a full Vite + React + TypeScript project inside an E2B sandbox; you get a typed component you can iterate on. Both modes accept the same image inputs.

Does the form match the image pixel-for-pixel?

It aims for structure, not pixels. Claude reads the image and reconstructs the fields, layout order, labels, grouping, and control types. Fonts, exact spacing, and brand colors come out approximately; you refine with a follow-up prompt (e.g. 'make the buttons teal, use a serif display font'). If the form has unusual controls like a split zipcode field or a currency input with a prefix, it picks that up from the picture.

How do I upload the image in practice?

Drag it onto the chat box on mk0r.com, or paste it from the clipboard, or click the paperclip icon. The request body sent to /api/chat includes an attachments array where each item is { name, mimeType, data }. Data is base64-encoded; the server estimates the decoded size with Math.ceil((length * 3) / 4) before deciding whether to inline.
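Building one item of that array client-side is small. A sketch using Node's Buffer for the base64 step so the example is self-contained; in a browser you would base64-encode via FileReader or btoa instead:

```typescript
// Hedged sketch of building one attachments[] item; field names are from this guide.
function toAttachment(name: string, mimeType: string, bytes: Uint8Array) {
  // Buffer is the Node path; browsers would use btoa or FileReader.readAsDataURL.
  const data = Buffer.from(bytes).toString('base64');
  return { name, mimeType, data };
}
```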

Do I have to sign up to try it?

No. There is no account. Drop your image into the chat box on mk0r.com and send. The E2B sandbox boots on first prompt and the generated code lives in a per-session Git repo you can pull from the VM at any time.

What happens if the write to the VM fails?

The route logs a chat_attachment_failed event to PostHog with the filename, mime, size, and error message, then streams a user-visible 'Failed to upload <name>' error to the chat. The session keeps going; subsequent attachments are still tried. If the VM itself died, the chat route auto-retries the whole bootAndPrompt cycle once before surfacing a hard error.

Can I combine an image with written instructions in the same prompt?

Yes, that is the default. The attachment blocks are appended to promptBlocks first, then the user's text block is pushed last. So 'here is the design, but make the email field optional and add a phone number' lands as { [image], 'here is the design, but...' } in one vision-enabled turn. Claude sees both signals together.

Stop rebuilding forms by hand

Drop the picture of the form you want. mk0r reads it, writes the markup, and lets you refine with words.

Generate an HTML Form From an Image