Build a local LLM chat app without code: your model, your laptop
Most guides for this drop you into Open WebUI, AnythingLLM, or LibreChat. Those are great when the chat UI you want is the chat UI those projects already ship. The moment you want a custom shape, a Markdown sidebar, a kid-friendly skin, a chat that doubles as a recipe planner, you are forking a codebase. There is a quieter path: describe to mk0r the chat UI you want, walk away with a small Vite + React + TypeScript project, and run it on the same machine where Ollama is running. The weights never leave your laptop. Neither do your prompts.
Direct answer (verified 2026-05-07)
Run Ollama on your machine on port 11434. Open mk0r.com, describe a chat that streams from http://127.0.0.1:11434/api/chat, export the generated Vite project, run it locally with npm run dev. Set OLLAMA_ORIGINS to allow your dev URL first or the browser blocks the call with a CORS error. The endpoint contract is documented at github.com/ollama/ollama/blob/main/docs/api.md.
Why every other guide assumes the model is in the cloud
The honest reason is that most no-code platforms cannot reach your laptop. Bubble, Glide, Softr, Voiceflow, Zapier, all of them run inside a hosted runtime in someone else's data center. When their docs say 'plug in an LLM,' they mean an HTTP endpoint they can reach: OpenAI, Anthropic, Cohere, an OpenRouter or AWS Bedrock proxy. None of them can call http://127.0.0.1:11434 on your machine because they are not on your machine.
That leaves two paths in most articles you will find on this topic. The first is 'install Open WebUI,' which is excellent software and the right answer when the chat UI you want is the one Open WebUI ships. The second is 'host your model with Ollama on Fly.io,' which works but cancels the privacy and cost story that drove you toward a local LLM in the first place. Both routes skip the question of how to get a custom UI you actually own.
mk0r fits a different gap. The agent runs in a sandbox in the cloud, but the artifact (the Vite project) is portable code. You take the artifact home and run it next to Ollama. The model never travels.
The shape of the system
Three machines do three things. The diagram below is the whole architecture. Nothing else is hidden behind it.
Where each part lives
The mk0r sandbox writes code; it does not host weights. The product config is explicit about this: 'Heavy local model inference is out of scope for the standard 1 vCPU sandbox.' That is the constraint that makes the whole story work. Because the sandbox cannot be the model host, the model has to be on your machine, which means your prompts have to be on your machine too, which is exactly what 'local LLM' means in the first place.
Step 1. Get Ollama running with the right CORS gate
Install Ollama from ollama.com/download. Pull a model. Llama 3.2 is a fine starting point on a 16 GB MacBook; Mistral 7B and Qwen 2.5 fit too. The exact model does not matter for this guide; the wire format does.
Before you start the daemon, set OLLAMA_ORIGINS so the browser is allowed to call it from a different origin. On macOS:
launchctl setenv OLLAMA_ORIGINS "http://localhost:5173,http://127.0.0.1:5173"
# then quit and re-open the Ollama menubar app
ollama pull llama3.2
ollama serve # if running from the CLI

On Linux you set it in the systemd unit override or export it in the shell that runs the binary. The Ollama FAQ at github.com/ollama/ollama/blob/main/docs/faq.mdx walks through every platform. If you skip this step, the chat will fail with a CORS error in the browser console the first time you click Send. The error message is loud; you cannot miss it.
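If you would rather check the gate from code than squint at the console, a tiny probe works. This is a hypothetical helper, not something the generated app ships with; /api/version is a documented Ollama route:

  // Hypothetical browser-side probe: resolves false if the daemon is down
  // or OLLAMA_ORIGINS does not allow this page's origin (a CORS rejection
  // surfaces as a thrown TypeError in the browser).
  async function ollamaReachable(base = "http://127.0.0.1:11434"): Promise<boolean> {
    try {
      const res = await fetch(`${base}/api/version`); // returns { "version": "..." }
      return res.ok;
    } catch {
      return false;
    }
  }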
Sanity check from a terminal on your laptop:
curl -sN http://127.0.0.1:11434/api/chat \
-d '{"model":"llama3.2","messages":[{"role":"user","content":"hi"}],"stream":true}' \
| head -3

You should see JSON-lines streaming back, one object per token chunk. Each object has message.content with the next piece of text. That is the shape the generated UI parses.
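For reference, each line of that stream decodes to an object shaped roughly like this (a sketch from the Ollama API docs; the final chunk sets done to true and carries extra timing fields not shown here):

  // Shape of one JSON-lines chunk from /api/chat.
  type ChatChunk = {
    model: string;                                    // e.g. "llama3.2"
    created_at: string;                               // ISO timestamp
    message: { role: "assistant"; content: string };  // the next piece of text
    done: boolean;                                    // true only on the final line
  };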
Step 2. Describe the chat to mk0r
Open mk0r.com. No account, no setup. Type one sentence that names the URL, the model, and the behavior you want. A short sentence keeps the result honest; a specific one saves you iteration later.
Two examples that work first try:
- "A mobile-first chat that streams from my Ollama at http://127.0.0.1:11434/api/chat, model llama3.2, with markdown rendering and a stop button."
- "A two-column chat. Left side: history of past conversations from localStorage. Right side: streaming response from Ollama at 127.0.0.1:11434, model qwen2.5. Allow me to switch model via a dropdown that pulls from /api/tags."
The under-the-hood scaffold is named in the system prompt at src/core/e2b.ts line 170: a Vite + React + TypeScript + Tailwind CSS v4 project at /app, dev server on port 5173 with HMR, Playwright MCP wired into a real Chromium for browser testing. The agent edits files inside that scaffold; it does not regenerate the world on every turn. That is what makes iteration cheap.
Step 3. What you skip vs writing it by hand
The chat client itself is not exotic, but the small bits add up: streaming JSON-lines parsing, abort-on-resubmit, the chunk-by-chunk append into React state, the textarea that grows with content, the auto-scroll-to-bottom on new tokens. Here is what your editor looks like the by-hand way vs the prompt-mk0r way.

Ollama chat client
// Roughly what you write by hand for an Ollama chat client
import { useRef, useState } from "react";

type Msg = { role: "user" | "assistant"; content: string };

export default function App() {
  const [messages, setMessages] = useState<Msg[]>([]);
  const [input, setInput] = useState("");
  const abortRef = useRef<AbortController | null>(null);

  async function send() {
    if (!input.trim()) return;
    // Abort any in-flight request before starting a new one
    abortRef.current?.abort();
    const ac = new AbortController();
    abortRef.current = ac;

    const next: Msg[] = [...messages, { role: "user", content: input }];
    // Append an empty assistant message for the stream to fill in
    setMessages([...next, { role: "assistant", content: "" }]);
    setInput("");

    try {
      const res = await fetch("http://127.0.0.1:11434/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "llama3.2", messages: next, stream: true }),
        signal: ac.signal,
      });
      if (!res.body) return;

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buf = "";
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buf += decoder.decode(value, { stream: true });
        // Ollama streams one JSON object per line; keep the trailing
        // partial line in the buffer until the next chunk completes it
        const lines = buf.split("\n");
        buf = lines.pop() ?? "";
        for (const line of lines) {
          if (!line.trim()) continue;
          const json = JSON.parse(line);
          const delta = json.message?.content ?? "";
          setMessages((cur) => {
            const last = cur[cur.length - 1];
            return [
              ...cur.slice(0, -1),
              { ...last, content: last.content + delta },
            ];
          });
        }
      }
    } catch (err) {
      // Aborting a resubmitted request rejects the fetch; swallow that case
      if ((err as Error).name !== "AbortError") throw err;
    }
  }

  return (
    <main className="min-h-screen p-4 max-w-md mx-auto">
      <ul className="space-y-2 mb-4">
        {messages.map((m, i) => (
          <li key={i} className={m.role}>{m.content}</li>
        ))}
      </ul>
      <textarea
        value={input}
        onChange={(e) => setInput(e.target.value)}
        className="w-full border p-2"
      />
      <button onClick={send} className="mt-2 px-4 py-2 bg-teal-500 text-white">
        Send
      </button>
    </main>
  );
}

The point is not that the by-hand version is hard. It is not. The point is that you trade an hour of fiddling with stream parsing for a sentence, and the agent verifies in a real Chromium before it claims done. If the streaming reader has an off-by-one at the JSON boundary, the agent catches it (the response area never updates) and fixes it before reporting back.
Step 4. Walk through what actually happens
From prompt to running app:

1. Ollama on your laptop. The model lives here. Set OLLAMA_ORIGINS to allow http://localhost:5173 before you launch the daemon, then run ollama pull llama3.2 (or the model you want).
2. Describe in mk0r. 'A chat UI that streams from my Ollama at 127.0.0.1:11434, llama3.2, with a Markdown renderer for the replies.' One sentence, no setup, no account.
3. Agent writes the Vite app. Edits /app/src/App.tsx, wires a fetch() to /api/chat with stream:true, parses JSON-lines, renders. Plays it in the in-VM Chromium until typing returns a response.
4. Export to your repo. GITHUB_REPO_URL in /app/.env. Agent commits on request. You walk away with a normal Vite + React + TS project.
5. Run on your laptop. git clone, npm install, npm run dev. The browser hits 127.0.0.1:11434, Ollama answers, tokens stream into the UI. Nothing leaves the machine.
Two of those five steps are on your machine; three are in the sandbox. The split is the design. The sandbox is good at boilerplate, file edits, browser verification. Your laptop is good at running a 7B model without paying anyone. The export step is the bridge.
Step 5. Export and run it on your laptop
When the agent is done, ask it to commit. Each session is provisioned with a private GitHub repo whose URL lives in /app/.env as GITHUB_REPO_URL. From your terminal:
git clone <your-repo-url>
cd <your-repo>
npm install
npm run dev
# vite running on http://localhost:5173

Open the URL. The chat connects to http://127.0.0.1:11434 (the agent wrote that base URL into .env as VITE_OLLAMA_URL by default; change it if your Ollama listens elsewhere). Send a message. Tokens stream in. Network panel shows requests to localhost only. Activity Monitor shows your CPU or GPU spinning, not your network card. That is the whole point.
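Inside the generated app, the base URL and model come from those env vars rather than hard-coded strings. A sketch of the read, assuming the default names the agent writes (VITE_OLLAMA_URL, VITE_OLLAMA_MODEL):

  // Vite exposes VITE_-prefixed vars on import.meta.env at build time.
  // Names here are the defaults described above; fall back to localhost.
  const OLLAMA_URL = import.meta.env.VITE_OLLAMA_URL ?? "http://127.0.0.1:11434";
  const OLLAMA_MODEL = import.meta.env.VITE_OLLAMA_MODEL ?? "llama3.2";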
Iteration from here is a normal codebase. Push to the repo. Deploy to Cloudflare Pages or static-host the build output if you want to share the UI; the people you share it with will need their own Ollama for the chat to work. There is no proprietary runtime in the export and no mk0r-only build step.
The honest limits
Three things to know before you commit.
The mk0r preview cannot reach your Ollama. While the agent is iterating inside the sandbox, the in-VM Chromium is on a different machine than your laptop. It cannot fetch your localhost. The agent verifies the UI shape (input, message list, send button) but it cannot verify a real model round-trip from inside the sandbox. The first true round-trip happens after you export and run locally. If that distinction matters, an alternative is to ask the agent to call a public endpoint (Hugging Face Inference Providers, OpenRouter) so the in-sandbox preview is a full system, then swap the base URL to your Ollama on export. That is a one-line change in /app/.env.
Multi-user shared chat is not this shape. The exported app is a single-page client that talks to a localhost daemon. If you want friends or coworkers to use it, you need a real backend (a Postgres, a streaming relay, a per-user model context). Open WebUI is closer to that out of the box. mk0r is a fit when 'local' really means 'me, on my laptop'.
RAG over a personal corpus is one prompt away but two prompts of work. You can ask the agent to add a file-upload box that chunks documents and stores embeddings via a local Ollama embedding model (mxbai-embed-large works). It will scaffold the basics. The retrieval quality, chunk size, and prompt template are knobs you will tune by hand. None of that is unique to mk0r; the same is true if you build it in any framework. The agent gets you a working baseline, not a finished retrieval system.
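For a sense of scale, the embedding half of that baseline is one HTTP call. A sketch assuming Ollama's newer /api/embed route and the mxbai-embed-large model named above; retrieval quality, chunking, and storage are the parts you tune by hand:

  // Sketch: embed a batch of document chunks with a local Ollama model.
  async function embedChunks(
    chunks: string[],
    base = "http://127.0.0.1:11434"
  ): Promise<number[][]> {
    const res = await fetch(`${base}/api/embed`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: "mxbai-embed-large", input: chunks }),
    });
    const { embeddings } = (await res.json()) as { embeddings: number[][] };
    return embeddings; // one vector per input chunk
  }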
When you should not do this
Use Open WebUI, AnythingLLM, or LibreChat when the goal is a polished chat with auth, multi-user support, plugins, and RAG out of the box, and you do not want to maintain any of it. They are well maintained; you will be happier using them than building your own.
Use Hugging Face Inference Providers or OpenRouter when the privacy and cost story does not matter. A hosted endpoint plus a generated UI removes the CORS gate and the export step.
Use mk0r for the case in between: a custom UI that you actually own, talking to a model that lives on the same machine as the user. A weekend project. A privacy-first chat for one. A chat for a kid that strips out anything you do not want them to see. Anything where the UI is small, the model is yours, and the artifact deserves to be a regular Vite project you can fork, share, or deploy.
Want help mapping this to your Ollama setup?
Bring your machine, your model, and the chat shape you want. We will walk through the prompt and the export together so you leave with a running app.
Frequently asked questions
What does 'local' actually mean here? The sandbox runs in the cloud.
Local means the model. mk0r is the UI generator. The agent that writes your chat app runs in an E2B sandbox (1 vCPU, ~1 GB RAM, no GPU, so it physically cannot host LLM weights). What it produces is a Vite + React + TypeScript project that, when you run it on your laptop, fetch()es http://127.0.0.1:11434/api/chat. Ollama runs on your CPU or GPU, holds the weights, and answers the request. mk0r never sees a token, a prompt, or a response from your private chats. The only thing mk0r 'sees' is the UI you described in plain English.
Why not just use Open WebUI, AnythingLLM, or LibreChat?
Use them when the goal is the standard chat UI and you do not want to maintain it. They are mature, polished, and battle-tested. The friction shows up the moment you want a custom shape: a chat that doubles as a recipe planner, a kid-friendly skin, a chat with a sidebar of saved characters, a streaming UI that swaps the response into a Markdown editor. Forking those projects means reading their codebase. mk0r is the reverse trade. You describe the UI you want, you walk away with a small Vite project that does exactly that and nothing else, and you own the source. Ollama's HTTP API is small (about a dozen routes) so a generated UI on top of it does not have to drift far from a working baseline.
What does the agent literally write into the project?
It opens /app/src/App.tsx in the existing Vite project (the dev server is already running on port 5173 with HMR; the system prompt at src/core/e2b.ts line 170 names that scaffold). It scaffolds a chat input, a message list, and a streaming reader that reads the JSON-lines response Ollama returns from /api/chat. It writes the model name and base URL into /app/.env so the React app can read them via import.meta.env.VITE_OLLAMA_URL and VITE_OLLAMA_MODEL. It opens the running app in the in-VM Chromium via Playwright MCP and asserts that typing in the input and submitting renders a response area. Until that UI round-trip works, it does not report done. None of those edits are 'build pipeline' steps. They are direct file writes plus an HMR refresh.
What about CORS? Will the browser actually let a remote-origin web app talk to localhost:11434?
Not by default. Ollama refuses cross-origin requests for safety. You set OLLAMA_ORIGINS to your dev URL (or to '*' if you only run it on a trusted laptop) before you launch the daemon. On macOS that means launchctl setenv OLLAMA_ORIGINS "http://localhost:5173,http://127.0.0.1:5173" then restart the Ollama menubar app, or export OLLAMA_ORIGINS in the shell where you start the binary. The Ollama FAQ documents this at github.com/ollama/ollama/blob/main/docs/faq.mdx. If you are previewing inside the mk0r sandbox iframe, you will not be able to hit your laptop's localhost from there at all (different machine), which is why this whole guide ends with 'export and run it locally'.
Does the chat actually stream tokens or does it wait for the whole response?
Streams. Ollama's /api/chat returns a JSON-lines stream when you POST { stream: true }. The agent wires a ReadableStream reader, splits on newlines, parses each line as JSON, and pipes message.content into React state on every chunk. Because Vite serves with HMR, you watch the streaming UI start rendering tokens as soon as you save. There is no 'wait for the whole response' step unless you explicitly ask the agent for one (which you might, to call .json() with a non-streaming model invocation when you only want a final answer).
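The non-streaming variant mentioned above is a small change: set stream to false and /api/chat answers with a single JSON object instead of JSON-lines. A minimal sketch:

  // Sketch: one-shot (non-streaming) call to /api/chat.
  async function askOnce(prompt: string, base = "http://127.0.0.1:11434") {
    const res = await fetch(`${base}/api/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llama3.2",
        messages: [{ role: "user", content: prompt }],
        stream: false,
      }),
    });
    const data = await res.json();
    return data.message.content as string; // the full reply in one piece
  }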
What about LM Studio, llama.cpp, or other local servers?
Same playbook with a different base URL. LM Studio's local server is OpenAI-compatible and listens on http://localhost:1234/v1/chat/completions; you ask the agent to use the OpenAI shape and that base URL. llama.cpp's server runs at http://localhost:8080/completion (or /v1/chat/completions in OpenAI-compat mode). vLLM and Text Generation WebUI both expose OpenAI-compatible endpoints. The agent picks the right shape from the URL you give it. The pattern is the same: the UI calls localhost over HTTP, the model runs on your machine, mk0r stays out of the data path.
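The OpenAI-compatible servers differ from Ollama in the route and the response envelope, and their streaming mode is SSE data: lines rather than bare JSON-lines. A sketch against LM Studio's default port; the model string is a placeholder, since LM Studio shows the real identifier in its server tab:

  // Sketch: same chat, OpenAI-compatible shape (LM Studio, llama.cpp, vLLM).
  async function askOpenAICompat(prompt: string) {
    const res = await fetch("http://localhost:1234/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "local-model", // placeholder; use the id your server reports
        messages: [{ role: "user", content: prompt }],
        stream: false,
      }),
    });
    const data = await res.json();
    return data.choices[0].message.content as string; // choices[], not message
  }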
What does this skip vs writing a chat client by hand?
It skips the Vite scaffold, picking a state library, writing the streaming JSON-lines reader, writing the message bubble component, picking a Markdown renderer for assistant replies, picking a code block highlighter, picking a Tailwind config, writing a useEffect that aborts the previous request when the user submits another, and writing a textarea that auto-resizes. It also skips the part where you read Ollama's docs to remember whether the response shape is { message: { content } } or { response }. The agent has a habit of getting that right by reading the docs the first time, then carrying the shape into every subsequent edit.
Where does the project go when I am done?
Each session is provisioned with a private GitHub repo; the URL lives in /app/.env as GITHUB_REPO_URL. The agent commits on request and you walk away with a regular Vite + React + TypeScript codebase. On your laptop it is git clone, npm install, npm run dev. The only env var you need to set is VITE_OLLAMA_URL (or skip it; the default the agent writes is http://127.0.0.1:11434). There is no proprietary runtime in the export and no mk0r-only build step. If you decide later to swap Ollama for OpenAI's API, you change one base URL.
When does this approach not fit?
Three honest cases. First, if the goal is a server-side multi-user chat with shared memory, you want a real backend (Postgres + a streaming relay), not a single-page app talking to localhost. Second, if you want it to work for friends who do not run Ollama, you have to host the model somewhere; once you host the model, 'local' is gone and you might as well use Hugging Face Inference Providers or OpenRouter. Third, if you need polished features like RAG over a corpus, model switching with hot-swap, persistent chat history with a migration story, you will outgrow a one-evening Vite app and want Open WebUI or AnythingLLM instead. mk0r is best when the chat UI is small, custom, and disposable.
Is the generated app any good or is it boilerplate that I will rewrite anyway?
Honest answer: it is a clean baseline, not a finished product. You will tweak the prompt template, the streaming display, and the message persistence. But the parts that take an hour by hand (wiring fetch() to a stream, parsing JSON-lines, rendering Markdown safely, handling abort on resubmit) come out in working shape on the first generation. The iteration loop is what saves the time, not the first draft. You say 'add a model picker that lists results from /api/tags', the agent edits the same file, you watch it appear in the preview, you keep going.