Build a Hugging Face demo app with no code, without Gradio or Spaces
Almost every guide on this topic sends you to Hugging Face Spaces and Gradio. That path is fine when the demo is a Python notebook in a Gradio shell. It is friction when what you actually wanted was a regular mobile-friendly web app that calls a model. There is a quieter path that swaps Gradio for a React UI generated by an AI agent inside a sandbox where the toolchain is already booted before you type the first prompt. This guide walks through it.
Direct answer (verified 2026-04-29)
Open mk0r.com, type 'build a UI that calls Hugging Face Inference Providers for <model id>', paste an HF read token when the agent asks for it, and a React + Vite app appears live in a sandbox where Node 20, python3, ffmpeg, and Chromium were already running. No Gradio, no Spaces config, no Python venv. The endpoint contract is documented at Hugging Face Inference Providers.
Why every other guide points at Spaces
Hugging Face Spaces is the obvious answer to this question if you read the docs cold. Spaces is HF's hosted runtime for Gradio and Streamlit apps. You write Python, push to a repo, the platform builds a container and serves the app. For people coming from a notebook the path is short. For people who only know they want a working web demo, it forces a detour through Gradio's idioms, Spaces' build pipeline, and HF's URL space.
That detour is real but small if you already write Python. It is large if you do not, and it is unnecessary if the demo is just a fetch call against an HTTP endpoint. The Inference Providers REST endpoint takes a JSON body or a binary audio blob and returns JSON. Anything that speaks fetch can talk to it. A React component is enough. Once you accept that, the question stops being "how do I learn Gradio fast enough to ship a demo this weekend" and becomes "who writes the React for me."
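To make the fetch claim concrete, here is a minimal sketch of that call in TypeScript. It assumes the router URL pattern used throughout this guide and the classic { inputs: ... } JSON body for a text task; the exact body and response shapes depend on the model's task, so check the Inference Providers docs before copying it.
// Minimal sketch: the whole integration surface is one fetch call.
async function callModel(modelId: string, inputs: string): Promise<unknown> {
  const res = await fetch(
    `https://router.huggingface.co/hf-inference/models/${modelId}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${import.meta.env.VITE_HF_TOKEN}`,
        "Content-Type": "application/json",
      },
      // { inputs } matches the classic text-task contract; audio tasks send a binary body instead
      body: JSON.stringify({ inputs }),
    }
  );
  if (!res.ok) throw new Error(`HF request failed: ${res.status}`);
  return res.json(); // shape depends on the task: generated text, labels, a transcript...
}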
The friction stack you skip
Here is what does not appear in the loop when the demo is a React fetch instead of a Gradio Space. Each item is a thing the Spaces path asks for and the no-code-React path does not.
Things you do NOT do
- Hugging Face account (only needed for gated models or write access)
- Gradio install + theme + Interface or Blocks definition
- Spaces requirements.txt and README.md frontmatter
- Python venv, conda env, or pyproject.toml
- Picking a Spaces hardware tier and waiting for build
- Cold-start wait when a free Space wakes up
- Locking the demo URL to huggingface.co/spaces/<user>/<name>
That list is the whole point. The dollar value of skipping any one item is small. The compounding value of skipping all of them is what turns "weekend project" into "twenty minute conversation."
What the agent actually does
The sequence below is what happens between you typing the request and a working demo loading in your browser. Nothing in this sequence is "wait for build." The Vite dev server is already running on port 5173 inside the sandbox, the agent is editing the same files Vite is watching, HMR pushes updates to the in-VM Chromium tab, and the proxy on port 3000 streams that tab back to your browser.
One turn, end to end
You describe the demo
'Build a UI that calls the Hugging Face Inference Providers API for openai/whisper-large-v3 and shows the transcript.' One sentence, no setup.
VM is already running
Debian sandbox with Node 20, python3, ffmpeg, Chromium, and Vite serving on :5173. Nothing to install. The agent attaches to the existing process.
Agent edits /app/src/App.tsx
Scaffolds an upload form, writes a fetch() to https://router.huggingface.co/hf-inference/models/<model id> with Authorization: Bearer ${VITE_HF_TOKEN}, parses the response.
Agent loads the page in Chromium
Same in-VM browser the screencast shows. Playwright MCP clicks the upload button, drops a sample file, asserts the transcript area updates. If it fails, the agent fixes and retries.
You iterate by talking
'Add a copy-to-clipboard button.' 'Make it streaming.' 'Limit to 30 seconds of audio.' Each turn is more edits to the live React app, not a fresh generation from scratch.
Why the sandbox is already warm
The interesting part is not the prompt. The interesting part is what is already running before the prompt arrives. The mk0r sandbox is built from a single Dockerfile committed to the appmaker repo. Lines 26 to 57 of docker/e2b/e2b.Dockerfile install everything the agent might need to wire a model demo together. The image is baked once and every session boots from the same snapshot. That is why "install python3" never appears in any plan the agent writes.
Pre-installed in every session
- Node 20 (apt: nodejs from NodeSource)
- python3 + pip (apt: python3, python3-pip)
- psycopg2-binary (pip --break-system-packages)
- ffmpeg (apt)
- Chromium + Playwright MCP @ 0.0.70 (npm -g)
- postgresql-client, xvfb, x11vnc, websockify, cron, git (apt)
- @anthropic-ai/claude-code, social-autoposter (npm -g)
- Vite + React + TypeScript + Tailwind v4 (npm create vite, then npm install)
When you ask for a Hugging Face demo, the agent already has curl, npm install, pip install, a running Vite dev server, and a working browser to verify the result in. None of those are conditional. None of them block the first turn. If you wanted a Python sidecar (a lightweight transformers wrapper, say), pip is one apt-installed binary away.
The two artifacts, side by side
Concretely, here is what a Whisper demo looks like in each world. First comes the Spaces + Gradio path: an app.py, a requirements.txt, and a Spaces-flavored README. After it, the React + Inference Providers path: one component, one fetch call, one rendered transcript. Read both. The Gradio version pulls weights into the Space and runs them on Space hardware (or queues for ZeroGPU). The React version delegates inference to the HF Inference Providers fleet, and the demo runs anywhere the static bundle is served.
Same demo, two artifact shapes
# Hugging Face Spaces: app.py
import gradio as gr
from transformers import pipeline

# Load model on Space hardware
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
)

def transcribe(audio):
    if audio is None:
        return ""
    return asr(audio, return_timestamps=False)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath", label="Upload audio"),
    outputs=gr.Textbox(label="Transcript"),
    title="Whisper demo",
    description="Drop an audio file.",
    allow_flagging="never",
)

if __name__ == "__main__":
    demo.launch()

# Plus: requirements.txt
# transformers
# torch
# gradio

# Plus: README.md frontmatter (Spaces config)
# ---
# title: Whisper demo
# sdk: gradio
# sdk_version: 4.44.0
# app_file: app.py
# pinned: false
# ---
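And here is the second artifact, as a minimal sketch rather than the literal component the agent writes; the transcript field name (text) follows the usual ASR response shape, so verify it against the Inference Providers docs for your model.
// React + Inference Providers: src/App.tsx (sketch)
import { useState } from "react";

const ENDPOINT =
  "https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3";

export default function App() {
  const [transcript, setTranscript] = useState("");
  const [busy, setBusy] = useState(false);

  async function transcribe(file: File) {
    setBusy(true);
    try {
      // Binary audio body; the token is injected from /app/.env via Vite
      const res = await fetch(ENDPOINT, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${import.meta.env.VITE_HF_TOKEN}`,
          "Content-Type": file.type || "audio/wav",
        },
        body: file,
      });
      if (!res.ok) throw new Error(`HF error ${res.status}`);
      const data = await res.json();
      setTranscript(data.text ?? JSON.stringify(data));
    } finally {
      setBusy(false);
    }
  }

  return (
    <main>
      <h1>Whisper demo</h1>
      <input
        type="file"
        accept="audio/*"
        onChange={(e) => {
          const file = e.target.files?.[0];
          if (file) transcribe(file);
        }}
      />
      {busy ? <p>Transcribing…</p> : <pre>{transcript}</pre>}
    </main>
  );
}
No requirements.txt, no frontmatter: the component owns the form, the fetch call, and the rendering, and the token comes from /app/.env.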
The shape of your weekend
Here is the same Saturday on both paths. The comparison below is the honest version. Gradio is a fine tool and the Spaces team has done careful work to keep the build pipeline short. The point is not that one is bad; it is that they end up at different artifacts, and one of them is a Python container while the other is a Vite app you can keep iterating on for months.
Saturday morning, build a Whisper demo
Open the HF dashboard. Create a new Space. Pick the SDK. Decide between CPU basic and CPU upgrade. Clone the repo locally or use the HF web editor. Write app.py with gr.Interface. Pin the Gradio SDK version. Write requirements.txt. Add the Spaces frontmatter to README.md. git push. Wait for the Space to build (visible in the build log). Open the Space URL. Hit a cold start on first request. Iterate by editing app.py and pushing again, which rebuilds the Space.
- Python venv or notebook setup
- Gradio API surface to learn
- Spaces hardware tier choice
- Build wait + cold start on first request
- Demo URL locked to huggingface.co/spaces/<user>/<name>
When this is the wrong path
A page on this topic that does not name its limits is selling something. Three honest cases where Spaces still wins.
- You want HF community discoverability. A Space is indexed inside the Hugging Face ecosystem. People browse Spaces by model and tag. A Vite app on a personal domain is invisible to that audience. If "more HF users find this" is the goal, ship a Space.
- You need free GPU. Spaces offers a ZeroGPU tier that is queue-based but free. mk0r's sandbox is 1 vCPU, no GPU, and it does not host model weights. If your demo runs a 7B param model that you cannot afford to call via Inference Providers, a Space with ZeroGPU is the right call.
- The demo IS the notebook. If you already have a polished Jupyter notebook and the artifact you are sharing is essentially that notebook, Gradio wraps it in three lines. Asking an AI agent to translate a notebook into a React app is more work than gr.Interface(your_function).launch().
What the agent does not do for you
Rate limiting on the Inference Providers side is real, and the agent will not magic it away. If your demo gets posted to a community and a thousand strangers click it in an hour, the read token gets throttled and the React app needs a fallback. The agent will write the fallback if you ask it to (a queue, a graceful error, a cached last-good response), but it does not auto-add it on turn one. A demo built as a fetch UI is closer to a real product than a Spaces page is, which means it inherits real product concerns earlier.
The other category the agent will not solve: model-side cold starts. If you call a model that the chosen provider has not warmed recently, the first response is slow. That is a property of the provider, not the React UI. Spaces hides this behind its own "starting up" loader. mk0r exposes it as a slow fetch, which is honest and slightly worse UX out of the box. Worth knowing.
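For completeness, here is a sketch of the kind of fallback you would ask for, reusing the callModel helper from the earlier sketch; the names are illustrative, not anything mk0r emits on its own.
// Hypothetical fallback wrapper: surface throttling gracefully, reuse the last good response.
let lastGood: unknown = null;

async function callWithFallback(modelId: string, inputs: string) {
  try {
    const result = await callModel(modelId, inputs); // plain fetch wrapper from the earlier sketch
    lastGood = result;
    return { result, stale: false, error: null };
  } catch {
    // A throttled read token (callModel throws on 429) lands here; a provider cold start
    // does not throw, it just makes the await slow.
    if (lastGood !== null) return { result: lastGood, stale: true, error: null };
    return {
      result: null,
      stale: false,
      error: "Model is busy right now, try again in a minute.",
    };
  }
}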
Want a walkthrough on a specific model?
Book a 20 minute call. Bring the model id and the rough demo idea. We will build the React UI live so you can see whether this path fits before committing.
Frequently asked questions
Why not just use Hugging Face Spaces and Gradio?
Spaces is a great host for Python ML demos and you should use it when you want a Python notebook to be the artifact. The friction shows up when you wanted a regular mobile-friendly web app. You inherit a Gradio UI, a paused free Space cold-starts on the first request after idling (the wait is real even if the exact seconds depend on the model), and the demo lives behind huggingface.co/spaces/<user>/<name>. mk0r is the reverse trade. You get a React+Vite app that calls the Inference Providers REST endpoint via fetch, hosted wherever you want, with full control over the UI.
Do I need a Hugging Face account to build a demo this way?
Only if the model is gated. The Inference Providers endpoint accepts an HF read token. If a teammate hands you a token for a public model and you only need read-only inference, you can build the demo without ever signing up for HF yourself. You do need to have the token in hand before the agent finishes wiring the call. mk0r asks for it as an env var and writes it into /app/.env so the React app reads it via VITE_HF_TOKEN at build time.
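A minimal sketch of what that wiring looks like on the Vite side, with a placeholder token value:
// /app/.env — written by the agent when you paste the token
// VITE_HF_TOKEN=hf_xxxxxxxxxxxx

// Anywhere in the React app. Only VITE_-prefixed vars are exposed to the client,
// and Vite inlines them into the bundle, so the read token ships with the demo.
const HF_TOKEN = import.meta.env.VITE_HF_TOKEN as string;
const headers = { Authorization: `Bearer ${HF_TOKEN}` };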
Where does the model actually run?
On Hugging Face's Inference Providers fleet. The mk0r sandbox is 1 vCPU, no GPU, so it does not run llama.cpp or PyTorch locally. Your React UI sends an HTTP request to router.huggingface.co, the chosen provider runs the model, the response streams back. The sandbox is the carrier vehicle for the UI, the network call, and any glue code (rate limiting, retries, caching) you ask the agent to add. Anything that can be expressed as 'POST text or audio to an endpoint, render the response' fits.
What does the agent actually do when I say 'build a Hugging Face demo for openai/whisper-large-v3'?
It opens /app/src/App.tsx in the running Vite project, scaffolds a file-upload form for an audio clip, writes a fetch() call to https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3 with Authorization: Bearer ${VITE_HF_TOKEN}, parses the JSON transcript, renders it. Then it loads the page in the in-VM Chromium via Playwright MCP, screenshots it, checks the upload button is reachable, and reports back. None of that is a 'build pipeline'. It is direct edits to a Vite app that is already serving on port 5173 with HMR.
Where is the toolchain defined that makes this not require provisioning?
Lines 26 to 57 of docker/e2b/e2b.Dockerfile inside the appmaker repo. One apt-get block installs chromium, ffmpeg, libnss3, libxss1, xvfb, x11vnc, websockify, python3, python3-pip, postgresql-client, cron, and git. A second block installs Node 20 from the NodeSource repo. A third block npm-installs @playwright/mcp@0.0.70, ws, @agentclientprotocol/claude-agent-acp@0.25.0, @anthropic-ai/claude-code, and social-autoposter globally. The image is baked once and every E2B sandbox boots from that snapshot, so 'install python3' or 'install ffmpeg' is never a step in the agent's plan.
Can I call models that need Python sidecars (transformers, sentence-transformers)?
Yes for inference that fits in 1 vCPU and ~1 GB RAM. The VM has python3 and pip ready, so the agent can pip install transformers, sentence-transformers, or any other lib, run a small Python sidecar that exposes a /predict HTTP endpoint, and have the React UI talk to that. You will hit memory limits on big models. For anything that needs a GPU the right answer is still to call Inference Providers (or your own GPU host) over HTTP. The sandbox is for orchestration and UI, not for hosting the weights.
What about streaming output (token-by-token for chat models)?
Inference Providers supports server-sent events and streaming JSON. The agent can wire a ReadableStream in fetch() and pipe tokens straight into the React state. Because Vite serves with HMR you watch the streaming UI come alive in the same Chromium tab the agent is using to verify its work. There is no extra deployment step between writing the fetch() call and seeing tokens stream into the page.
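Here is a sketch of that wiring, assuming an OpenAI-compatible chat completions route behind the router; the exact path and payload are provider-specific, so confirm them in the Inference Providers docs before relying on this shape.
// Streaming sketch: read SSE frames from fetch() and hand each token to React state.
async function streamChat(prompt: string, onToken: (token: string) => void) {
  const res = await fetch("https://router.huggingface.co/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${import.meta.env.VITE_HF_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "<model id>",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`HF request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial frame for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") return;
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) onToken(delta);
    }
  }
}
In the component, onToken is typically a state update like setOutput((prev) => prev + token), which is exactly what you watch come alive in the HMR'd Chromium tab.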
Can I export the demo and host it elsewhere later?
Yes. Each session is provisioned with a private GitHub repo; the URL lives in /app/.env as GITHUB_REPO_URL. The agent commits the Vite project to that repo on request. You walk away with a regular Vite + React + TypeScript codebase that runs anywhere npm run build runs. The Hugging Face token is the only thing you need to set on the host. There is no proprietary runtime in the export and no mk0r-only build step.
What does this skip vs a full Spaces deploy?
It skips: writing Gradio components, picking a Gradio theme, defining gr.Interface or gr.Blocks, learning the difference between gradio.app and Spaces hardware tiers, configuring requirements.txt for the Space, picking CPU basic vs CPU upgrade, waiting for the Space to build, and writing README.md frontmatter that the Spaces builder parses. None of that is part of the loop because none of that is part of the artifact. The artifact is a single React component that owns the form, the fetch call, and the rendering.
When is Spaces still the better choice?
When the demo is the model itself and you want it discoverable inside the HF community, when the workflow is a Python notebook you already wrote, when you need the free GPU tier (Spaces offers ZeroGPU on a queue, mk0r does not host weights), or when you want HF to handle hosting and billing. mk0r is the better path when the demo is a real product UI you intend to keep iterating on, when you want a mobile-first interface, or when the demo is meant to be shared over a normal URL with people who do not know what Hugging Face is.