Guide

Vibe coding rate limits, as your AI builder actually sees them

Every AI app builder eventually shows you a “you hit your limit” toast. The interesting question is what the underlying stack saw before that toast fired, and how much of it your builder was willing to surface. Most show none of it. Here is the trace.

Matthew Diakonov
8 min

Direct answer · verified 2026-04-30

What actually happens when you hit a rate limit while vibe coding

The Anthropic API emits one of three signals: an automatically retried 429 (transient, you should never see it), a structured rate_limit_event with status: "allowed_warning" and a utilization percentage, or the same event with status: "rejected" and a UNIX resetsAt timestamp. Whether your AI builder shows you any of that detail depends on whether it forwards the SDK event. Most do not. Authoritative source for the underlying caps: Claude API rate limits docs.
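
As a rough sketch, the three signals collapse to one event type with a discriminating status. The type below is a hypothetical model based on the fields named in this article, not the official SDK type, and classify is an illustrative helper:

```typescript
// Hypothetical model of the structured event; field names follow this
// article, not the official SDK typings.
type RateLimitEvent = {
  status: "allowed" | "allowed_warning" | "rejected" | "unknown";
  resetsAt: number | null; // UNIX ms when the cap unlocks
  utilization: number;     // 0..1, how close to the cap you are
};

// Three outcomes map to three UI behaviors: stay silent, warn, or block.
function classify(ev: RateLimitEvent): "silent" | "warn" | "block" {
  if (ev.status === "rejected") return "block";
  if (ev.status === "allowed_warning") return "warn";
  return "silent";
}
```

The automatically retried 429 never reaches this code path at all; the SDK absorbs it before an event is yielded.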

4.9 from open source on GitHub
Patched ACP forwards 8 rate-limit fields
api_retry, rate_limit, credit_exhausted, all typed
No account, no signup, just describe and build

The signal trace

A single user prompt that runs into a weekly cap takes a fairly long trip. The Anthropic API decides the cap was hit, the streaming response carries a structured event, the Claude Agent SDK packages it, the ACP wrapper translates SDK events into client-facing sessionUpdates, and finally the chat bridge forwards the update to the browser. Anything the wrapper does not know how to translate is silently swallowed, which is where most builders lose the signal.

One rate_limit_event, end to end

  1. Anthropic API → Claude SDK: rate_limit_event (status, resetsAt, utilization)
  2. Claude SDK → ACP wrapper: SDKRateLimitEvent yielded on the iterator
  3. ACP wrapper (stock): drops the event, no mapping
  4. ACP wrapper (patched): sessionUpdate type='rate_limit' (8 fields)
  5. mk0r bridge → browser: EventSource SSE → useAcpChat → UI banner

The third row is the bug. The fourth row is the patch. The fifth row is what makes a useful UI possible. Without the patch on row four, the only thing the browser ever sees is a generic prompt error after the prompt fails, which is why so many AI builders converge on the same opaque “something went wrong” toast.

What gets dropped vs. what gets through

The stock @agentclientprotocol/claude-agent-acp wrapper handles a fixed set of session update types. Anything new the SDK starts emitting is dead on arrival until the wrapper catches up. mk0r intercepts the iterator before the wrapper sees the message and translates it into a custom sessionUpdate the bridge already knows how to forward. The diff is small, but it is the difference between a useful indicator and a black box.
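
The interception pattern, reduced to a sketch. The names here (SdkEvent, intercept, the send callback) are illustrative, not mk0r's real symbols; the point is the shape of the workaround:

```typescript
// Minimal event shapes for the sketch.
type SdkEvent = { type: string; rate_limit_info?: Record<string, unknown> };
type SessionUpdate = { sessionUpdate: string; [k: string]: unknown };

// Wrap the SDK's async iterator: translate the event the stock wrapper
// would drop, and pass everything else through untouched.
async function* intercept(
  sdkStream: AsyncIterable<SdkEvent>,
  send: (u: SessionUpdate) => void
): AsyncIterable<SdkEvent> {
  for await (const ev of sdkStream) {
    if (ev.type === "rate_limit_event" && ev.rate_limit_info) {
      // Forward as a custom update the bridge already knows how to relay.
      send({ sessionUpdate: "rate_limit", ...ev.rate_limit_info });
      continue; // swallow it so the stock translator never chokes on it
    }
    yield ev;
  }
}
```

Because the wrapper only ever sees the filtered stream, it needs no changes at all; the patch lives entirely in front of it.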

A weekly cap getting hit, two ways

User sends a prompt. The cap is hit on Anthropic's side. The SDK iterator yields a rate_limit_event with status 'rejected' and a resetsAt timestamp. The stock wrapper has no mapping for the type and silently drops it. The prompt eventually fails with an error a few seconds later, the bridge forwards a generic prompt_error, and the UI shows 'something went wrong, please try again.' The user has no idea when the cap resets or whether retrying is pointless.

  • rate_limit_event silently dropped
  • no resetsAt surfaced
  • no utilization warning before rejection
  • user retries blindly until the underlying error message bubbles up

The eight fields the patched entry point forwards

This is the anchor fact. If you read src/core/vm-scripts.ts between lines 144 and 163, the rate-limit translator picks off exactly these eight keys from item.value.rate_limit_info. The shape mirrors SDKRateLimitEvent from the Claude Agent SDK and the on-the-wire schema documented by Anthropic, with the same field names normalized to camelCase.

rate_limit_info → sessionUpdate

  • status — allowed, allowed_warning, rejected, or unknown
  • resetsAt — UNIX ms when the cap unlocks, or null
  • rateLimitType — which cap was hit (e.g. 'five_hour', 'weekly')
  • utilization — 0 to 1, how close to the cap you are
  • overageStatus — whether the overage pool is in use
  • overageDisabledReason — why overage is unavailable, if applicable
  • isUsingOverage — boolean, billing against the overage pool
  • surpassedThreshold — numeric threshold crossed, if any

Each field has a job. status drives whether the UI shows nothing, a warning, or a hard error. resetsAt turns “you hit your limit” into “resets in 4h 32m”. utilization lets the UI render a progress meter so a long vibe-coding session does not surprise you. overageStatus and isUsingOverage matter for paid plans where overage burns at a different rate. surpassedThreshold is what you want to fire a one-time “you crossed 80%” toast on.
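
A minimal sketch of that translation step, assuming the raw wire payload uses snake_case names (the real code lives in src/core/vm-scripts.ts; this reconstruction is illustrative, including the resetsIn formatting helper):

```typescript
// Pick the eight documented keys off the raw payload and normalize to
// camelCase. The snake_case wire names are assumptions for this sketch.
function toSessionUpdate(raw: Record<string, unknown>) {
  return {
    sessionUpdate: "rate_limit" as const,
    status: raw["status"] ?? "unknown",
    resetsAt: raw["resets_at"] ?? null,
    rateLimitType: raw["rate_limit_type"] ?? null,
    utilization: raw["utilization"] ?? null,
    overageStatus: raw["overage_status"] ?? null,
    overageDisabledReason: raw["overage_disabled_reason"] ?? null,
    isUsingOverage: raw["is_using_overage"] ?? false,
    surpassedThreshold: raw["surpassed_threshold"] ?? null,
  };
}

// Turn resetsAt into the "resets in 4h 32m" copy described above.
function resetsIn(resetsAtMs: number, nowMs: number): string {
  const mins = Math.max(0, Math.round((resetsAtMs - nowMs) / 60000));
  return `resets in ${Math.floor(mins / 60)}h ${mins % 60}m`;
}
```

Nulling out absent fields rather than omitting them keeps the client-side type simple: the UI can always destructure all eight keys.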

api_retry is a different signal

People mix these up. api_retry fires while the SDK is automatically retrying a transient failure. Its fields are httpStatus, errorType, attempt, maxRetries, and retryDelayMs. It is a per-request signal, not an account-level one. mk0r surfaces it as an amber dot in the chat overlay reading “Retrying request, attempt 2/3, rate_limit (HTTP 429), next in 4s”; the rendering code sits in src/app/(landing)/page.tsx around line 511.

The two signals can fire together. A retry can succeed and still push you over the soft threshold, in which case you will see the retry banner clear and a fresh rate_limit indicator pin in its place. They can also fire independently. A connection-reset api_retry tells you nothing about your cap; it tells you the network blipped.
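
Formatting that banner copy from the event fields is a one-liner. The field names below match the article; the helper itself is illustrative:

```typescript
// The five fields api_retry carries, per this article.
type ApiRetryEvent = {
  httpStatus: number;
  errorType: string;
  attempt: number;
  maxRetries: number;
  retryDelayMs: number;
};

// Render the amber-dot banner text from one retry event.
function retryBanner(ev: ApiRetryEvent): string {
  const delayS = Math.round(ev.retryDelayMs / 1000);
  return `Retrying request, attempt ${ev.attempt}/${ev.maxRetries}, ` +
         `${ev.errorType} (HTTP ${ev.httpStatus}), next in ${delayS}s`;
}

retryBanner({ httpStatus: 429, errorType: "rate_limit", attempt: 2, maxRetries: 3, retryDelayMs: 4000 });
// → "Retrying request, attempt 2/3, rate_limit (HTTP 429), next in 4s"
```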

Why I picked Haiku as the default

The default model is set on src/lib/acp-chat-store.ts line 189 and re-applied per session in src/core/e2b.ts line 1275 with { modelId: "haiku" }. That choice has a direct effect on rate-limit headroom. Haiku consumes far fewer tokens per turn than Sonnet or Opus on the same prompt, so a fixed weekly cap stretches across roughly five to ten times as many vibe-coding iterations before utilization ticks into the warning band. If you switch the model up, you trade headroom for code quality. Both directions are honest. The point is that the cap is not the variable you control; tokens-per-turn is.
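
The headroom math is simple division. Every number below is an illustrative assumption, not Anthropic's published caps or measured per-turn burn; it only shows why tokens-per-turn, not the cap, is the lever:

```typescript
// Assumed figures, for illustration only.
const weeklyCapTokens = 10_000_000;                      // assumed weekly cap
const tokensPerTurn = { haiku: 8_000, sonnet: 50_000 };  // assumed per-turn burn

// How many vibe-coding iterations fit under the cap at a given burn rate.
const turns = (perTurn: number) => Math.floor(weeklyCapTokens / perTurn);

turns(tokensPerTurn.haiku);  // 1250 iterations on Haiku
turns(tokensPerTurn.sonnet); // 200 on Sonnet, roughly 6x fewer
```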

5–10×

“Vibe coding burns through tokens fast; one debugging session with extended thinking can eat half your daily quota before lunch.”

Source: top-ranking 2026 vibe-coding plan comparison guides

What the user actually sees

The chat overlay surfaces three distinct UI states based on which stream event last arrived. The classifier that decides which overlay to render is the isStructuredCredit check at the top of src/app/api/chat/route.ts, which combines errorType, httpStatus (402 or 429), and a regex fallback for stock-ACP installs that do not forward the typed signal.
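
A sketch of that classifier. The real check lives in src/app/api/chat/route.ts; this reconstruction is illustrative, and the error-type strings and regex are assumptions:

```typescript
// Whatever shape the error reaches the route in: typed fields when the
// patched ACP forwarded them, a bare message string otherwise.
type StreamError = { errorType?: string; httpStatus?: number; message?: string };

function isStructuredCredit(err: StreamError): boolean {
  // Typed signal first: the patched entry point forwards these fields.
  if (err.errorType === "billing_error" || err.errorType === "rate_limit") return true;
  if (err.httpStatus === 402 || err.httpStatus === 429) return true;
  // Fallback for stock-ACP installs that only bubble up a raw string.
  return /usage limit|rate limit|credit/i.test(err.message ?? "");
}
```

The regex fallback is deliberately last: it exists only so stock installs degrade to a correct-but-coarse classification instead of a generic error.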

api_retry

Amber dot, “Retrying request, attempt N/M”, with error type, HTTP status, and the countdown to the next attempt. Transient. Clears on success.

rate_limit / allowed_warning

Soft indicator with utilization. Non-blocking. The prompt goes through. You know the cap is approaching before it hits.

credit_exhausted

Persistent overlay. “Usage limit reached” plus the raw SDK message (which contains the human reset time). Retry is gated; restarting ACP would fail with the same error.
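
The three states above reduce to a small mapping from the last-arrived stream event to an overlay. Event and state names here are illustrative, not mk0r's actual types:

```typescript
// Illustrative stream events, matching the three states described above.
type StreamEvent =
  | { type: "api_retry"; attempt: number; maxRetries: number }
  | { type: "rate_limit"; status: "allowed" | "allowed_warning" | "rejected" }
  | { type: "credit_exhausted"; message: string };

type Overlay = "none" | "retry_banner" | "usage_warning" | "blocking_error";

function overlayFor(ev: StreamEvent): Overlay {
  switch (ev.type) {
    case "api_retry":
      return "retry_banner"; // transient, clears on success
    case "rate_limit":
      if (ev.status === "rejected") return "blocking_error";
      return ev.status === "allowed_warning" ? "usage_warning" : "none";
    case "credit_exhausted":
      return "blocking_error"; // persistent, retry gated
  }
}
```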

Honest limits

A few things I want to be straight about. mk0r does not have its own request quota layered on top of Anthropic; the cap you hit is whichever Claude Code subscription tier is in play. Switching builders does not move the cap if both builders use the same credentials. If the SDK starts emitting a new field on rate_limit_info, the patched entry point will pass it through but the UI will not render it until the type is updated in src/lib/chat-events.ts. And the whole patched-ACP approach is a temporary workaround for a stock-wrapper gap; once upstream catches up, the patch becomes redundant. That is fine; the goal was always to surface the signal today, not own the protocol forever.

I am genuinely curious which builder you tried before this and what it showed you when you hit a cap. The honest answer is most of them show nothing, but I would rather hear that from someone who tried three.

Want me to walk you through the rate-limit forwarding?

Twenty minutes, screen share, I'll open the patched ACP file and the Anthropic SDK iterator side by side and trace one event through. Good for anyone building on top of @agentclientprotocol/claude-agent-acp.

Frequently asked questions

What actually counts as a rate limit when you vibe code?

Three different things, and most pages on this topic conflate them. The first is a transient 429 from the Anthropic API: a single request was throttled, and the SDK retries automatically with exponential backoff. The second is a soft warning: the rate_limit_event arrives with status 'allowed_warning' and a utilization percentage between 0 and 1, meaning your prompt is going through but you are close to your weekly cap. The third is a hard rejection: status 'rejected' with a resetsAt UNIX timestamp telling you exactly when the cap unlocks. The first you should never see. The second you should see as a non-blocking banner. The third should pin a 'usage limit reached, resets at H:MM' message at the bottom of the chat. If your AI builder treats all three as the same generic 'you hit your limit' toast, it is not reading the structured event at all.

Why don't most AI builders show this detail?

Because forwarding the signal takes a patch. The Anthropic API emits a structured rate_limit_event over the streaming connection. The Claude Agent SDK packages it as an SDKRateLimitEvent. The stock @agentclientprotocol/claude-agent-acp wrapper, which is what most agentic apps build on, drops the event in its sessionUpdate translator because it does not have a mapping for that type yet (this was reported in anthropics/claude-agent-sdk-python issues 583, 601, 603, where the Python SDK actually crashed on the same message). mk0r works around it by intercepting the SDK iterator before the wrapper sees it, in src/core/vm-scripts.ts, and translating the event into a custom 'rate_limit' sessionUpdate that the bridge forwards to the browser.

What are the eight fields on rate_limit_info that mk0r forwards?

status (allowed, allowed_warning, rejected, or unknown), resetsAt (UNIX milliseconds for when the cap unlocks, or null), rateLimitType (a string identifying which cap was hit, e.g. 'five_hour' or 'weekly'), utilization (a number between 0 and 1 that tells you how close to the cap you are), overageStatus (a string describing whether the overage pool is in use), overageDisabledReason (why overage is not available, if applicable), isUsingOverage (a boolean for whether the current request is being billed against the overage pool), and surpassedThreshold (a numeric threshold the user crossed, if any). The full forwarding lives between lines 144 and 163 of src/core/vm-scripts.ts. Anything the SDK adds in a later version still flows through the unknown-field passthrough; the patched entry point only normalizes the eight fields it knows about and lets the rest ride along.
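
That unknown-field passthrough is just rest-spreading. A minimal sketch, trimmed to two known keys for brevity (the real translator normalizes all eight; names are assumptions):

```typescript
// Normalize the keys we know, let anything the SDK adds later ride along.
function forward(raw: Record<string, unknown>) {
  const { status = "unknown", resets_at = null, ...rest } = raw;
  return { status, resetsAt: resets_at, ...rest };
}
```

The upshot: a new SDK field reaches the browser on day one; only rendering it requires a code change.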

What is api_retry and how is it different from rate_limit?

api_retry is the per-request retry signal. It fires while the SDK is automatically retrying a transient failure (a 429, a 500, a connection reset) before giving up. The fields are httpStatus, errorType, attempt, maxRetries, and retryDelayMs. mk0r forwards it as a separate stream event from rate_limit because it has a different lifecycle: api_retry comes and goes within a single user prompt, while rate_limit reflects account-level state that persists across prompts. In the UI, an api_retry shows as a transient 'Retrying request, attempt 2/3' banner with the next-retry countdown. A rate_limit shows as a sticky usage indicator. They can fire together: the retry attempt itself can be the request that surfaces the rate-limit ceiling.

Does mk0r have its own rate limit on top of Anthropic's?

No. mk0r runs on the user's session against Anthropic directly, so the rate limits you see are Anthropic's account-level caps for whatever Claude Code subscription tier is in play (Free, Pro, Max). There is no mk0r-side request quota layered on top. The reason this matters: if you switch builders to escape a rate limit, and both builders use the same underlying Anthropic key or your own subscription, you will hit the same wall. The cap travels with the credentials, not the builder.

Which model does mk0r default to, and how does that affect rate limits?

Haiku, set as the default model on src/lib/acp-chat-store.ts line 189 and re-applied to the session on src/core/e2b.ts line 1275. Haiku is the cheapest and fastest Claude model, which means a single prompt consumes far fewer tokens than the same prompt routed to Sonnet or Opus. On a fixed weekly cap, that translates to roughly five to ten times more iterations before you hit a soft warning. Vibe coding tools that default to Sonnet for code quality reasons burn through the cap proportionally faster. Neither default is wrong; they just trade quality for headroom.

What does the user actually see in mk0r when each state fires?

On api_retry: an amber-dot banner reading 'Retrying request, attempt N/M' plus the error type, HTTP status, and the next-retry delay in seconds. On rate_limit with status allowed_warning: a non-blocking utilization indicator. On rate_limit with status rejected, or on credit_exhausted (HTTP 429 or 402 with billing_error / rate_limit error type): a persistent error overlay reading 'Usage limit reached', the raw SDK message (which contains the human-readable reset time), and Retry/Dismiss buttons. The classifier that decides which overlay to render is the isStructuredCredit check at the top of src/app/api/chat/route.ts, which combines errorType, httpStatus, and a regex fallback for stock-ACP installs that don't forward the typed signal.

Is the rate_limit_event behavior likely to change in future SDK versions?

Probably. v0.29 of the Claude Agent SDK started emitting nine new event types including system/api_retry and rate_limit_event (per the Claude Code April 2026 update notes). The trajectory is toward more signals, not fewer, and the stock ACP wrapper will eventually catch up. Until then, any AI builder that wants to show structured rate-limit info has to either patch the wrapper, intercept the iterator, or read the lastApiError on the prompt response. The third path is what mk0r uses for terminal errors; the first two are how it surfaces in-flight signals.

mk0r.AI app builder
© 2026 mk0r. All rights reserved.