# LLM API Gateway

One API. Every model.

```shell
cargo install crabllm
```

Route requests to OpenAI, Anthropic, Gemini, Azure, Bedrock, or Ollama. Sub-millisecond overhead. Single binary. No runtime.
## What you get
### Provider translation

Send OpenAI format. CrabLLM translates to Anthropic, Gemini, Bedrock, and Azure automatically.
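Conceptually, translation is a request-shape mapping. A minimal Python sketch of the idea (illustrative only, not CrabLLM's actual Rust internals): converting an OpenAI-style chat request to Anthropic's Messages shape, where the system prompt moves to a top-level field and `max_tokens` becomes required.

```python
def openai_to_anthropic(req: dict) -> dict:
    """Illustrative sketch: map an OpenAI-style chat request to the
    Anthropic Messages shape (system prompt becomes a top-level field)."""
    system_parts = [m["content"] for m in req["messages"] if m["role"] == "system"]
    chat = [m for m in req["messages"] if m["role"] != "system"]
    out = {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # Anthropic requires max_tokens
        "messages": chat,
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out

req = {"model": "claude-sonnet-4-20250514",
       "messages": [{"role": "system", "content": "Be brief."},
                    {"role": "user", "content": "Hello!"}]}
```

The real gateway also has to map parameters, tool calls, and error shapes in both directions; this shows only the core restructuring.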
### Routing & fallback

Weighted random selection across providers. Exponential backoff retry. Automatic failover.
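Both pieces of the routing logic fit in a few lines. A Python sketch of the concepts (illustrative, not the gateway's implementation): pick a provider in proportion to its weight, and on failure retry with exponentially growing delays.

```python
import random

def pick_provider(weights: dict[str, float], rng: random.Random) -> str:
    """Weighted random selection: higher-weight providers are chosen
    proportionally more often."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def backoff_delays(base: float = 0.1, factor: float = 2.0, retries: int = 4) -> list[float]:
    """Exponential backoff schedule in seconds: base, base*factor, base*factor^2, ..."""
    return [base * factor**i for i in range(retries)]
```

With weights `{"openai": 3.0, "anthropic": 1.0}`, roughly three in four requests go to OpenAI; exhausting the backoff schedule is the point where failover to the next provider kicks in.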
### First-class streaming

SSE proxied without buffering. Per-chunk extension hooks. Keep-alive pings.
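SSE frames are blank-line-separated blocks whose payload lines start with `data:`. A minimal Python sketch (illustrative, not CrabLLM code) of splitting a buffered stream into chunk payloads, the unit a per-chunk hook would see:

```python
def sse_events(buffer: str) -> list[str]:
    """Split an SSE buffer into `data:` payloads. Events are separated
    by a blank line; each payload line begins with 'data: '."""
    events = []
    for block in buffer.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(line[len("data: "):])
    return events

stream = 'data: {"delta":"Hel"}\n\ndata: {"delta":"lo"}\n\ndata: [DONE]\n\n'
```

The gateway forwards each event as it arrives rather than assembling the whole response, which is why overhead stays flat for long generations.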
### Virtual keys & auth

Per-key model access control. Rate limiting, usage tracking, and budget enforcement.
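The access-control check itself is a simple allow-list lookup. An illustrative Python sketch (the key table and names are hypothetical, not CrabLLM's schema):

```python
VIRTUAL_KEYS = {
    # Hypothetical key table: virtual key -> models it may call.
    "vk-team-a": {"gpt-4o"},
    "vk-team-b": {"gpt-4o", "claude-sonnet-4-20250514"},
}

def is_allowed(virtual_key: str, model: str) -> bool:
    """A request passes only if the key exists and the requested model
    is on that key's allow-list."""
    return model in VIRTUAL_KEYS.get(virtual_key, set())
```

Clients hold only a virtual key; the real provider API keys never leave the gateway's config.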
### Caching & rate limits

SHA-256 response cache. Per-key RPM and TPM limits. Sliding window enforcement.
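Both mechanisms can be sketched in Python (illustrative logic, not the gateway's implementation): the cache key is a SHA-256 digest of the canonicalized request body, and a sliding window counts only the events inside the last interval rather than fixed buckets.

```python
import hashlib
import json
from collections import deque

def cache_key(request_body: dict) -> str:
    """SHA-256 over the canonical (sorted-key) JSON encoding, so that
    semantically identical requests hash to the same cache entry."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class SlidingWindow:
    """Allow at most `limit` events per `window` seconds, counted over a
    true sliding window."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.events: deque[float] = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False
```

The same window structure enforces RPM (count requests) and TPM (count tokens instead of requests).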
### Budget enforcement

Per-key spend limits in USD. Automatic cost tracking from token usage and pricing config.
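Cost tracking reduces to multiplying token counts by per-model prices from the pricing config. An illustrative Python sketch (the pricing numbers here are made up, not CrabLLM defaults):

```python
# Hypothetical pricing table: USD per 1M input / output tokens.
PRICING = {"gpt-4o": (2.50, 10.00)}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD derived from reported token usage and the price table."""
    in_price, out_price = PRICING[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def within_budget(spent_usd: float, cost_usd: float, budget_usd: float) -> bool:
    """Reject a request if it would push the key past its spend limit."""
    return spent_usd + cost_usd <= budget_usd
```

Because usage is reported per response, the spend counter stays accurate without any provider-side accounting.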
## 0.26 ms P50 latency

Gateway overhead measured at 5,000 requests per second. Lower is better.
| Gateway | P50 | P99 |
|---|---|---|
| CrabLLM | 0.26ms | 0.54ms |
| Bifrost | 0.61ms | 1.26ms |
| LiteLLM | 159ms | 227ms |
## How it works
### 1. Configure

```toml
listen = "0.0.0.0:8080"

[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o"]

[providers.anthropic]
kind = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
models = ["claude-sonnet-4-20250514"]
```

### 2. Run

```shell
crabllm --config crabllm.toml
```

### 3. Send requests
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514",
       "messages": [{"role": "user", "content": "Hello!"}]}'
```

Same OpenAI format, any provider. CrabLLM translates automatically.