# LLM API Gateway

One API. Every model.

```shell
cargo install crabllm
```

Route requests to OpenAI, Anthropic, Gemini, Azure, Bedrock, or Ollama. Sub-millisecond overhead. Single binary. No runtime.
## What you get
### Provider translation

Send OpenAI format. CrabLLM translates to Anthropic, Gemini, Bedrock, and Azure automatically.
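Conceptually, translation is a request-shape mapping. A minimal Python sketch of the idea (illustrative only, not CrabLLM's actual Rust internals): converting an OpenAI-style chat request to Anthropic's Messages shape, where the system prompt moves to a top-level field and `max_tokens` becomes required.

```python
def openai_to_anthropic(req: dict) -> dict:
    """Illustrative sketch: map an OpenAI-style chat request to the
    Anthropic Messages shape (system prompt becomes a top-level field)."""
    system_parts = [m["content"] for m in req["messages"] if m["role"] == "system"]
    chat = [m for m in req["messages"] if m["role"] != "system"]
    out = {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # Anthropic requires max_tokens
        "messages": chat,
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out

req = {"model": "claude-sonnet-4-20250514",
       "messages": [{"role": "system", "content": "Be brief."},
                    {"role": "user", "content": "Hello!"}]}
```

The real gateway also has to map parameters, tool calls, and error shapes in both directions; this shows only the core restructuring.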
### Routing & fallback

Weighted random selection across providers. Exponential backoff retry. Automatic failover.
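Both pieces of the routing logic fit in a few lines. A Python sketch of the concepts (illustrative, not the gateway's implementation): pick a provider in proportion to its weight, and on failure retry with exponentially growing delays.

```python
import random

def pick_provider(weights: dict[str, float], rng: random.Random) -> str:
    """Weighted random selection: higher-weight providers are chosen
    proportionally more often."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def backoff_delays(base: float = 0.1, factor: float = 2.0, retries: int = 4) -> list[float]:
    """Exponential backoff schedule in seconds: base, base*factor, base*factor^2, ..."""
    return [base * factor**i for i in range(retries)]
```

With weights `{"openai": 3.0, "anthropic": 1.0}`, roughly three in four requests go to OpenAI; exhausting the backoff schedule is the point where failover to the next provider kicks in.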
### First-class streaming

SSE proxied without buffering. Per-chunk extension hooks. Keep-alive pings.
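SSE frames are blank-line-separated blocks whose payload lines start with `data:`. A minimal Python sketch (illustrative, not CrabLLM code) of splitting a buffered stream into chunk payloads, the unit a per-chunk hook would see:

```python
def sse_events(buffer: str) -> list[str]:
    """Split an SSE buffer into `data:` payloads. Events are separated
    by a blank line; each payload line begins with 'data: '."""
    events = []
    for block in buffer.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(line[len("data: "):])
    return events

stream = 'data: {"delta":"Hel"}\n\ndata: {"delta":"lo"}\n\ndata: [DONE]\n\n'
```

The gateway forwards each event as it arrives rather than assembling the whole response, which is why overhead stays flat for long generations.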
### Virtual keys & auth

Per-key model access control. Rate limiting, usage tracking, and budget enforcement.
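The access-control check itself is a simple allow-list lookup. An illustrative Python sketch (the key table and names are hypothetical, not CrabLLM's schema):

```python
VIRTUAL_KEYS = {
    # Hypothetical key table: virtual key -> models it may call.
    "vk-team-a": {"gpt-4o"},
    "vk-team-b": {"gpt-4o", "claude-sonnet-4-20250514"},
}

def is_allowed(virtual_key: str, model: str) -> bool:
    """A request passes only if the key exists and the requested model
    is on that key's allow-list."""
    return model in VIRTUAL_KEYS.get(virtual_key, set())
```

Clients hold only a virtual key; the real provider API keys never leave the gateway's config.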
### Caching & rate limits

SHA-256 response cache. Per-key RPM and TPM limits. Sliding window enforcement.
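Both mechanisms can be sketched in Python (illustrative logic, not the gateway's implementation): the cache key is a SHA-256 digest of the canonicalized request body, and a sliding window counts only the events inside the last interval rather than fixed buckets.

```python
import hashlib
import json
from collections import deque

def cache_key(request_body: dict) -> str:
    """SHA-256 over the canonical (sorted-key) JSON encoding, so that
    semantically identical requests hash to the same cache entry."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class SlidingWindow:
    """Allow at most `limit` events per `window` seconds, counted over a
    true sliding window."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.events: deque[float] = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False
```

The same window structure enforces RPM (count requests) and TPM (count tokens instead of requests).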
### Budget enforcement

Per-key spend limits in USD. Automatic cost tracking from token usage and pricing config.
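Cost tracking reduces to multiplying token counts by per-model prices from the pricing config. An illustrative Python sketch (the pricing numbers here are made up, not CrabLLM defaults):

```python
# Hypothetical pricing table: USD per 1M input / output tokens.
PRICING = {"gpt-4o": (2.50, 10.00)}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD derived from reported token usage and the price table."""
    in_price, out_price = PRICING[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def within_budget(spent_usd: float, cost_usd: float, budget_usd: float) -> bool:
    """Reject a request if it would push the key past its spend limit."""
    return spent_usd + cost_usd <= budget_usd
```

Because usage is reported per response, the spend counter stays accurate without any provider-side accounting.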
## 0.26 ms P50 latency

Gateway overhead measured at 5,000 requests per second. Lower is better.
| Gateway | P50 | P99 |
|---|---|---|
| CrabLLM | 0.26ms | 0.54ms |
| Bifrost | 0.61ms | 1.26ms |
| LiteLLM | 159ms | 227ms |
## How it works
### 1. Configure

```toml
listen = "0.0.0.0:8080"

[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o"]

[providers.anthropic]
kind = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
models = ["claude-sonnet-4-20250514"]
```

### 2. Run

```shell
crabllm --config crabllm.toml
```

### 3. Send requests
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514",
       "messages": [{"role": "user", "content": "Hello!"}]}'
```

Same OpenAI format, any provider. CrabLLM translates automatically.