From LiteLLM self-host to ClaudeAPI.cheap.
Stop maintaining the gateway.
LiteLLM is a great library, and the open-source code is solid. The cost shows up elsewhere: someone has to babysit the YAML, ship security patches, watch the container, and add new model IDs with every release. We do that work and pass on a 70-80% discount on Claude on top. Same OpenAI SDK shape your code already speaks.
The real cost of self-host isn't the API bill
Annual TCO (total cost of ownership) for one engineering team running LiteLLM in production, using conservative estimates.
Even at low utilization, the engineer-hour line dominates. Above ~20M Opus tokens/month, the API discount alone covers the Pro upgrade in a single afternoon. Above 100M, the savings pay for an engineer-quarter.
Real pricing math
Prices per 1M tokens, in USD. A self-hosted LiteLLM pays Anthropic list price directly; we're 70-80% below list.
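The savings claim is simple arithmetic, so here is a minimal Python sketch of it. The helpers `discounted_price` and `monthly_savings` are hypothetical names. The $15 per 1M tokens figure is an illustrative assumption, not a quoted Anthropic rate, so plug in the current list price for your model and token type.

```python
def discounted_price(list_price_per_m: float, discount: float) -> float:
    """USD per 1M tokens after a fractional discount (0.70 means 70% off list)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return list_price_per_m * (1.0 - discount)


def monthly_savings(tokens_m: float, list_price_per_m: float, discount: float) -> float:
    """USD saved per month on `tokens_m` million tokens versus paying list price."""
    return tokens_m * (list_price_per_m - discounted_price(list_price_per_m, discount))


if __name__ == "__main__":
    LIST_PRICE = 15.0  # ASSUMPTION: illustrative list price per 1M tokens, not a quote
    for d in (0.70, 0.80):
        print(f"{d:.0%} off: ${discounted_price(LIST_PRICE, d):.2f} per 1M tokens, "
              f"${monthly_savings(20, LIST_PRICE, d):,.2f} saved on 20M tokens/month")
```

With those assumed inputs, 20M tokens/month saves roughly $210-240 per month.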
Migration in 3 steps
Get an API key (30 seconds)
Sign up at claudeapi.cheap/signup and generate an sk-cc-... key.
Point your existing client at us instead of LiteLLM
Your application already uses an OpenAI-shaped client, and its base URL currently points at your LiteLLM container. Point it at us instead and use your new sk-cc-... key.
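Here is a minimal Python sketch of the cutover. It assumes the official `openai` package (v1 or later). The base URL `https://claudeapi.cheap/v1` and the `CLAUDEAPI_KEY`/`CLAUDEAPI_BASE_URL` environment variable names are hypothetical placeholders, so copy the real endpoint from the dashboard or /docs. The helper only builds the constructor arguments, which means you can check it without a network call.

```python
import os
from typing import Mapping, Optional

# ASSUMPTION: placeholder endpoint; copy the real base URL from /docs.
DEFAULT_BASE_URL = "https://claudeapi.cheap/v1"


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return base_url/api_key arguments for an OpenAI-compatible client.

    CLAUDEAPI_KEY and CLAUDEAPI_BASE_URL are hypothetical variable names.
    Rejecting keys without the sk-cc- prefix stops a leftover LiteLLM
    key from being sent after cutover.
    """
    env = os.environ if env is None else env
    key = env.get("CLAUDEAPI_KEY", "")
    if not key.startswith("sk-cc-"):
        raise RuntimeError("CLAUDEAPI_KEY is missing or is not an sk-cc- key")
    return {
        # This used to be your LiteLLM container's URL.
        "base_url": env.get("CLAUDEAPI_BASE_URL", DEFAULT_BASE_URL),
        "api_key": key,
    }
```

To use it, add `from openai import OpenAI` and create the client with `client = OpenAI(**client_kwargs())`. Your existing `client.chat.completions.create(...)` calls keep the same shape.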
After cutover, the LiteLLM container is dead infrastructure; schedule its decommissioning for the same sprint.
Decommission the gateway
Stop the LiteLLM container, archive the YAML config to your repo's /docs/historical/, remove the model-alias TypeScript shim, and delete the on-call alert rule that watched the gateway port. You stop spending 2-4 hours on every Anthropic release immediately.
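Before you stop the container, confirm that nothing still points at it. Below is a minimal Python sketch of that check. The search patterns are assumptions: `localhost:4000` is the LiteLLM proxy's default port, but your gateway may have run elsewhere, so adjust the list. The sketch skips the /docs/historical/ archive so the saved YAML doesn't count as a leftover.

```python
import pathlib
import re

# ASSUMPTION: adjust to wherever your gateway actually lived.
LEFTOVER_PATTERNS = [
    re.compile(r"localhost:4000"),           # LiteLLM proxy's default port
    re.compile(r"litellm", re.IGNORECASE),   # client config, env vars, deps
]
SKIP_DIRS = {".git", "node_modules", "historical"}


def find_leftovers(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for each line that still mentions the gateway."""
    base = pathlib.Path(root)
    hits = []
    for path in sorted(base.rglob("*")):
        rel_parts = path.relative_to(base).parts
        if not path.is_file() or SKIP_DIRS.intersection(rel_parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in LEFTOVER_PATTERNS):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_leftovers("."):
        print(f"{path}:{lineno}: {line}")
```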
If your team also runs alert rules for budget caps, move them to our dashboard balance endpoint; see /docs for the alert webhook format.
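With a prepaid balance, one natural alert is runway: how many days until the balance reaches zero at the current burn rate. Below is a minimal Python sketch of that rule. It assumes you already fetch the balance and daily spend from the dashboard endpoints. `BalanceAlert`, `check_balance`, and the 7-day and 2-day thresholds are hypothetical, and you should match the payload shape to the webhook format in /docs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BalanceAlert:
    """Hypothetical alert payload; match it to the webhook format in /docs."""
    level: str
    balance_usd: float
    days_left: float


def check_balance(balance_usd: float, daily_spend_usd: float,
                  warn_days: float = 7.0, crit_days: float = 2.0) -> Optional[BalanceAlert]:
    """Return an alert when the prepaid balance will run out soon, else None.

    balance_usd would come from the dashboard balance endpoint and
    daily_spend_usd from the daily usage breakdown (both fetched elsewhere).
    """
    if daily_spend_usd <= 0:
        return None  # no spend, so the balance never runs out
    days_left = balance_usd / daily_spend_usd
    if days_left <= crit_days:
        return BalanceAlert("critical", balance_usd, days_left)
    if days_left <= warn_days:
        return BalanceAlert("warning", balance_usd, days_left)
    return None
```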
FAQ
My team relied on LiteLLM's budget tracking. Do you have that?
Yes. Every account has a real-time balance, a daily usage breakdown by model, and per-request audit logs at /dashboard/usage. The big difference from a LiteLLM self-host: you can't overrun your balance, because top-ups are prepaid in crypto. No card on file means no surprise overage.
What about multi-tenant key management (we issued team keys via LiteLLM)?
Each account can mint multiple sk-cc-... keys from the dashboard — one per service or developer. Revoke any key independently. Usage per key is logged. For shared-balance multi-tenant patterns (departments, contractors), email support@claudeapi.cheap — we can scope keys with per-key spend caps.
Does fallback / failover routing still work?
Not as a gateway feature. LiteLLM's fallback chain (e.g. Claude → GPT → local model) ran inside your self-hosted gateway. We provide the multi-vendor surface: the same key works for Claude, GPT-5, and Gemini 3 endpoints, but the fallback decision logic lives in your application code. Most teams find that cleaner because the routing is explicit.
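Because fallback now lives in your code, here is a minimal Python sketch of an explicit chain. Each entry pairs a label with a callable that makes one request through your client. The labels, `complete_with_fallback`, and `AllProvidersFailed` are hypothetical names. You should also narrow the broad `except` to the retryable errors your SDK actually raises.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every entry in the fallback chain has failed."""


def complete_with_fallback(
    prompt: str,
    chain: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (label, call) pair in order and return (label, reply).

    Each call wraps one request through your client, for example
    client.chat.completions.create(...).choices[0].message.content.
    """
    errors = []
    for label, call in chain:
        try:
            return label, call(prompt)
        except Exception as exc:  # narrow to timeouts, 429s, and 5xx in production
            errors.append(f"{label}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Keeping the order in a plain list makes the routing policy reviewable in code, which is the explicitness the answer above refers to.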
How does latency compare to a self-hosted gateway?
We deploy in three Vercel regions (sin1 Singapore, iad1 US-East, cdg1 Paris), with Cloudflare's edge in front of the marketing pages. The proxy adds 50-150 ms over calling Anthropic directly, depending on where your requests originate. A self-hosted LiteLLM in the same VPC may be 10-30 ms faster, but you pay engineer-hours to keep it that way. For agent and chatbot workloads bound by model latency (seconds), our overhead is negligible.
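To check the overhead from your own origin, time the same small request through both stacks. Below is a minimal Python sketch, assuming `call` wraps one request through a given client. `measure_latency_ms` is a hypothetical helper that reports median and nearest-rank p95 wall-clock latency.

```python
import math
import statistics
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], object], runs: int = 20) -> dict:
    """Time `call` `runs` times; return median and nearest-rank p95 latency in ms."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # e.g. one short chat completion through the client under test
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]
    return {"p50": statistics.median(samples), "p95": p95}
```

Compare `measure_latency_ms(call_via_litellm)["p50"]` against the same figure for a call through us (both callables are hypothetical). Keep `max_tokens` small so generation time doesn't drown out the network difference.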
Can I run both stacks side-by-side during cutover?
Yes. Instantiate two clients, shadow 10% of traffic to us first, compare logs for 48 hours, then ramp up. Most LiteLLM teams cut over in a single sprint because the SDK surface is identical and the per-request audit log makes parity verification trivial.
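One way to pick the 10% deterministically is to hash a request ID. A given request then always lands in the same bucket, so the logs from the two stacks line up for comparison. Below is a minimal Python sketch. `in_shadow_sample` and the request-ID input are assumptions about your app, not part of either SDK.

```python
import hashlib


def in_shadow_sample(request_id: str, percent: float = 10.0) -> bool:
    """Deterministically select about `percent`% of requests to mirror to the new endpoint.

    Hashing the request ID (instead of calling random.random()) means a retry
    of the same request lands in the same bucket on both stacks.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100
```

Keep serving responses from the existing client. For sampled requests, also send the call through the new client and log both results. Raise `percent` to ramp up once the 48-hour comparison looks clean.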
No card, crypto only. Basic free forever; $19 lifetime Pro. Read our SLA →