Skip to content
All migrations
Migration guide

From LiteLLM self-host to ClaudeAPI.cheap.
Stop maintaining the gateway.

LiteLLM is a great library — the open-source code is solid. The cost shows up elsewhere: someone has to babysit the YAML, ship security patches, watch the container, configure new model IDs every release. We do that work and pass on a 70-80% Claude discount on top. Same OpenAI SDK shape your code already speaks.

99.5% uptime
Pay with crypto
Balance never expires
Named human support
No quantization

The real cost of self-host isn't the API bill

Annual TCO for one engineer-team running LiteLLM in production, conservative estimates.

Cost line
LiteLLM self-host
ClaudeAPI.cheap
Engineer hours / month maintaining gateway
8-15h ($800-1,500)
0h ($0)
Container infra (small VPS / ECS task)
$25-100/mo
Included
On-call rotation cost when proxy 500s
Real $
Included
Claude API bill (assume 50M Opus in/mo)
$250 (list)
$50 (Pro, 80% off)
Model ID upgrade work each Anthropic release
2-4h / release
$0, we ship same week
Status / SLA stack you build
Self-built
status.claudeapi.cheap included

Even at low utilization, the engineer-hour line dominates. Above ~20M Opus tokens/month, the API bill discount alone covers the Pro upgrade in a single afternoon. Above 100M, the gap pays for an engineer-quarter.

Real pricing math

Per 1M tokens, USD. LiteLLM self-host pays Anthropic list directly; we're 70-80% below.

Model
List (your current)
ClaudeAPI.cheap Pro
You save
Claude Opus 4.7
in / out per 1M tokens
$5.00 / $25.00
$1.00 / $5.00
80%
Claude Sonnet 4.6
in / out per 1M tokens
$3.00 / $15.00
$0.60 / $3.00
80%
Claude Haiku 4.5
in / out per 1M tokens
$1.00 / $5.00
$0.20 / $1.00
80%

Migration in 3 steps

1

Get an API key (30 seconds)

Sign up at claudeapi.cheap/signup, generate an sk-cc-... key.

2

Point your existing client at us instead of LiteLLM

Your application already uses an OpenAI-shaped client. The base URL was pointing at your LiteLLM container — point it at us:

Before (LiteLLM container)
python
from openai import OpenAI client = OpenAI( base_url="http://litellm.internal:4000/v1", # your container api_key="sk-litellm-master-key", ) resp = client.chat.completions.create( model="claude-3-opus", # litellm-mapped alias messages=[{"role": "user", "content": "Hello"}], )
After (ClaudeAPI.cheap)
python
from openai import OpenAI client = OpenAI( base_url="https://claudeapi.cheap/api/proxy/v1", api_key="sk-cc-your-claudeapi-cheap-key", ) resp = client.chat.completions.create( model="claude-opus-4-7", messages=[{"role": "user", "content": "Hello"}], )

After cutover, the LiteLLM container becomes dead infrastructure — schedule its decom for the same sprint.

3

Decommission the gateway

Stop the LiteLLM container, archive the YAML config to your repo's /docs/historical/, remove the model-alias TypeScript shim, and delete the on-call alert rule that watched the gateway port. The 2-4h saved on every Anthropic release starts immediately.

If your team also runs alert rules for budget caps, swap them onto our dashboard balance endpoint — see /docs for the alert webhook format.

FAQ

My team relied on LiteLLM's budget tracking. Do you have that?

Yes. Every account has a real-time balance, daily usage breakdown by model, and per-request audit logs at /dashboard/usage. The big difference vs LiteLLM self-host: you can't blow past the balance because top-ups are prepaid crypto. No card-on-file means no surprise overage.

What about multi-tenant key management (we issued team keys via LiteLLM)?

Each account can mint multiple sk-cc-... keys from the dashboard — one per service or developer. Revoke any key independently. Usage per key is logged. For shared-balance multi-tenant patterns (departments, contractors), email support@claudeapi.cheap — we can scope keys with per-key spend caps.

Does fallback / failover routing still work?

LiteLLM's fallback chain (e.g. Claude → GPT → local) was a self-host feature. We provide the multi-vendor surface — same key works for Claude, GPT-5, and Gemini 3 endpoints. The fallback decision logic stays in your application code. Most teams find that's cleaner because it's explicit.

How does latency compare to a self-hosted gateway?

We deploy in three Vercel regions (sin1 Singapore, iad1 US-East, cdg1 Paris) plus Cloudflare edge in front for marketing pages. Proxy adds 50-150ms over Anthropic Direct depending on your origin. A self-hosted LiteLLM in the same VPC may be 10-30ms faster, but you're paying the engineer-hours to keep it that way. For agent / chatbot workloads bound by model latency (seconds), our overhead is invisible.

Can I run both stacks side-by-side during cutover?

Yes — instantiate two clients, shadow 10% of traffic to us first, compare logs for 48 hours, then ramp. Most LiteLLM teams cut over in a single sprint because the SDK surface is identical and the per-request audit log makes parity verification trivial.

Stop maintaining — get a managed key

No card, crypto only. Basic free forever; $19 lifetime Pro. Read our SLA →