ClaudeAPI.cheap vs Anthropic Direct.
Honest tradeoffs.
Most comparison pages cherry-pick rows where the writer's product wins. This one includes every row that matters — including the 3 dimensions where Anthropic Direct genuinely wins. You should know both before you choose.
Pick ClaudeAPI.cheap if:price matters meaningfully (you spend >$50/mo on Claude), you want multi-vendor with one key, you want a hard balance ceiling instead of card auto-charge, or you cannot pay by card (international, no-card-region, prepaid only).
Stay on Anthropic Direct if: you are at enterprise scale with a direct contract, you require zero-proxy-hop latency, you need 100% guaranteed prompt-caching passthrough, or your compliance regime requires first-party vendor relationship.
The headline number, visualised — output token price by tier
Live data · lib/config.tsSame Claude weights, same context window. The only thing the proxy changes is the $/1M you pay. Read on for the 18 dimensions beyond price.
Where does ClaudeAPI.cheap differ from Anthropic Direct?
18 rows. We mark Anthropic Direct as winner where it genuinely wins — caching reliability, zero latency, vendor maturity. The rest of the column is where the proxy model pays off.
What does the difference mean for a real Claude workload?
Example: 100M Opus input tokens / month — typical for a mid-size SaaS using Claude for support automation, content drafts, or agent backends.
At this volume the Pro plan pays for itself in the first hour. The break-even threshold for Pro vs Basic is around 5M Opus input tokens / month — below that, Basic at 50% off is the right pick.
When does Anthropic Direct beat a proxy?
Honesty wins more trust than spin. Three rows above mark Direct as winner — here is why and when those rows decide the choice.
Anthropic Direct gives you 100% cache metadata visibility on every response. We pass it through when our upstream forwards it, which in 2026 is roughly 95% consistent. If your workload depends on knowing the cache read/write ratio per-request for billing or alerting, Direct is the cleaner answer. If you are caching-aware at the application layer and only care about the actual cost (which is reduced regardless), this is invisible.
Direct adds 0ms because you are the source. We add 50-150ms depending on your region (Singapore / US East / Paris). For workloads bound by model generation time (seconds), this is invisible. For real-time interactive paths where every 100ms matters, Direct wins.
Anthropic is a well-capitalized company at multi-billion-dollar valuation. We are a small independent operator. Probability that we disappear in a 12-month window is non-zero in a way Anthropic's is not. Our mitigation: 7-day refund + multi-vendor (Claude / GPT-5 / Gemini one key) so even in tail-risk shutdown you keep working on the other two vendors. For mission-critical production with compliance-sensitive vendor requirements, Direct is the right answer.
Read next
No card, crypto only · $19 lifetime Pro at 80% off · Read our SLA →