3 min read · streaming, api, python, tutorial

Streaming vs Non-Streaming API: When to Use Which

Learn the difference between streaming and non-streaming Claude API calls, when to use each mode, and see Python code examples for both.

What's the Difference?

When you call the Claude API, you have two options for receiving responses:

  • Non-streaming: You send a request and wait. The API processes the entire response, then returns it all at once as a single JSON object.
  • Streaming: You send a request and immediately start receiving the response in small chunks (tokens) as they're generated, delivered via Server-Sent Events (SSE).

Both modes produce identical output. The difference is entirely in how and when you receive that output.
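On the wire, a streaming response arrives as a sequence of Server-Sent Events, each carrying a small text delta. The sketch below parses two simplified `content_block_delta` events (the payloads are illustrative examples modeled on the Messages API stream format, not captured output) to show how the chunks are delivered and reassembled:

```python
import json

def parse_sse_events(raw: str):
    """Parse a raw SSE payload into (event_name, data_dict) pairs."""
    events = []
    event_name, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_name:
            # A blank line terminates one SSE event
            events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = None, []
    return events

# Two simplified delta events carrying the text "Hel" + "lo"
raw = (
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}\n'
    "\n"
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}\n'
    "\n"
)
chunks = [d["delta"]["text"] for _, d in parse_sse_events(raw)]
print("".join(chunks))  # -> Hello
```

In practice the SDK does this parsing for you, as the streaming example below shows.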

    When to Use Non-Streaming

    Non-streaming is simpler to implement and ideal when you don't need real-time output:

  • Batch processing — analyzing hundreds of documents where you collect results afterward
  • Backend pipelines — extracting data, classifying text, or generating summaries in automated workflows
  • Simple integrations — scripts and tools where you just need the final answer
  • Testing and prototyping — easier to debug with a single complete response

The tradeoff is perceived latency. For long responses, the user sees nothing until the entire generation finishes, which can take several seconds.

    Non-Streaming Python Example

    import anthropic
    
    client = anthropic.Anthropic(
        api_key="your-api-key",
        base_url="https://api.claudeapi.cheap"
    )
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}]
    )
    
    print(response.content[0].text)

    You call messages.create(), wait for the response, and read the full text. Clean and straightforward.
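Under the hood, that single response is one JSON object containing a list of content blocks. A small helper can pull the text out; the response body here is an illustrative sketch of the Messages API shape, not real API output:

```python
def extract_text(response: dict) -> str:
    """Concatenate all text blocks from a Messages API response body."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )

# Illustrative response body (values are made up)
response = {
    "content": [{"type": "text", "text": "Recursion is when a function calls itself."}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 10},
}
print(extract_text(response))  # -> Recursion is when a function calls itself.
```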

    When to Use Streaming

    Streaming is the better choice when responsiveness matters:

  • Chatbots and conversational UIs — users see words appear in real time, just like ChatGPT
  • Long-form generation — articles, code, and reports that take several seconds to complete
  • Live dashboards — showing AI-generated insights as they're produced
  • Time-to-first-token matters — streaming typically starts delivering content within a few hundred milliseconds instead of after several seconds

Streaming dramatically improves perceived performance. Even though total generation time is the same, the response feels faster because users see output immediately.

    Streaming Python Example

    import anthropic
    
    client = anthropic.Anthropic(
        api_key="your-api-key",
        base_url="https://api.claudeapi.cheap"
    )
    
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}]
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)

    The messages.stream() context manager yields text chunks as they arrive. Each chunk is printed immediately, giving a typewriter effect.
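Because both modes yield the same tokens, accumulating the streamed chunks reconstructs exactly what a non-streaming call would return. A quick sketch with a simulated chunk iterator standing in for `stream.text_stream`:

```python
def simulated_stream():
    """Stand-in for stream.text_stream: yields text chunks as they 'arrive'."""
    yield from ["Recursion ", "is a function ", "calling itself."]

parts = []
for text in simulated_stream():
    parts.append(text)          # in a real UI, render each chunk as it arrives
full_text = "".join(parts)      # identical to the non-streaming result
print(full_text)  # -> Recursion is a function calling itself.
```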

    Key Differences at a Glance

  • Latency: Non-streaming waits for the full completion. Streaming typically delivers the first token within a few hundred milliseconds.
  • Complexity: Non-streaming is a single request/response. Streaming requires handling an event stream.
  • Error handling: Non-streaming errors come back in one response. Streaming errors can occur mid-stream, so you need to handle partial failures.
  • Cost: Both modes cost exactly the same per token. No pricing difference.
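The mid-stream failure point deserves care: if the connection drops partway through, you may still want to keep what already arrived. A hedged sketch of that pattern, using a hypothetical `flaky_stream` iterator in place of a real SDK stream:

```python
def consume_stream(chunks):
    """Collect chunks, returning (text_so_far, error) so partial output survives."""
    parts = []
    try:
        for text in chunks:
            parts.append(text)
    except ConnectionError as exc:
        return "".join(parts), exc   # partial result plus the failure
    return "".join(parts), None

def flaky_stream():
    """Hypothetical stream that dies mid-generation."""
    yield "Hello, "
    yield "wor"
    raise ConnectionError("stream interrupted")

partial, err = consume_stream(flaky_stream())
print(partial)  # -> Hello, wor
print(err)      # -> stream interrupted
```

Whether you retry, surface the partial text, or discard it depends on your application.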

    Using Both with claudeapi.cheap

    Both streaming and non-streaming work identically through our proxy. Just set your base_url to https://api.claudeapi.cheap and everything works — no code changes beyond the URL.

    This applies to the Python SDK, Node.js SDK, and direct HTTP requests. All streaming events are forwarded in real time with minimal added latency.
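For direct HTTP requests, the only difference between the two modes is the `stream` flag in the JSON body sent to the Messages endpoint. A sketch of building both request bodies (the helper name is ours, not an SDK function):

```python
def build_messages_request(prompt: str, stream: bool) -> dict:
    """Request body for POST /v1/messages; only the stream flag differs."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }

non_streaming = build_messages_request("Explain recursion.", stream=False)
streaming = build_messages_request("Explain recursion.", stream=True)
# The two bodies are identical except for the stream flag
```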

    Our Recommendation

    Use streaming for anything user-facing where people are waiting for a response. Use non-streaming for background tasks where you just need the final result. Many production applications use both — streaming in the chat UI and non-streaming in the data pipeline.

    Get your API key →