Streaming vs Non-Streaming API: When to Use Which
Learn the difference between streaming and non-streaming Claude API calls, when to use each mode, and see Python code examples for both.
What's the Difference?
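When you call the Claude API, you can receive the response in one of two ways: non-streaming, where the complete response arrives at once after generation finishes, or streaming, where the text arrives in chunks as it is generated.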
Both modes produce identical output. The difference is entirely about how and when you receive that output.
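To make that concrete, here is a minimal simulation that makes no API call. The hypothetical `fake_stream` generator stands in for a streamed response. For a given generation, joining the chunks yields the same text a non-streaming call would hand back in one piece.

```python
from typing import Iterator

FULL_TEXT = "Recursion is when a function calls itself."


def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Hypothetical stand-in for a streamed response: yields the text in small pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


# Non-streaming: the whole text is available in one piece.
non_streamed = FULL_TEXT

# Streaming: the same text arrives as chunks that the caller joins.
streamed = "".join(fake_stream(FULL_TEXT))

print(streamed == non_streamed)
```

Only the delivery differs: the streamed version becomes usable piece by piece, while the non-streamed version is usable only once it is complete.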
When to Use Non-Streaming
Non-streaming is simpler to implement and is ideal when you don't need real-time output, such as background tasks and data pipelines where only the final result matters.
The tradeoff is perceived latency. For long responses, the user sees nothing until the entire generation finishes, which can take several seconds.
Non-Streaming Python Example
import anthropic

# Create a client that sends requests through the proxy endpoint.
client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://api.claudeapi.cheap",
)

# Blocks until the full response has been generated.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}],
)

# The reply text is in the first content block.
print(response.content[0].text)

You call messages.create(), wait for the response, and read the full text. Clean and straightforward.
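Reading `response.content[0].text` assumes the first content block is text. As a slightly more defensive sketch, the hypothetical helper below joins every block whose `type` is `"text"`. It is exercised with stand-in objects rather than a live response, so treat the block shape as an assumption to check against the SDK.

```python
from types import SimpleNamespace
from typing import Iterable


def extract_text(content_blocks: Iterable) -> str:
    """Join the text of all blocks whose type is "text" (hypothetical helper)."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in objects mimicking the shape of response.content; not real SDK types.
sample_content = [
    SimpleNamespace(type="text", text="Recursion is a function "),
    SimpleNamespace(type="text", text="calling itself."),
]

print(extract_text(sample_content))
```

With a real response you would call `extract_text(response.content)` in place of the indexed access.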
When to Use Streaming
Streaming is the better choice when responsiveness matters, such as chat interfaces and other user-facing features where people are waiting for a response.
Streaming dramatically improves perceived performance. Even though total generation time is the same, users feel like the response is faster because they see output immediately.
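One way to see the gap between perceived and total latency is to time the first chunk separately from the last. The sketch below uses a simulated stream (the hypothetical `simulated_chunks`, with an arbitrary delay) rather than a real API call; `measure` accepts any iterable of text chunks.

```python
import time
from typing import Iterable, Iterator, Tuple


def simulated_chunks(n: int = 5, delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a stream: one chunk every `delay` seconds."""
    for i in range(n):
        time.sleep(delay)
        yield f"chunk{i} "


def measure(chunks: Iterable[str]) -> Tuple[float, float]:
    """Return (seconds until the first chunk, seconds until the last chunk)."""
    start = time.perf_counter()
    first = None
    for _ in chunks:
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    # If no chunk arrived at all, report the total for both values.
    return (first if first is not None else total), total


first_chunk, total = measure(simulated_chunks())
print(f"first chunk after {first_chunk:.3f}s, complete after {total:.3f}s")
```

A streaming user starts reading at the first number; a non-streaming user waits for the second.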
Streaming Python Example
import anthropic

# Create a client that sends requests through the proxy endpoint.
client = anthropic.Anthropic(
    api_key="your-api-key",
    base_url="https://api.claudeapi.cheap",
)

# Open a streaming request; the connection is closed when the block exits.
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}],
) as stream:
    # Print each text chunk as soon as it arrives.
    for text in stream.text_stream:
        print(text, end="", flush=True)

The messages.stream() context manager yields text chunks as they arrive. Each chunk is printed immediately, giving a typewriter effect.
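Key Differences at a Glance
Response delivery: non-streaming returns the complete response at once; streaming delivers text chunks as they are generated.
Perceived latency: non-streaming shows nothing until generation finishes; streaming shows output immediately.
Total generation time: the same in both modes.
Implementation: non-streaming is a single messages.create() call; streaming iterates over messages.stream().
Best for: non-streaming suits background tasks and data pipelines; streaming suits user-facing interfaces.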
Using Both with claudeapi.cheap
Both streaming and non-streaming work identically through our proxy. Just set your base_url to https://api.claudeapi.cheap and everything works — no code changes beyond the URL.
This applies to the Python SDK, Node.js SDK, and direct HTTP requests. All streaming events are forwarded in real time with minimal added latency.
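For direct HTTP requests, a streamed response arrives as server-sent events. The sketch below parses such lines and pulls out the text fragments. It assumes the event shape used by Anthropic's streaming format (`content_block_delta` events carrying a `text_delta`), which is worth confirming against the current API reference. The sample lines are illustrative rather than captured output, and `iter_text_deltas` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from server-sent event lines (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        # Only "data:" lines carry a JSON payload; skip "event:" lines and blanks.
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):])
        if payload.get("type") != "content_block_delta":
            continue
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


# Illustrative sample, not captured from a real response.
sample_lines = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
]

print("".join(iter_text_deltas(sample_lines)))
```

In practice the lines would come from the HTTP response body, read line by line, instead of a hard-coded list.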
Our Recommendation
Use streaming for anything user-facing where people are waiting for a response. Use non-streaming for background tasks where you just need the final result. Many production applications use both — streaming in the chat UI and non-streaming in the data pipeline.
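If one codebase needs both modes, a single entry point can choose between them. The sketch below is one possible shape, not a prescribed pattern: `ask` and its `on_text` callback are hypothetical names, and `client` is assumed to be an `anthropic.Anthropic` instance configured as in the examples above.

```python
from typing import Callable, Optional

MODEL = "claude-sonnet-4-20250514"  # same model ID as the examples above


def ask(client, prompt: str, on_text: Optional[Callable[[str], None]] = None) -> str:
    """Return the full reply; stream chunks to `on_text` if a callback is given.

    `client` is assumed to be an anthropic.Anthropic instance.
    """
    messages = [{"role": "user", "content": prompt}]
    if on_text is None:
        # Background path: one blocking call, read the result at the end.
        response = client.messages.create(
            model=MODEL, max_tokens=1024, messages=messages
        )
        return response.content[0].text

    # User-facing path: forward each chunk as it arrives, and keep a copy.
    parts = []
    with client.messages.stream(
        model=MODEL, max_tokens=1024, messages=messages
    ) as stream:
        for text in stream.text_stream:
            on_text(text)
            parts.append(text)
    return "".join(parts)
```

A data pipeline would call `ask(client, prompt)`, while a chat UI would pass a callback such as `on_text=lambda t: print(t, end="", flush=True)`.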