3 min read · streaming, api, python, tutorial

Streaming vs Non-Streaming API: When to Use Which

Learn the difference between streaming and non-streaming Claude API calls, when to use each mode, and see Python code examples for both.

What's the Difference?

When you call the Claude API, you have two options for receiving responses:

  • Non-streaming: You send a request and wait. The API processes the entire response, then returns it all at once as a single JSON object.
  • Streaming: You send a request and immediately start receiving the response in small chunks (tokens) as they're generated, delivered via Server-Sent Events (SSE).

Both modes produce identical output. The difference is entirely in how and when you receive that output.
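On the wire, a streaming response arrives as a sequence of Server-Sent Events, each carrying a small text delta. The sketch below parses two simplified `content_block_delta` events (the payloads are illustrative examples modeled on the Messages API stream format, not captured output) to show how the chunks are delivered and reassembled:

```python
import json

def parse_sse_events(raw: str):
    """Parse a raw SSE payload into (event_name, data_dict) pairs."""
    events = []
    event_name, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_name:
            # A blank line terminates one SSE event
            events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = None, []
    return events

# Two simplified delta events carrying the text "Hel" + "lo"
raw = (
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}\n'
    "\n"
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}\n'
    "\n"
)
chunks = [d["delta"]["text"] for _, d in parse_sse_events(raw)]
print("".join(chunks))  # -> Hello
```

In practice the SDK does this parsing for you, as the streaming example below shows.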

    When to Use Non-Streaming

    Non-streaming is simpler to implement and ideal when you don't need real-time output:

  • Batch processing — analyzing hundreds of documents where you collect results afterward
  • Backend pipelines — extracting data, classifying text, or generating summaries in automated workflows
  • Simple integrations — scripts and tools where you just need the final answer
  • Testing and prototyping — easier to debug with a single complete response

The tradeoff is perceived latency. For long responses, the user sees nothing until the entire generation finishes, which can take several seconds.

    Non-Streaming Python Example

    import anthropic
    
    client = anthropic.Anthropic(
        api_key="your-api-key",
        base_url="https://api.claudeapi.cheap"
    )
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}]
    )
    
    print(response.content[0].text)

    You call messages.create(), wait for the response, and read the full text. Clean and straightforward.
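Under the hood, that single response is one JSON object containing a list of content blocks. A small helper can pull the text out; the response body here is an illustrative sketch of the Messages API shape, not real API output:

```python
def extract_text(response: dict) -> str:
    """Concatenate all text blocks from a Messages API response body."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )

# Illustrative response body (values are made up)
response = {
    "content": [{"type": "text", "text": "Recursion is when a function calls itself."}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 12, "output_tokens": 10},
}
print(extract_text(response))  # -> Recursion is when a function calls itself.
```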

    When to Use Streaming

    Streaming is the better choice when responsiveness matters:

  • Chatbots and conversational UIs — users see words appear in real time, just like ChatGPT
  • Long-form generation — articles, code, and reports that take several seconds to complete
  • Live dashboards — showing AI-generated insights as they're produced
  • Time-to-first-token matters — streaming typically starts delivering content within a few hundred milliseconds instead of after several seconds

Streaming dramatically improves perceived performance. Even though total generation time is the same, the response feels faster because users see output immediately.

    Streaming Python Example

    import anthropic
    
    client = anthropic.Anthropic(
        api_key="your-api-key",
        base_url="https://api.claudeapi.cheap"
    )
    
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}]
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)

    The messages.stream() context manager yields text chunks as they arrive. Each chunk is printed immediately, giving a typewriter effect.
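Because both modes yield the same tokens, accumulating the streamed chunks reconstructs exactly what a non-streaming call would return. A quick sketch with a simulated chunk iterator standing in for `stream.text_stream`:

```python
def simulated_stream():
    """Stand-in for stream.text_stream: yields text chunks as they 'arrive'."""
    yield from ["Recursion ", "is a function ", "calling itself."]

parts = []
for text in simulated_stream():
    parts.append(text)          # in a real UI, render each chunk as it arrives
full_text = "".join(parts)      # identical to the non-streaming result
print(full_text)  # -> Recursion is a function calling itself.
```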

    Key Differences at a Glance

  • Latency: Non-streaming waits for the full completion. Streaming typically delivers the first token within a few hundred milliseconds.
  • Complexity: Non-streaming is a single request/response. Streaming requires handling an event stream.
  • Error handling: Non-streaming errors come back in one response. Streaming errors can occur mid-stream, so you need to handle partial failures.
  • Cost: Both modes cost exactly the same per token. No pricing difference.
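The mid-stream failure point deserves care: if the connection drops partway through, you may still want to keep what already arrived. A hedged sketch of that pattern, using a hypothetical `flaky_stream` iterator in place of a real SDK stream:

```python
def consume_stream(chunks):
    """Collect chunks, returning (text_so_far, error) so partial output survives."""
    parts = []
    try:
        for text in chunks:
            parts.append(text)
    except ConnectionError as exc:
        return "".join(parts), exc   # partial result plus the failure
    return "".join(parts), None

def flaky_stream():
    """Hypothetical stream that dies mid-generation."""
    yield "Hello, "
    yield "wor"
    raise ConnectionError("stream interrupted")

partial, err = consume_stream(flaky_stream())
print(partial)  # -> Hello, wor
print(err)      # -> stream interrupted
```

Whether you retry, surface the partial text, or discard it depends on your application.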

    Using Both with claudeapi.cheap

    Both streaming and non-streaming work identically through our proxy. Just set your base_url to https://api.claudeapi.cheap and everything works — no code changes beyond the URL.

    This applies to the Python SDK, Node.js SDK, and direct HTTP requests. All streaming events are forwarded in real time with minimal added latency.
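For direct HTTP requests, the only difference between the two modes is the `stream` flag in the JSON body sent to the Messages endpoint. A sketch of building both request bodies (the helper name is ours, not an SDK function):

```python
def build_messages_request(prompt: str, stream: bool) -> dict:
    """Request body for POST /v1/messages; only the stream flag differs."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }

non_streaming = build_messages_request("Explain recursion.", stream=False)
streaming = build_messages_request("Explain recursion.", stream=True)
# The two bodies are identical except for the stream flag
```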

    Our Recommendation

    Use streaming for anything user-facing where people are waiting for a response. Use non-streaming for background tasks where you just need the final result. Many production applications use both — streaming in the chat UI and non-streaming in the data pipeline.

    Get your API key →