Claude API in Practice: Streaming, Tool Use, and Structured Output

AI Navigate Original / 3/24/2026

💬 Opinion / Developer Stack & Infrastructure

Key Points

  • SSE streaming can improve perceived speed for chat and long-form generation, but for saving and business processing it’s safest to use the final, confirmed response
  • Tool Use is convenient, but in real operations it’s important to include a check step for tool splitting, strict argument constraints via JSON Schema, and handling of side effects
  • For stable JSON structured output, combine “return only JSON” instructions with a low temperature and validation on the application side
  • Prompt Caching is effective for reducing cost when you have long, shared prompts, and Batch API is well-suited for large-scale asynchronous processing
  • A practical rollout order is: JSON structured output → Streaming → Tool Use → Caching/Batch, which balances implementation effort and impact

Master “production-like” features with the Claude API

The Claude API is not limited to generating text. It also provides practical, production-oriented features such as streaming responses, Tool Use (function calling), structured JSON output, Prompt Caching, and the Batch API. As of 2025, rather than simply wrapping prompts you tested in a chat UI behind an API call, it’s important to design the implementation around response format, speed, cost, and reproducibility.

In this article, using code examples centered on Python, we’ll organize how to integrate these features so they’re genuinely easy to use in practice. SDK details may change over time, so be sure to read the official documentation alongside this article when you start integrating.

SSE Streaming: Start with an experience that doesn’t keep users waiting

For long answers or summaries, showing output incrementally via Server-Sent Events (SSE) often improves UX compared to waiting for the full completion. This is especially effective for chat, review assistance, and meeting-minutes generation.

Concept for basic implementation

  • Receive the Claude API stream on the backend
  • Relay it directly to the frontend, or format it before sending
  • Handle not only text fragments, but also completion and error events

A minimal example with the official Python SDK:
from anthropic import Anthropic

client = Anthropic(api_key="YOUR_API_KEY")  # or omit api_key to read ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1200,
    temperature=0.2,
    messages=[
        {"role": "user", "content": "Tell me the key points of implementing SSE streaming in 3 items"}
    ]
) as stream:
    for text in stream.text_stream:
        # Print each text delta as it arrives
        print(text, end="", flush=True)

    # The confirmed, complete message; use this for storage and business logic
    final_message = stream.get_final_message()

A key point during implementation is not to save partial output as-is. Intermediate output may include rewrites or unfinished sentences, so for DB storage and audit logs it’s safer to use the final, confirmed text.

Cases where streaming is a good fit / not a good fit

Case                     | Suitability   | Reason
Chat UI                  | Suitable      | Perceived speed improves significantly
Long-form summarization  | Suitable      | Users can start reading partway through
JSON structured output   | Use caution   | Intermediate fragments are likely to form incomplete JSON
Batch aggregation        | Not suitable  | The benefit of incremental display is diminished

Especially for processes that require strict structured output, a practical separation is: stream the output for the UI only, and perform business processing only after the final response has been confirmed.
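That separation can be sketched as a small validator that runs only on the confirmed final text. The schema, field names, and extraction heuristic below are illustrative assumptions, not part of the API:

```python
import json

REQUIRED_KEYS = {"title", "summary"}  # illustrative schema


def extract_json(final_text: str) -> dict:
    """Parse the first JSON object in the model's final text and check required keys."""
    # Naive extraction: take the span from the first '{' to the last '}'
    start = final_text.find("{")
    end = final_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in the response")
    data = json.loads(final_text[start:end + 1])
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

Only after `extract_json` succeeds would the result be written to the DB; the streamed fragments are used solely for display.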

Tool Use: Don’t let the model “do too much by itself”
