Streaming, Tool Use, and Structured Output

AI Navigate Original / 3/24/2026

Key Points

  • SSE streaming improves perceived speed for chat/long-text UIs; use final confirmed text (not fragments) for DB/audit, and keep strict structured output out of streaming.
  • Tool Use: decide on the app side which tools the model may call and how far each may go; require confirmation for side effects, validate arguments against a strict JSON Schema, and separate search, reference, and update tools.
  • JSON structured output: fix the schema, demand "JSON only," keep temperature 0–0.2, validate app-side, and separate explanation vs. structured APIs.
  • Prompt Caching cuts cost for long fixed prompts; Batch API suits large async jobs. Adopt in order: structured output → streaming → tool use → caching/batch.

Mastering "Production-Like" Features With the Claude API

The Claude API offers not only text generation but also practical features such as streaming responses, Tool Use (function calling), structured JSON output, Prompt Caching, and the Batch API. As of 2025, what matters is not porting a prompt tested in a chat UI to the API as-is, but deliberately designing the implementation for response format, speed, cost, and reproducibility.

This article uses mainly Python code examples to lay out how to integrate these features so that they are actually easy to use. SDK details change between releases, so always check the official documentation when adopting them.

SSE Streaming: First, Create an Experience That "Doesn't Make Users Wait"

For longer answers and summary generation, displaying text progressively over Server-Sent Events (SSE) gives a better UX than waiting for the full text to complete. It is especially effective for chat, review support, and meeting-minutes generation.

Basic implementation approach

  • Receive the Claude API stream on the backend
  • Relay to the frontend as-is, or format and send
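  • Handle not just text fragments but also completion events and error events

from anthropic import Anthropic

# Placeholder key; in practice, load it from an environment variable or a secret store.
client = Anthropic(api_key="YOUR_API_KEY")

# Open a streaming request; the context manager closes the connection on exit.
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1200,
    temperature=0.2,
    messages=[
        {"role": "user", "content": "Tell me 3 key points for implementing SSE streaming"}
    ]
) as stream:
    # Print each text fragment as soon as it arrives.
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Once the stream has ended, fetch the complete, assembled message.
    final_message = stream.get_final_message()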

The key point when implementing this is not to store the fragments as they arrive. Intermediate output contains restatements and unfinished sentences, so it is safer to use the final confirmed text for DB storage and audit logs.
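The relay steps listed earlier, together with the rule of keeping only the final text, can be sketched as a small generator that turns text fragments into SSE frames. This is a minimal sketch under stated assumptions: the event names `delta`, `done`, and `error` and the helper names `sse_frame` and `relay_as_sse` are hypothetical, and a plain list stands in for `stream.text_stream` so that the example runs without an API key.

```python
import json
from typing import Iterable, Iterator


def sse_frame(event: str, data: dict) -> str:
    """Format one SSE frame: an event name, one JSON data line, then a blank line."""
    # json.dumps escapes newlines, so the payload always fits on a single data line.
    return f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False)}\n\n"


def relay_as_sse(fragments: Iterable[str]) -> Iterator[str]:
    """Relay text fragments as SSE frames, ending with a 'done' or an 'error' frame."""
    parts: list[str] = []
    try:
        for fragment in fragments:
            parts.append(fragment)
            # Fragments are for display only; nothing is persisted here.
            yield sse_frame("delta", {"text": fragment})
    except Exception as exc:  # assumption: a catch-all for brevity; narrow it in real code
        yield sse_frame("error", {"message": str(exc)})
        return
    # Only the completed text is a candidate for DB storage or audit logs.
    yield sse_frame("done", {"text": "".join(parts)})


# A plain list stands in for stream.text_stream in this sketch.
frames = list(relay_as_sse(["Hello", ", world"]))
```

In a real backend the generator would typically be handed to the web framework's streaming response with the `text/event-stream` content type, and the stored text would come from `stream.get_final_message()` rather than from joining fragments by hand.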

When streaming fits and when it doesn't

| Case | Suitability | Reason |
| --- | --- | --- |
| Chat UI | Good fit | Perceived speed improves greatly |
| Long-text summarization | Good fit | Users can start reading partway through |
| JSON structured output | Use with caution | Intermediate fragments tend to be incomplete JSON |
| Batch aggregation | Poor fit | Progressive display adds little value |

For processing that relies strictly on structured output in particular, the practical division is this: streamed output is for UI display only, and business logic runs after the final response is confirmed.
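One way to picture this division is to show that a mid-stream fragment does not parse as JSON, while the joined final text does. The following is a minimal sketch, not a definitive implementation: the `REQUIRED_FIELDS` schema, the `parse_final_json` helper, and the sample fragments are all hypothetical and exist only for illustration.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"title": str, "priority": int}


def parse_final_json(final_text: str) -> dict:
    """Parse and validate the completed response; raise ValueError on any mismatch."""
    try:
        payload = json.loads(final_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        # Exact type match, so that True is not accepted where an int is required.
        if type(payload.get(name)) is not expected_type:
            raise ValueError(f"field {name!r} is missing or not {expected_type.__name__}")
    return payload


# Sample fragments standing in for streamed output.
fragments = ['{"title": "Fix lo', 'gin bug", "prio', 'rity": 2}']

# A mid-stream fragment is not parseable on its own, so it is only shown in the UI.
try:
    json.loads(fragments[0])
    partial_ok = True
except json.JSONDecodeError:
    partial_ok = False

# Business logic runs once, on the joined final text.
ticket = parse_final_json("".join(fragments))
```

Raising a single exception type keeps the caller's handling simple: on `ValueError` the application can retry the request or fall back, without ever acting on partially streamed data.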

Tool Use: "Don't Leave Too Much External Processing to the Model"
