How to build a Claude chatbot with streaming responses in under 50 lines of Node.js
Streaming is one of those features that sounds complicated but completely transforms the user experience. Instead of staring at a spinner for 3-5 seconds, users see the response appear word by word — like watching someone type.
Here's how to do it with Claude in Node.js. The whole thing is under 50 lines.
The full code
```javascript
const Anthropic = require('@anthropic-ai/sdk');
const readline = require('readline');

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const history = [];

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

async function chat(userMessage) {
  history.push({ role: 'user', content: userMessage });
  process.stdout.write('\nClaude: ');

  let fullResponse = '';
  const stream = await client.messages.stream({
    model: 'claude-opus-4-5',
    max_tokens: 1024,
    messages: history
  });

  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      process.stdout.write(chunk.delta.text);
      fullResponse += chunk.delta.text;
    }
  }

  console.log('\n');
  history.push({ role: 'assistant', content: fullResponse });
}

function prompt() {
  rl.question('You: ', async (input) => {
    if (input.toLowerCase() === 'quit') return rl.close();
    await chat(input);
    prompt();
  });
}

prompt();
```
That's it. Run it with:
```bash
npm install @anthropic-ai/sdk
ANTHROPIC_API_KEY=your_key node chatbot.js
```
How streaming actually works
Claude's API uses Server-Sent Events (SSE). When you call messages.stream(), the connection stays open and the server pushes chunks as they're generated.
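On the wire, each event looks roughly like this (an illustrative sketch of the SSE framing, not a verbatim capture):

```
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
```

An event name line, a `data:` line carrying a JSON payload, and a blank line separating one event from the next. The SDK parses all of this for you and hands you the payloads as objects.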
Each chunk has a type. The ones you care about:
| Type | What it means |
|---|---|
| content_block_start | Claude is about to start a text block |
| content_block_delta | Here's the next piece of text |
| content_block_stop | That block is done |
| message_stop | Whole response is done |
You only need content_block_delta with text_delta — that's where the actual words are.
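If you do want to react to the other event types, a small dispatcher keeps things tidy. This is a sketch assuming the chunk shapes in the table above; only `content_block_delta` carries text, while `message_stop` is handy for flipping UI state:

```javascript
// Accumulate text from a sequence of stream events.
// Only content_block_delta chunks contribute characters; the other
// types are useful for UI bookkeeping (spinners, "done" indicators).
function accumulate(chunks) {
  let text = '';
  let finished = false;
  for (const chunk of chunks) {
    switch (chunk.type) {
      case 'content_block_delta':
        if (chunk.delta.type === 'text_delta') text += chunk.delta.text;
        break;
      case 'message_stop':
        finished = true; // whole response is done
        break;
      // content_block_start / content_block_stop: per-block bookkeeping
    }
  }
  return { text, finished };
}
```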
Adding it to an Express server
The terminal version is fine for testing. Here's how to expose it as an HTTP endpoint:
```javascript
const express = require('express');
const Anthropic = require('@anthropic-ai/sdk');

const app = express();
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

app.use(express.json());

app.post('/chat', async (req, res) => {
  const { messages } = req.body;

  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await client.messages.stream({
    model: 'claude-opus-4-5',
    max_tokens: 1024,
    messages
  });

  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      // SSE events are terminated by a blank line, hence the double newline
      res.write(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});

app.listen(3000);
```
Consuming the stream on the frontend
```javascript
async function sendMessage(messages) {
  const response = await fetch('/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep incomplete line in buffer

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;
        const { text } = JSON.parse(data);
        // Append text to your UI element
        document.getElementById('response').textContent += text;
      }
    }
  }
}
```
Common mistakes
1. Not handling the buffer correctly
SSE chunks don't always align with JSON boundaries. The buffer pattern above handles cases where a chunk gets split mid-JSON.
2. Forgetting to end the response
If you don't call res.end() after the stream finishes, the client will hang waiting for more data.
3. Not setting the right headers
Without Content-Type: text/event-stream, the browser won't treat it as SSE — it'll wait for the full response before rendering anything.
4. Streaming to multiple users without isolation
Each request needs its own stream instance. Don't share stream state between requests.
The cost question
Streaming doesn't change the token count: you pay for the same tokens whether you stream or not. What it changes is perceived latency. A 2-second response that streams word by word feels markedly faster than a 2-second response that appears all at once.
If you're building on a per-token API, that latency improvement is free. If you're building on a flat-rate API (like SimplyLouie at $2/month), there's no cost math to worry about at all — stream everything, always.
What to build next
Once streaming works, the natural next steps are:
- Typing indicators: Show "Claude is thinking..." before the first chunk arrives
- Stop button: Let users interrupt a long response mid-stream
- Token counting: Show a live token counter as the response streams
- Conversation export: Save the full streamed response to history after message_stop
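For the stop button, the standard web AbortController is enough on the frontend. A sketch under that assumption (the reader loop mirrors the consumer above; wiring `controller.abort()` to an actual button is left out):

```javascript
// Read a stream of text chunks until it ends or the signal aborts.
// In a real UI, controller.abort() would be called from a Stop button.
async function readUntilAborted(stream, signal) {
  const reader = stream.getReader();
  let text = '';
  while (!signal.aborted) {
    const { done, value } = await reader.read();
    if (done) return { text, aborted: false };
    text += value;
  }
  await reader.cancel(); // tell the source we're stopping early
  return { text, aborted: true };
}
```

Pass the same `controller.signal` in the `fetch` options too, so the underlying HTTP request is cancelled rather than left running on the server.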
The full code above works as-is. Copy it, set your API key, run it. You'll have a streaming Claude chatbot in under 2 minutes.
Building on Claude's API? SimplyLouie offers flat-rate API access at $2/month — no token counting, no surprise bills.




