How to build a Claude chatbot with streaming responses in under 50 lines of Node.js
Streaming is one of those features that sounds complicated but completely transforms the user experience. Instead of staring at a spinner for 3-5 seconds, users see the response appear word by word — like watching someone type.
Here's how to do it with Claude in Node.js. The whole thing is under 50 lines.
The full code
```javascript
const Anthropic = require('@anthropic-ai/sdk');
const readline = require('readline');

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const history = [];

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

async function chat(userMessage) {
  history.push({ role: 'user', content: userMessage });
  process.stdout.write('\nClaude: ');

  let fullResponse = '';
  const stream = await client.messages.stream({
    model: 'claude-opus-4-5',
    max_tokens: 1024,
    messages: history
  });

  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      process.stdout.write(chunk.delta.text);
      fullResponse += chunk.delta.text;
    }
  }

  console.log('\n');
  history.push({ role: 'assistant', content: fullResponse });
}

function prompt() {
  rl.question('You: ', async (input) => {
    if (input.toLowerCase() === 'quit') return rl.close();
    await chat(input);
    prompt();
  });
}

prompt();
```
That's it. Run it with:
```bash
npm install @anthropic-ai/sdk
ANTHROPIC_API_KEY=your_key node chatbot.js
```
How streaming actually works
Claude's API uses Server-Sent Events (SSE). When you call messages.stream(), the connection stays open and the server pushes chunks as they're generated.
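On the wire, each event looks roughly like this (an illustrative sketch of the SSE framing, not a verbatim capture):

```
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
```

An event name line, a `data:` line carrying a JSON payload, and a blank line separating one event from the next. The SDK parses all of this for you and hands you the payloads as objects.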
Each chunk has a type. The ones you care about:
| Type | What it means |
|---|---|
| content_block_start | Claude is about to start a text block |
| content_block_delta | Here's the next piece of text |
| content_block_stop | That block is done |
| message_stop | Whole response is done |
You only need content_block_delta with text_delta — that's where the actual words are.
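If you do want to react to the other event types, a small dispatcher keeps things tidy. This is a sketch assuming the chunk shapes in the table above; only `content_block_delta` carries text, while `message_stop` is handy for flipping UI state:

```javascript
// Accumulate text from a sequence of stream events.
// Only content_block_delta chunks contribute characters; the other
// types are useful for UI bookkeeping (spinners, "done" indicators).
function accumulate(chunks) {
  let text = '';
  let finished = false;
  for (const chunk of chunks) {
    switch (chunk.type) {
      case 'content_block_delta':
        if (chunk.delta.type === 'text_delta') text += chunk.delta.text;
        break;
      case 'message_stop':
        finished = true; // whole response is done
        break;
      // content_block_start / content_block_stop: per-block bookkeeping
    }
  }
  return { text, finished };
}
```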
Adding it to an Express server
The terminal version is fine for testing. Here's how to expose it as an HTTP endpoint:
```javascript
const express = require('express');
const Anthropic = require('@anthropic-ai/sdk');

const app = express();
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

app.use(express.json());

app.post('/chat', async (req, res) => {
  const { messages } = req.body;

  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await client.messages.stream({
    model: 'claude-opus-4-5',
    max_tokens: 1024,
    messages
  });

  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      // SSE events are terminated by a blank line, hence the double newline
      res.write(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});

app.listen(3000);
```
Consuming the stream on the frontend
```javascript
async function sendMessage(messages) {
  const response = await fetch('/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep incomplete line in buffer

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;
        const { text } = JSON.parse(data);
        // Append text to your UI element
        document.getElementById('response').textContent += text;
      }
    }
  }
}
```
Common mistakes
1. Not handling the buffer correctly
SSE chunks don't always align with JSON boundaries. The buffer pattern above handles cases where a chunk gets split mid-JSON.
2. Forgetting to end the response
If you don't call res.end() after the stream finishes, the client will hang waiting for more data.
3. Not setting the right headers
Without Content-Type: text/event-stream, the browser won't treat it as SSE — it'll wait for the full response before rendering anything.
4. Streaming to multiple users without isolation
Each request needs its own stream instance. Don't share stream state between requests.
The cost question
Streaming doesn't change the token count: you pay for the same tokens whether you stream or not. What it changes is perceived latency. A 2-second response that streams word by word feels markedly faster than a 2-second response that appears all at once.
If you're building on a per-token API, that latency improvement is free. If you're building on a flat-rate API (like SimplyLouie at $2/month), there's no cost math to worry about at all — stream everything, always.
What to build next
Once streaming works, the natural next steps are:
- Typing indicators: Show "Claude is thinking..." before the first chunk arrives
- Stop button: Let users interrupt a long response mid-stream
- Token counting: Show a live token counter as the response streams
- Conversation export: Save the full streamed response to history after message_stop
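For the stop button, the standard web AbortController is enough on the frontend. A sketch under that assumption (the reader loop mirrors the consumer above; wiring `controller.abort()` to an actual button is left out):

```javascript
// Read a stream of text chunks until it ends or the signal aborts.
// In a real UI, controller.abort() would be called from a Stop button.
async function readUntilAborted(stream, signal) {
  const reader = stream.getReader();
  let text = '';
  while (!signal.aborted) {
    const { done, value } = await reader.read();
    if (done) return { text, aborted: false };
    text += value;
  }
  await reader.cancel(); // tell the source we're stopping early
  return { text, aborted: true };
}
```

Pass the same `controller.signal` in the `fetch` options too, so the underlying HTTP request is cancelled rather than left running on the server.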
The full code above works as-is. Copy it, set your API key, run it. You'll have a streaming Claude chatbot in under 2 minutes.
Building on Claude's API? SimplyLouie offers flat-rate API access at $2/month — no token counting, no surprise bills.




