Building a voice-activated AI assistant with Node.js and Claude API

Dev.to / 4/16/2026

Key Points

The article walks through building a voice-activated AI assistant by combining the Web Speech API in the browser with a Node.js/Express backend.
It uses Claude via the SimplyLouie developer API (listed as a flat $10/month) to interpret user speech and generate contextual responses.
The assistant maintains conversation context using in-memory history, with a note that this can be upgraded to Redis for persistence and scalability.
The implementation is presented step-by-step, starting with capturing and transcribing voice input on the frontend before sending it to the backend for AI processing.
Overall, it provides a practical reference architecture for a low-cost, context-aware voice agent using readily available web and API components.

I wanted to build something fun: a voice assistant that actually understands context, remembers what you said earlier in the conversation, and costs about $15 a month to run.

Here's how I built it using the Web Speech API + Node.js + Claude API access via SimplyLouie.

The stack

Frontend: Vanilla JS + Web Speech API (built into Chrome/Edge — no library needed)
Backend: Node.js + Express
AI: Claude via SimplyLouie's developer API at $10/month flat
Storage: In-memory conversation history (upgradeable to Redis)

Step 1: The frontend — capture voice input

<!DOCTYPE html>
<html>
<head>
  <title>Voice AI</title>
</head>
<body>
  <button id="startBtn">🎤 Hold to talk</button>
  <div id="transcript"></div>
  <div id="response"></div>

  <script>
    const btn = document.getElementById('startBtn');
    const transcriptEl = document.getElementById('transcript');
    const responseEl = document.getElementById('response');

    const recognition = new webkitSpeechRecognition();
    recognition.continuous = false;
    recognition.interimResults = false;
    recognition.lang = 'en-US';

    btn.addEventListener('mousedown', () => recognition.start());
    btn.addEventListener('mouseup', () => recognition.stop());

    recognition.onresult = async (event) => {
      const transcript = event.results[0][0].transcript;
      transcriptEl.textContent = `You said: ${transcript}`;

      const reply = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message: transcript })
      }).then(r => r.json());

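      // On failure the backend sends { error } with no response field,
      // so bail out instead of displaying and speaking "undefined"
      if (!reply.response) {
        responseEl.textContent = `Error: ${reply.error || 'no response'}`;
        return;
      }

      responseEl.textContent = `AI: ${reply.response}`;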

      // Speak the response back
      const utterance = new SpeechSynthesisUtterance(reply.response);
      window.speechSynthesis.speak(utterance);
    };
  </script>
</body>
</html>
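The page above assumes `webkitSpeechRecognition` exists, which holds in Chrome and Edge but not in every browser. A minimal sketch of a feature check follows; `resolveSpeechRecognition` is a hypothetical helper name, not part of the original code, and the fallback message is only an illustration.

```javascript
// Hypothetical helper: return whichever SpeechRecognition constructor the
// given global object exposes, or null when neither name is present.
function resolveSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Browser usage, guarded so the snippet does nothing outside a browser.
if (typeof window !== 'undefined') {
  const Recognition = resolveSpeechRecognition(window);
  if (!Recognition) {
    // Disable the talk button rather than throwing on the first click.
    document.getElementById('startBtn').disabled = true;
    document.getElementById('transcript').textContent =
      'Speech recognition is not supported in this browser.';
  }
}
```

If the check passes, `new Recognition()` can stand in for `new webkitSpeechRecognition()` in the script above.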

Step 2: The backend — handle conversation context

// Requires Node.js 18+ for the built-in fetch used below
const express = require('express');
const app = express();
app.use(express.json());

// Simple in-memory conversation store (keyed by session)
const conversations = new Map();

app.post('/api/chat', async (req, res) => {
  const sessionId = req.headers['x-session-id'] || 'default';
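  const { message } = req.body;

  // Reject empty or malformed requests before touching the history
  if (typeof message !== 'string' || !message.trim()) {
    return res.status(400).json({ error: 'message is required' });
  }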

  // Get or create conversation history
  if (!conversations.has(sessionId)) {
    conversations.set(sessionId, []);
  }
  const history = conversations.get(sessionId);

  // Add user message
  history.push({ role: 'user', content: message });

  // Send only the last 20 messages (10 exchanges) to stay within context limits
  const recentHistory = history.slice(-20);

  try {
    const response = await fetch('https://simplylouie.com/api/chat', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${process.env.LOUIE_API_KEY}`
      },
      body: JSON.stringify({
        messages: recentHistory,
        system: 'You are a helpful voice assistant. Keep responses concise — under 2 sentences when possible, since they will be read aloud.'
      })
    });

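    // Treat non-2xx upstream responses as failures
    if (!response.ok) {
      throw new Error(`Upstream API returned ${response.status}`);
    }

    const data = await response.json();
    const aiReply = data.response || data.content;

    // Store assistant response
    history.push({ role: 'assistant', content: aiReply });

    res.json({ response: aiReply });
  } catch (err) {
    // Drop the unanswered user message so the history keeps alternating roles
    if (history.length > 0 && history[history.length - 1].role === 'user') {
      history.pop();
    }
    console.error(err);
    res.status(500).json({ error: 'AI unavailable' });
  }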
});

app.listen(3000, () => console.log('Voice AI running on :3000'));
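One thing to watch in the server above: the `conversations` Map never deletes sessions, and each stored `history` array keeps growing even though only the last 20 messages are sent. A minimal sketch of a bounded store follows. `ConversationStore`, its method names, and the default limits are all hypothetical. It also makes the outgoing window start with a user turn, which some chat APIs expect; the article does not say whether SimplyLouie's endpoint requires this.

```javascript
// Hypothetical bounded in-memory store; names and limits are assumptions.
class ConversationStore {
  constructor({ maxMessages = 20, idleMs = 30 * 60 * 1000, now = Date.now } = {}) {
    this.maxMessages = maxMessages;
    this.idleMs = idleMs;
    this.now = now; // injectable clock, mainly for testing
    this.sessions = new Map(); // sessionId -> { messages, lastSeen }
  }

  // Add one message and trim in place so stored history never exceeds the cap.
  append(sessionId, role, content) {
    const entry = this.sessions.get(sessionId) || { messages: [], lastSeen: 0 };
    entry.messages.push({ role, content });
    if (entry.messages.length > this.maxMessages) {
      entry.messages.splice(0, entry.messages.length - this.maxMessages);
    }
    entry.lastSeen = this.now();
    this.sessions.set(sessionId, entry);
  }

  // Messages to send upstream: a copy that starts at the first user turn.
  recent(sessionId) {
    const entry = this.sessions.get(sessionId);
    if (!entry) return [];
    const firstUser = entry.messages.findIndex((m) => m.role === 'user');
    return firstUser === -1 ? [] : entry.messages.slice(firstUser);
  }

  // Delete sessions idle for longer than idleMs; returns how many were removed.
  evictIdle() {
    const cutoff = this.now() - this.idleMs;
    let removed = 0;
    for (const [id, entry] of this.sessions) {
      if (entry.lastSeen < cutoff) {
        this.sessions.delete(id);
        removed += 1;
      }
    }
    return removed;
  }
}
```

In the route handler, `store.append(sessionId, 'user', message)` and `store.recent(sessionId)` would replace the direct Map access, with `evictIdle()` called from a `setInterval`. Keeping the interface this small should also make a later move to Redis easier.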

Step 3: Add session tracking so it remembers the conversation

The frontend needs to send a consistent session ID:

// Add to frontend — generate once per page load
const sessionId = Math.random().toString(36).slice(2, 11); // slice replaces the deprecated substr

// Update the fetch call:
const reply = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-session-id': sessionId  // <-- add this
  },
  body: JSON.stringify({ message: transcript })
}).then(r => r.json());

Now it remembers what you talked about earlier in the session.
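An ID generated on page load is lost on refresh, so the conversation restarts. If you would rather keep the session for the life of the tab, one option is to cache the ID in `sessionStorage`. This is a sketch of that variation, not what the original code does: `getSessionId` and the storage key are hypothetical, and `crypto.randomUUID()` is only available in secure contexts (HTTPS or localhost).

```javascript
// Hypothetical helper: reuse a stored session ID, or create and store one.
// `storage` is anything with getItem/setItem, such as window.sessionStorage.
// `generate` is injectable so the function can run outside a browser.
function getSessionId(storage, generate = () => crypto.randomUUID()) {
  const KEY = 'voice-ai-session-id'; // assumed key name
  let id = storage.getItem(KEY);
  if (!id) {
    id = generate();
    storage.setItem(KEY, id);
  }
  return id;
}

// Browser usage:
//   const sessionId = getSessionId(window.sessionStorage);
```

The returned value goes into the `x-session-id` header exactly as before.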

What it can do

Once it's running:

You: "What's the capital of France?"
AI: "Paris."

You: "What's the population there?"
AI: "Paris has about 2.1 million people in the city proper."

Note how the second question works — it understands "there" means Paris because of the conversation history.
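To make that concrete, here is roughly what the backend sends upstream for the second question. The body shape (`messages` plus `system`) mirrors the Step 2 code; `buildChatBody` is a hypothetical helper added only for this illustration, and the system prompt is shortened.

```javascript
// Hypothetical helper mirroring the request body built in Step 2.
function buildChatBody(history, systemPrompt) {
  return { messages: history.slice(-20), system: systemPrompt };
}

// History as it stands when the second question arrives.
const exampleHistory = [
  { role: 'user', content: "What's the capital of France?" },
  { role: 'assistant', content: 'Paris.' },
  { role: 'user', content: "What's the population there?" },
];

const exampleBody = buildChatBody(exampleHistory, 'You are a helpful voice assistant.');
console.log(JSON.stringify(exampleBody, null, 2));
```

The model sees all three messages at once, so "there" can be resolved against the earlier "Paris." reply.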

The cost math

I ran this for a week with about 200 voice interactions:

Claude API via SimplyLouie: $10/month flat (developer tier)
Hosting (Railway): $5/month
Web Speech API: free (browser built-in)
Total: $15/month

For comparison, I estimate that building this with OpenAI's Whisper for speech-to-text plus the GPT-4 API would run $40-60/month at the same volume.
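As a rough per-interaction figure, the numbers above can be extrapolated like this. It assumes usage stays at about 200 interactions per week all month, which the one-week trial does not guarantee; `costPerInteraction` is a hypothetical helper.

```javascript
// Hypothetical helper: spread a flat monthly cost over an estimated
// monthly interaction count extrapolated from a weekly figure.
function costPerInteraction(monthlyCostUsd, interactionsPerWeek) {
  const interactionsPerMonth = (interactionsPerWeek * 52) / 12;
  return {
    interactionsPerMonth,
    usdPerInteraction: monthlyCostUsd / interactionsPerMonth,
  };
}

const estimate = costPerInteraction(10 + 5, 200); // API + hosting, 200 per week
console.log(Math.round(estimate.interactionsPerMonth)); // 867
console.log(estimate.usdPerInteraction.toFixed(4)); // 0.0173
```

Because both costs are flat, the per-interaction figure drops as usage goes up and rises if the assistant sits idle.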

Optional: add wake word detection

If you want it to always listen (like Alexa):

let isListening = true;

// Restart recognition whenever it ends so the assistant keeps listening
recognition.onend = () => {
  if (isListening) recognition.start();
};

recognition.onresult = async (event) => {
  const transcript = event.results[0][0].transcript.toLowerCase();

  // Only respond if wake word detected
  if (!transcript.includes('hey louie')) return;

  const actualMessage = transcript.replace('hey louie', '').trim();
  // ... rest of handler
};

recognition.start();
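The wake word check is easier to test if it lives in a pure function. The sketch below differs from the snippet above in one way: it keeps only the words after the wake word, so filler before it is dropped. `extractCommand` is a hypothetical name. Keep in mind that a recognizer may transcribe "Louie" with a different spelling, which an exact string match will miss.

```javascript
// Hypothetical helper: return the command that follows the wake word,
// or null when the wake word is absent or nothing follows it.
function extractCommand(transcript, wakeWord = 'hey louie') {
  const normalized = transcript.toLowerCase().trim();
  const index = normalized.indexOf(wakeWord);
  if (index === -1) return null;

  // Keep what comes after the wake word and strip leading punctuation.
  const command = normalized
    .slice(index + wakeWord.length)
    .replace(/^[\s,.!?]+/, '')
    .trim();

  return command.length > 0 ? command : null;
}
```

Inside `recognition.onresult`, a `null` result means "ignore this utterance"; anything else is the message to send to `/api/chat`.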

Get started

The developer API that powers this: simplylouie.com/developers

$10/month flat rate. No per-token billing surprises. You hit the API, it works.

Full repo for this project is in the comments — drop your questions there too.
