Building Dynamic Audio with Emotion & Pace: Gemini 3.1 Flash TTS, Angular & Firebase Cloud Functions [GDE]

Dev.to / 5/2/2026


Key Points

  • Google released the Gemini 3.1 Flash TTS Preview model in the Gemini API, Vertex AI, and Google AI Studio, adding an “Audio tags” feature to express human emotion, pace, and style.
  • The post demonstrates an application flow where an image is analyzed with Firebase AI Logic to produce recommendations and an “obscure fact,” which is then passed to a Firebase Cloud Function to generate speech via a Gemini TTS model.
  • The Cloud Function streams the generated audio back to an Angular frontend, where it’s converted into a Blob URL and played through an audio player.
  • The author migrates their Angular app to use the Gemini 3.1 Flash TTS Preview model and builds an Angular signal-based form to input scene, emotion, and pace, then uses the GenAI TypeScript SDK to request expressive voice generation.
  • The tutorial outlines a practical stack and setup steps using Angular, Node.js LTS, Firebase Remote Config, Firebase Local Emulator Suite for local testing, and Vertex AI for reliable enterprise access (given API availability limits in Hong Kong).

Google released the Gemini 3.1 Flash TTS Preview model for AI audio generation in the Gemini API, Gemini in Vertex AI, and Google AI Studio. This model introduces a new Audio tags feature to express human emotion, pace, and style.

This application explores Firebase AI Logic to analyze an uploaded image and generate recommendations, a description, alternative tags, and an obscure fact. The obscure fact is sent to a Firebase Cloud Function that generates audio using a Gemini TTS model. The Cloud Function returns the stream to an Angular application, which converts it to a Blob URL. An audio player sets the URL as its source so that users can click the Play button to play the stream.

In this blog post, I migrate my application to use the Gemini 3.1 Flash TTS Preview model and create a signal form in Angular to input a scene, emotion, and pace. Then, the Angular application provides the form values and the obscure fact to the Firebase Cloud Function to generate an expressive voice using the GenAI TypeScript SDK.

Prerequisites

The technical stack of the project:

  • Angular 21: The latest version as of May 2026.
  • Node.js LTS: The LTS version as of May 2026.
  • Firebase Remote Config: To manage dynamic parameters.
  • Firebase Cloud Functions: To generate an expressive human voice when called by the frontend.
  • Firebase Local Emulator Suite: To test the functions locally at http://localhost:5001.
  • Gemini in Vertex AI: To generate expressive audio with the Gemini TTS model.

The public Google AI Studio API is restricted in my region (Hong Kong). However, Vertex AI (Google Cloud) offers enterprise access that works reliably here, so I chose Vertex AI for this demo.

npm i -g firebase-tools

Install firebase-tools globally using npm.

firebase logout
firebase login

Log out of Firebase and log in again to perform proper Firebase authentication.

firebase init

Execute firebase init and follow the prompts to set up Firebase Cloud Functions, the Firebase Local Emulator Suite, Firebase Cloud Storage, and Firebase Remote Config.

If you have an existing project or multiple projects, you can specify the project ID on the command line.

firebase init --project <PROJECT_ID>

In both cases, the Firebase CLI automatically installs the firebase-admin and firebase-functions dependencies.

After completing the setup steps, the Firebase tools generate the functions emulator, functions, a storage rules file, remote config templates, and configuration files such as .firebaserc and firebase.json.
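
As an illustration, the generated firebase.json for this setup typically looks something like the following (the exact contents depend on the prompts you answer during firebase init):

```json
{
  "functions": [
    {
      "source": "functions",
      "codebase": "default",
      "ignore": ["node_modules", ".git", "firebase-debug.log", "*.local"]
    }
  ],
  "storage": { "rules": "storage.rules" },
  "remoteconfig": { "template": "remoteconfig.template.json" },
  "emulators": {
    "functions": { "port": 5001 },
    "ui": { "enabled": true }
  }
}
```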

  • Angular dependency
npm i firebase

The Angular application requires the firebase dependency to initialize a Firebase app, load remote config, and invoke the Firebase Cloud Functions to generate audio.

  • Firebase dependencies
npm i @cfworker/json-schema @google/genai @modelcontextprotocol/sdk

Install the above dependencies to access Gemini in Vertex AI. @google/genai depends on @cfworker/json-schema and @modelcontextprotocol/sdk. Without these, the Cloud Functions cannot start.

With our project configured, let's look at how the frontend and backend communicate.

Architecture

High-level architecture of obscure fact generation

A user uploads an image in an Angular application and prompts the Gemini 3.1 Flash Lite Preview model to generate a few recommendations for improving the image, a description, and alternative tags. The user also uses the same model and the Google Search tool to find an obscure fact related to the image.

High-level architecture of audio generation

A user inputs a scene, an emotion, and a pace in an experimental signal form. When a user clicks the generate audio button, the Angular application sends the form values and the obscure fact to the Firebase Cloud Function to generate an expressive voice using the GenAI TypeScript SDK and Gemini 3.1 Flash TTS Preview model.

Limitations of Gemini 3.1 Flash TTS Preview Model

Firebase Integration

1. Configure Environment Variables

Defining the environment variables in the Firebase project ensures the functions know the region of the Google Cloud project, the Firebase Cloud Function location, and the required TTS model.

.env.example

GOOGLE_CLOUD_LOCATION="global"
GOOGLE_FUNCTION_LOCATION="asia-east2"
GEMINI_TTS_MODEL_NAME="gemini-3.1-flash-tts-preview"
WHITELIST="http://localhost:4200"
REFERER="http://localhost:4200/"
  • GOOGLE_CLOUD_LOCATION: The region of the Google Cloud project. I chose global so that the Firebase project has access to the newest Gemini 3.1 Flash TTS Preview model.
  • GOOGLE_FUNCTION_LOCATION: The region of the Firebase Cloud Functions. I chose asia-east2 because this is the region where I live.
  • GEMINI_TTS_MODEL_NAME: The name of the Gemini TTS model that generates the audio.
  • WHITELIST: Requests must come from http://localhost:4200.
  • REFERER: Requests must originate from http://localhost:4200/.

http://localhost:4200 is the host and port number of my local Angular application.

2. Validating Environment Variables

Before the Cloud Function proceeds with any AI calls, it is critical to ensure that all necessary environment variables are present. I implemented an AUDIO_CONFIG IIFE (Immediately Invoked Function Expression) to validate environment variables like the TTS model name, Google Cloud Project ID, and location.

import { HttpsError } from "firebase-functions/v2/https";
import logger from "firebase-functions/logger";

export function validate(value: string | undefined, fieldName: string, missingKeys: string[]) {
    const err = `${fieldName} is missing.`;
    if (!value) {
        logger.error(err);
        missingKeys.push(fieldName);
        return "";
    }

    return value;
}
export const AUDIO_CONFIG = (() => {
    logger.info("AUDIO_CONFIG initialization: Loading environment variables and validating configuration...");

    const env = process.env;

    const missingKeys: string[] = [];
    const location = validate(env.GOOGLE_CLOUD_LOCATION, "Vertex Location", missingKeys);
    const model = validate(env.GEMINI_TTS_MODEL_NAME, "Gemini TTS Model Name", missingKeys);
    const project = validate(env.GCLOUD_PROJECT, "Google Cloud Project", missingKeys);

    if (missingKeys.length > 0) {
        throw new HttpsError("failed-precondition", `Missing environment variables: ${missingKeys.join(", ")}`);
    }

    return {
        genAIOptions: {
            project,
            location,
            vertexai: true,
        },
        model,
    };
})();
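
In isolation, the validate helper behaves like this (a self-contained copy for illustration, with logging omitted):

```typescript
// Standalone copy of the validate helper (logging omitted for brevity).
function validate(value: string | undefined, fieldName: string, missingKeys: string[]): string {
    if (!value) {
        missingKeys.push(fieldName);
        return "";
    }
    return value;
}

const missingKeys: string[] = [];
const vertexLocation = validate("global", "Vertex Location", missingKeys);
const ttsModel = validate(undefined, "Gemini TTS Model Name", missingKeys);
// vertexLocation === "global"; ttsModel === ""; missingKeys === ["Gemini TTS Model Name"]
```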

I am using Node 24 as of May 2026. Since Node 20.12, we can use the built-in process.loadEnvFile function to load environment variables from the .env file.

In env.ts, the try-catch block attempts to load the environment variables from the .env file.

try {
    process.loadEnvFile();
} catch {
    // Ignore error if .env file is not found (e.g., in production where env vars are set by the platform)
}

In src/index.ts, the first statement imports env.ts before any other files and libraries.

import "./env";

... other import statements ...

If you are using a Node version that does not support process.loadEnvFile, the alternative is to install dotenv to load the environment variables.

npm i dotenv
import dotenv from "dotenv";

dotenv.config();

Firebase provides the GCLOUD_PROJECT variable, so it is not defined in the .env file.

When the missingKeys array is not empty, AUDIO_CONFIG throws an error that lists all the missing variable names. If validation succeeds, genAIOptions and model are returned: genAIOptions is used to initialize the GoogleGenAI client, and model is the selected TTS model name.

3. Sanitize the Prompt Inputs

The Cloud Function sanitizes the scene and transcript before composing the audio prompt.

The sanitizeScene function escapes newline characters ('\n') as the literal '\\n'. A newline creates a blank line and often signals the end of a block, so this sanitization effectively flattens the scene into one continuous line, which the LLM's Markdown parser recognizes as a single, safe paragraph. The sanitization also removes any Markdown headers injected into the scene.

function sanitizeScene(text: string): string {
    return (text || "").trim().replace(/\r?\n/g, "\\n").replace(/^[#\s]+/gm, "");
}

The sanitizeTranscript function removes all Markdown headers and collapses injected triple quotes in the transcript.

function sanitizeTranscript(text: string): string {
    return (text || "").trim().replace(/^#+/gm, "").replace(/"""/g, '"');
}
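
To see what the two sanitizers do, here are self-contained copies with sample inputs:

```typescript
// Standalone copies of the sanitizers above, for illustration.
function sanitizeScene(text: string): string {
    return (text || "").trim().replace(/\r?\n/g, "\\n").replace(/^[#\s]+/gm, "");
}

function sanitizeTranscript(text: string): string {
    return (text || "").trim().replace(/^#+/gm, "").replace(/"""/g, '"');
}

// The scene is flattened to a single line and the leading header is stripped.
const scene = sanitizeScene("# Injected header\nA quiet studio");
// scene === "Injected header\\nA quiet studio" (a literal backslash-n, not a real newline)

// Leading hashes and triple quotes are removed from the transcript.
const transcript = sanitizeTranscript('## Fact\n"""Quoted"""');
// transcript === ' Fact\n"Quoted"'
```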

4. Build an Audio Prompt

The AudioPrompt type encapsulates the scene, emotion, pace, transcript, and voice option used to set the location, audio tags, text, and persona of the audio.

export type AudioPrompt = {
  scene: string;
  emotion: string;
  pace: string;
  transcript: string;
  voiceOption: string;
}

The SCENE_DICTIONARY is an array of scenes. When the user does not provide a scene, a scene is randomly selected from the array.

export const SCENE_DICTIONARY = [
    "A dimly lit, dusty library filled with ancient leather-bound books.\n" +
        "The air is thick with history. A scholarly archivist is leaning closely into a warm, vintage ribbon microphone.\n" +
        "They speak with an infectious, hushed intensity, eager to share a forgotten secret they just uncovered in a decaying manuscript.",

    "It is 10:00 PM in a glass-walled studio overlooking the moonlit London skyline, but inside, it is blindingly bright.\n" +
        "The red 'ON AIR' tally light is blazing. The speaker is standing up, bouncing on the balls of their heels to the rhythm of a thumping backing track.\n" +
        "It is a chaotic, caffeine-fueled cockpit designed to wake up an entire nation.",

    "A meticulously sound-treated bedroom in a suburban home.\n" +
        "The space is deadened by plush velvet curtains and a heavy rug, creating an intimate, close-up acoustic environment.\n" +
        "The speaker delivers the information like a trusted friend sharing an inside joke.",

    "A high-tech, minimalist laboratory humming with servers.\n" +
        "Crisp, clean acoustics reflect off glass and steel.\n" +
        "A brilliant but eccentric scientist is pacing back and forth, speaking rapidly and enthusiastically into a headset microphone, excited to explain a complex phenomenon.",
];

I define a buildAudioPrompt function to construct the advanced audio prompt. When an emotion is defined, its tag is [<emotion>]; when a pace is defined, its tag is [<pace>]. The combined audio tag is [<emotion>] [<pace>] followed by a space, which creates a proper token boundary.

The insertAudioTagsToTranscript function uses a regular expression to split the transcript into sentences, inserts the combined audio tag before each sentence, and then joins the pieces with an empty string.

The buildAudioPrompt concatenates the scene and the expressive transcript into a string before returning it.

import { SCENE_DICTIONARY } from './constants/scenes.const';
import { AudioPrompt } from './types/audio-prompt.type';

function makeTag(value: string) {
    const trimmedValue = value.trim();
    return trimmedValue ? `[${trimmedValue}] ` : "";
}

function insertAudioTagsToTranscript({ transcript, pace, emotion }: AudioPrompt): string {
    const audioTags = `${makeTag(emotion)}${makeTag(pace)}`;
    const cleanedTranscript = sanitizeTranscript(transcript);

    const parts = cleanedTranscript.split(/(?<!\b(?:Mr|Mrs|Ms|Dr|St|i\.e|e\.g))([.!?\n\r]+[”"’']*\s*)/);
    return parts
        .map((text, i, arr) => {
            if (i % 2 !== 0) {
                return ""; // Skip delimiters, they are appended to the text blocks
            }
            const delimiter = arr[i + 1] || "";
            return text.trim() ? `${audioTags}${text.trim()}${delimiter}` : delimiter;
        })
        .join("");
}

export function buildAudioPrompt(data: AudioPrompt): string {
    const randomIndex = Math.floor(Math.random() * SCENE_DICTIONARY.length);
    const selectedScene = SCENE_DICTIONARY[randomIndex];

    const trimmedScene = (data.scene || "").trim() || selectedScene;
    const escapedScene = sanitizeScene(trimmedScene);
    const transcript = insertAudioTagsToTranscript(data);

    return `## Scene:
${escapedScene}

## Transcript:
"""
${transcript}
"""
`;
}

The output of the prompt looks like:

## Scene:
<scene>

## Transcript:
[<emotion>] [<pace>] <sentence 1>[<emotion>] [<pace>] <sentence 2>...[<emotion>] [<pace>] <sentence N>
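
For example, running a self-contained copy of the tagging logic on a two-sentence transcript produces:

```typescript
// Self-contained copy of the tagging pipeline above, for illustration.
function makeTag(value: string) {
    const trimmedValue = value.trim();
    return trimmedValue ? `[${trimmedValue}] ` : "";
}

function sanitizeTranscript(text: string): string {
    return (text || "").trim().replace(/^#+/gm, "").replace(/"""/g, '"');
}

function insertAudioTags(transcript: string, emotion: string, pace: string): string {
    const audioTags = `${makeTag(emotion)}${makeTag(pace)}`;
    const cleanedTranscript = sanitizeTranscript(transcript);
    const parts = cleanedTranscript.split(/(?<!\b(?:Mr|Mrs|Ms|Dr|St|i\.e|e\.g))([.!?\n\r]+[”"’']*\s*)/);
    return parts
        .map((text, i, arr) => {
            if (i % 2 !== 0) {
                return ""; // Delimiters are appended to the preceding sentence.
            }
            const delimiter = arr[i + 1] || "";
            return text.trim() ? `${audioTags}${text.trim()}${delimiter}` : delimiter;
        })
        .join("");
}

const tagged = insertAudioTags(
    "Octopuses have three hearts. Two pump blood to the gills.",
    "excited",
    "rapid",
);
// tagged === "[excited] [rapid] Octopuses have three hearts. [excited] [rapid] Two pump blood to the gills."
```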

5. Generating Expressive Human Audio in a Firebase Cloud Function

The createVoiceConfig function constructs an instance of GenerateContentConfig that outputs a speech narrated by the given voice name.

import { GenerateContentConfig } from "@google/genai";

export function createVoiceConfig(voiceName = "Kore"): GenerateContentConfig {
    return {
        responseModalities: ["audio"],
        speechConfig: {
            voiceConfig: {
                prebuiltVoiceConfig: {
                    voiceName,
                },
            },
        },
    };
}
const splitList = (whitelist?: string) => (whitelist || "")
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0); // Drop empty entries so an unset WHITELIST yields [].

export const whitelist = splitList(process.env.WHITELIST);
export const cors = whitelist.length > 0 ? whitelist : true;
export const refererList = splitList(process.env.REFERER);

All Cloud Functions enforce App Check, CORS, and a timeout of 600 seconds. If WHITELIST is unspecified, CORS defaults to true. While that is acceptable in a demo environment, restrict CORS to a specific domain or set it to false in production to prevent unauthorized access.

The readFact Cloud Function delegates to readFactStreamFunction when isStreaming is true; otherwise, it delegates to readFactFunction.

The readFactFunction function returns a Promise<string> containing the base64-encoded audio.

The readFactStreamFunction function returns a Promise<number[] | undefined> representing the bytes of the WAV header.

import { onCall } from "firebase-functions/v2/https";
import { cors } from "../auth";
import { buildAudioPrompt } from './audio-prompt';
import { readFactFunction, readFactStreamFunction } from "./read-fact";
import { createVoiceConfig } from './voice-config';

const options = {
    cors,
    enforceAppCheck: true,
    timeoutSeconds: 600,
};

export const readFact = onCall(options, (request, response) => {
    const { data, acceptsStreaming } = request;
    const isStreaming = acceptsStreaming && !!response;
    const prompt = buildAudioPrompt(data);
    const voiceOption = createVoiceConfig(data.voiceOption);

    return isStreaming
        ? readFactStreamFunction(prompt, voiceOption, response)
        : readFactFunction(prompt, voiceOption);
});

The withAIAudio function is a higher-order function that initializes the GoogleGenAI client and calls the callback to generate audio.

import { GoogleGenAI } from "@google/genai";
import { HttpsError } from "firebase-functions/v2/https";

async function withAIAudio(callback: (ai: GoogleGenAI, model: string) => Promise<string | number[] | undefined>) {
    try {
        const variables = AUDIO_CONFIG;
        if (!variables) {
            return "";
        }

        const { genAIOptions, model } = variables;
        const ai = new GoogleGenAI(genAIOptions);
        return await callback(ai, model);
    } catch (e) {
        if (e instanceof HttpsError) {
            throw e;
        }
        throw new HttpsError("internal", "An internal error occurred while setting up the AI client.", {
            originalError: (e as Error).message,
        });
    }
}

generateAudio is a callback function that uses the Gemini 3.1 Flash TTS Preview model to generate a response. getBase64DataUrl invokes extractInlineAudioData to extract the raw data and the mime type from the response. The encodeBase64String function first converts the raw data to WAV format, then encodes it to base64 format, and finally returns the base64 string.

The createAudioParams function constructs a parameter with the Gemini TTS model, the audio prompt, and the speech configuration.

async function generateAudio(aiTTS: AIAudio, prompt: string, voiceOption: GenerateContentConfig) {
    try {
        const { ai, model } = aiTTS;
        const response = await ai.models.generateContent(createAudioParams(model, prompt, voiceOption));
        return getBase64DataUrl(response);
    } catch (error) {
        console.error(error);
        throw error;
    }
}

function createAudioParams(model: string, prompt: string, config?: GenerateContentConfig) {
    return {
        model,
        contents: [
            {
                role: "user",
                parts: [
                    {
                        text: prompt,
                    },
                ],
            },
        ],
        config,
    };
}

function extractInlineAudioData(response: GenerateContentResponse): {
    rawData: string | undefined;
    mimeType: string | undefined;
} {
    const { data: rawData, mimeType } = response.candidates?.[0]?.content?.parts?.[0]?.inlineData ?? {};

    return { rawData, mimeType };
}

function getBase64DataUrl(response: GenerateContentResponse) {
    const { rawData, mimeType } = extractInlineAudioData(response);

    if (!rawData || !mimeType) {
        throw new Error("Audio generation failed: No audio data received.");
    }

    return encodeBase64String({ rawData, mimeType });
}

export function encodeBase64String({ rawData, mimeType }: RawAudioData) {
    const wavBuffer = convertToWav(rawData, mimeType);
    const base64Data = wavBuffer.toString("base64");
    return `data:audio/wav;base64,${base64Data}`;
}

generateAudioStream is a callback function that uses the Gemini 3.1 Flash TTS Preview model to stream audio chunks. Each chunk is passed to the extractInlineAudioData function to extract the raw data and the mime type. The function converts each chunk's raw data into a buffer and sends it to the client, accumulating the byte length to determine the total size of all chunks.

After all the chunks are sent to the client, the createWavHeader function uses the total byte length and the audio options to construct a WAV header and returns it.

async function generateAudioStream(
    aiTTS: AIAudio,
    prompt: string,
    voiceOption: GenerateContentConfig,
    response: CallableResponse<unknown>,
): Promise<number[] | undefined> {
    try {
        const { ai, model } = aiTTS;
        const chunks = await ai.models.generateContentStream(createAudioParams(model, prompt, voiceOption));
        let byteLength = 0;
        let options: WavConversionOptions | undefined = undefined;
        for await (const chunk of chunks) {
            const { rawData, mimeType } = extractInlineAudioData(chunk);
            if (!options && mimeType) {
                options = parseMimeType(mimeType);
                response.sendChunk({
                    type: "metadata",
                    payload: {
                        sampleRate: options.sampleRate,
                    },
                });
            }

            if (rawData && mimeType) {
                const buffer = Buffer.from(rawData, "base64");
                byteLength = byteLength + buffer.length;
                response.sendChunk({
                    type: "data",
                    payload: {
                        buffer,
                    },
                });
            }
        }

        if (options && byteLength > 0) {
            const header = createWavHeader(byteLength, options);
            return [...header];
        }

        return undefined;
    } catch (error) {
        console.error(error);
        throw error;
    }
}
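
The parseMimeType and createWavHeader helpers are referenced above but not shown in the post. A minimal sketch, assuming the model streams raw 16-bit PCM with a mime type such as audio/L16;codec=pcm;rate=24000, could look like this (the defaults and field names are assumptions; the actual implementation may differ):

```typescript
// Hypothetical sketch of parseMimeType and createWavHeader (not shown in the post).
type WavConversionOptions = {
    numChannels: number;
    sampleRate: number;
    bitsPerSample: number;
};

function parseMimeType(mimeType: string): WavConversionOptions {
    // e.g. "audio/L16;codec=pcm;rate=24000" -> 16-bit mono PCM at 24 kHz
    const bits = /audio\/L(\d+)/.exec(mimeType);
    const rate = /rate=(\d+)/.exec(mimeType);
    return {
        numChannels: 1,
        bitsPerSample: bits ? Number(bits[1]) : 16,
        sampleRate: rate ? Number(rate[1]) : 24000,
    };
}

function createWavHeader(dataLength: number, options: WavConversionOptions): Buffer {
    const { numChannels, sampleRate, bitsPerSample } = options;
    const byteRate = (sampleRate * numChannels * bitsPerSample) / 8;
    const blockAlign = (numChannels * bitsPerSample) / 8;
    const header = Buffer.alloc(44); // Standard 44-byte RIFF/WAVE header for PCM.

    header.write("RIFF", 0);                  // ChunkID
    header.writeUInt32LE(36 + dataLength, 4); // ChunkSize
    header.write("WAVE", 8);                  // Format
    header.write("fmt ", 12);                 // Subchunk1ID
    header.writeUInt32LE(16, 16);             // Subchunk1Size (16 for PCM)
    header.writeUInt16LE(1, 20);              // AudioFormat (1 = PCM)
    header.writeUInt16LE(numChannels, 22);
    header.writeUInt32LE(sampleRate, 24);
    header.writeUInt32LE(byteRate, 28);
    header.writeUInt16LE(blockAlign, 32);
    header.writeUInt16LE(bitsPerSample, 34);
    header.write("data", 36);                 // Subchunk2ID
    header.writeUInt32LE(dataLength, 40);     // Subchunk2Size
    return header;
}
```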

The readFactFunction invokes the withAIAudio higher-order function to generate a base64-encoded string.

The readFactStreamFunction function calls the withAIAudio higher-order function to write chunks to the response body and send them to the client. Then, the generateAudioStream function returns the bytes of the WAV header.

export async function readFactFunction(prompt: string, voiceOption: GenerateContentConfig) {
    return withAIAudio((ai, model) => generateAudio({ ai, model }, prompt, voiceOption));
}

export async function readFactStreamFunction(prompt: string, voiceOption: GenerateContentConfig, response: CallableResponse<unknown>) {
    return withAIAudio((ai, model) => generateAudioStream({ ai, model }, prompt, voiceOption, response));
}

6. Firebase App Configuration and reCAPTCHA Site Key

I implemented a FIREBASE_APP_CONFIG IIFE (Immediately Invoked Function Expression) to run once to validate the environment variables of the Firebase app.

export const FIREBASE_APP_CONFIG = (() => {
    const env = process.env;
    const missingKeys: string[] = [];
    const apiKey = validate(env.APP_API_KEY, "API Key", missingKeys);
    const appId = validate(env.APP_ID, "App Id", missingKeys);
    const messagingSenderId = validate(env.APP_MESSAGING_SENDER_ID, "Messaging Sender ID", missingKeys);
    const recaptchaSiteKey = validate(env.RECAPTCHA_ENTERPRISE_SITE_KEY, "Recaptcha site key", missingKeys);
    const projectId = validate(env.GCLOUD_PROJECT, "Project ID", missingKeys);

    if (missingKeys.length > 0) {
        throw new Error(`Missing environment variables: ${missingKeys.join(", ")}`);
    }

    return {
        app: {
            apiKey,
            appId,
            projectId,
            messagingSenderId,
            authDomain: `${projectId}.firebaseapp.com`,
            storageBucket: `${projectId}.firebasestorage.app`,
        },
        recaptchaSiteKey,
    };
})();

The getFirebaseConfig function caches the FIREBASE_APP_CONFIG for an hour before returning it to the Angular application.

The Angular application receives the Firebase app configuration and reCAPTCHA site key from the Cloud Function to initialize Firebase AI Logic and protect resources from unauthorized access and abuse.

export const getFirebaseConfig = onRequest({ cors }, (request, response) => {
    if (!validateRequest(request, response)) {
        return;
    }

    try {
        response.set("Cache-Control", "public, max-age=3600, s-maxage=3600");
        response.json(FIREBASE_APP_CONFIG);
    } catch (err) {
        console.error(err);
        response.status(500).send("Internal Server Error");
    }
});
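
The validateRequest helper is referenced above but not shown. A minimal sketch, assuming it checks the request's Origin and Referer headers against the WHITELIST and REFERER allow-lists (the real implementation may do more):

```typescript
// Hypothetical sketch of request validation against the configured allow-lists.
// The pure check is split out so it is easy to test in isolation.
function isAllowedRequest(
    origin: string | undefined,
    referer: string | undefined,
    whitelist: string[],
    refererList: string[],
): boolean {
    return whitelist.includes(origin ?? "") && refererList.includes(referer ?? "");
}

// A validateRequest wrapper might respond with 403 when the check fails and
// return false so the handler can bail out early, e.g.:
//
// function validateRequest(request: Request, response: Response): boolean {
//     const ok = isAllowedRequest(request.headers.origin, request.headers.referer, whitelist, refererList);
//     if (!ok) {
//         response.status(403).send("Forbidden");
//     }
//     return ok;
// }
```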

7. Local Development with Emulators

For local development, I used the Firebase Local Emulator Suite to save cost and time. In the bootstrapFirebase process, the application calls connectFunctionsEmulator to link to the Cloud Functions running at http://localhost:5001.

The port number defaulted to 5001 when firebase init was executed.

function connectEmulators(functions: Functions, remoteConfig: RemoteConfig) {
  if (location.hostname === 'localhost') {
    const host = getValue(remoteConfig, 'functionEmulatorHost').asString();
    const port = getValue(remoteConfig, 'functionEmulatorPort').asNumber();
    connectFunctionsEmulator(functions, host, port);
  }
}

loadFirebaseConfig is a helper function that makes a request to the Cloud Function to obtain the Firebase app configuration and the reCAPTCHA site key.

public/config.json

{
  "getFirebaseConfigUrl": "http://127.0.0.1:5001/vertexai-firebase-6a64f/us-central1/getFirebaseConfig"
}
export type FirebaseConfigResponse = {
  app: FirebaseOptions;
  recaptchaSiteKey: string
}
import { HttpClient } from '@angular/common/http';
import { inject } from '@angular/core';
import { catchError, lastValueFrom, throwError } from 'rxjs';
import config from '../../public/config.json';
import { FirebaseConfigResponse } from './ai/types/firebase-config.type';

async function loadFirebaseConfig() {
  const httpService = inject(HttpClient);
  const firebaseConfig$ =
    httpService.get<FirebaseConfigResponse>(config.getFirebaseConfigUrl)
      .pipe(catchError((e) => throwError(() => e)));
  return lastValueFrom(firebaseConfig$);
}

The bootstrapFirebase function initializes the FirebaseApp and App Check, loads the Firebase remote configuration and cloud functions, and stores them in the config service for later use.

export async function bootstrapFirebase() {
    try {
      const configService = inject(ConfigService);
      const firebaseConfig = await loadFirebaseConfig();
      const { app, recaptchaSiteKey } = firebaseConfig;
      const firebaseApp = initializeApp(app);
      const remoteConfig = await fetchRemoteConfig(firebaseApp);

      initializeAppCheck(firebaseApp, {
        provider: new ReCaptchaEnterpriseProvider(recaptchaSiteKey),
        isTokenAutoRefreshEnabled: true,
      });

      const functionRegion = getValue(remoteConfig, 'functionRegion').asString();
      const functions = getFunctions(firebaseApp, functionRegion);
      connectEmulators(functions, remoteConfig);

      configService.loadConfig(firebaseApp, remoteConfig, functions);
    } catch (err) {
      console.error(err);
    }
}

The AppConfig remains unchanged.

import { ApplicationConfig, provideAppInitializer } from '@angular/core';
import { bootstrapFirebase } from './app.bootstrap';

export const appConfig: ApplicationConfig = {
  providers: [
    provideAppInitializer(async () => bootstrapFirebase()),
  ]
};

8. Angular Implementation

8.1 Audio Tags Component

I create an AudioTagsComponent and a new signal form to input the scene, emotion, pace, and voice name in the Angular frontend.

<div>
  <h3>
    <span class="text-xl">🎙️</span> Customize Audio Generation
  </h3>

  <div class="grid grid-cols-1 md:grid-cols-2 gap-4">
    <!-- Scene -->
    <div class="flex flex-col gap-1.5 md:col-span-2">
      <label for="scene">Scene Description</label>
      <textarea id="scene" [formField]="audioPromptForm.scene"
      ></textarea>
    </div>

    <!-- Emotion -->
    <div class="flex flex-col gap-1.5">
      <label for="emotion">Vocal Emotion</label>
      <input type="text" id="emotion" [formField]="audioPromptForm.emotion"
        placeholder="e.g., panicked, whispers"
      />
    </div>

    <!-- Pace -->
    <div class="flex flex-col gap-1.5">
      <label for="pace">Speaking Pace</label>
      <input type="text" id="pace" [formField]="audioPromptForm.pace"
        placeholder="e.g., very slow, rapid"
      />
    </div>

    <!-- Voice Option -->
    <div class="flex flex-col gap-1.5 md:col-span-2">
      <label for="voiceOption">AI Voice Model</label>
      <select id="voiceOption" [formField]="audioPromptForm.voiceOption"
      >
        <option value="" disabled selected>Select a voice...</option>
        @for (option of sortedVoiceOptions(); track option.name) {
          <option [value]="option.name" class="bg-slate-800">{{ option.label }}</option>
        }
      </select>
    </div>
  </div>
</div>
import { ChangeDetectionStrategy, Component, computed, signal } from '@angular/core';
import { form, FormField } from '@angular/forms/signals';
import { VOICE_OPTIONS } from './constants/voice-options.const';
import { AudioPromptData } from './types/audio-prompt-data.type';

@Component({
  selector: 'app-audio-tags',
  imports: [FormField],
  templateUrl: './audio-tags.component.html',
  changeDetection: ChangeDetectionStrategy.OnPush,
})
export class AudioTagsComponent {
    #audioPromptModel = signal<AudioPromptData>({
      scene: 'A news anchor reading the news in a busy newsroom',
      emotion: 'professional, slightly serious',
      pace: 'moderate, clear enunciation',
      voiceOption: 'Kore'
    });
    audioPromptForm = form(this.#audioPromptModel);

    sortedVoiceOptions = computed(() => {
      // Copy the array before sorting so VOICE_OPTIONS is not mutated.
      const sortedList = [...VOICE_OPTIONS].sort((a, b) => a.name.localeCompare(b.name));

      return sortedList.map(option => ({
        name: option.name,
        label: `${option.name} - ${option.description}`
      }));
    });

    audioPromptModel = this.#audioPromptModel.asReadonly();
}

The AudioTagsComponent is imported into ObscureFactComponent such that users can input values into the experimental signal form.

In the HTML template of ObscureFactComponent, the <app-audio-tags> has a template variable audioTags, and audioTags.audioPromptModel() resolves to an instance of AudioPromptData. The data is assigned to the audioTags property of the generateSpeech method.

<div class="w-full mt-6">
    <app-audio-tags #audioTags />

    <h3>A surprising or obscure fact about the tags</h3>
    @if (interestingFact()) {
      <p>{{ interestingFact() }}</p>

      <app-error-display [error]="ttsError()" />

      <app-text-to-speech
        [isLoadingSync]="isLoadingSync()"
        [isLoadingStream]="isLoadingStream()"
        [isLoadingWebAudio]="isLoadingWebAudio()"
        [audioUrl]="audioUrl()"
        (generateSpeech)="generateSpeech({ mode: $event, audioTags: audioTags.audioPromptModel() })"
        [playbackRate]="playbackRate()"
      />
    } @else {
      <p>The tag(s) does not have any interesting or obscure fact.</p>
    }
</div>
import { AudioPromptData } from './audio-prompt-data.type';
import { GenerateSpeechMode } from '../../generate-audio.util';

export type ModeWithAudioTags = {
  mode: GenerateSpeechMode;
  audioTags: AudioPromptData;
};

export type AudioPrompt = {
  scene: string;
  emotion: string;
  pace: string;
  transcript: string;
  voiceOption: string;
};

The generateSpeech method uses the fact and audioTags to construct an instance of AudioPrompt. When the mode is stream, the SpeechService calls generateAudioBlobURL to turn the audioPrompt into a blob URL. When the mode is sync, the SpeechService calls generateAudio to generate an encoded base64 string. When the mode is web_audio_api, the AudioPlayerService calls playStream to stream the audio.

import { SpeechService } from '@/ai/services/speech.service';
import { AudioPrompt } from '@/ai/types/audio-prompt.type';
import { ChangeDetectionStrategy, Component, inject, input, OnDestroy, signal } from '@angular/core';
import { revokeBlobURL } from '../blob.util';
import { AudioTagsComponent } from './audio-tags/audio-tags.component';
import { ModeWithAudioTags } from './audio-tags/types/mode-audio-tags.type';
import { generateSpeechHelper, streamSpeechWithWebAudio, ttsError } from './generate-audio.util';
import { AudioPlayerService } from './services/audio-player.service';

@Component({
  selector: 'app-obscure-fact',
  templateUrl: './obscure-fact.component.html',
  imports: [
    TextToSpeechComponent,
  ],
  changeDetection: ChangeDetectionStrategy.OnPush,