GLaDOS TTS Build Kit: Train GLaDOS Voice if You Own Portal 1 and 2

Reddit r/LocalLLaMA / 5/3/2026


Key Points

  • The article introduces the “GLaDOS TTS Build Kit” repository, which lets you train a local GLaDOS-style TTS voice using only your own installed copies of Portal and Portal 2.
  • It provides a source-only training pipeline that extracts voice lines from local game VPKs, converts Source audio to clean 24 kHz mono PCM, and then generates training data.
  • The pipeline transcribes clips with Cohere Transcribe via CohereX, scrapes Portal Wiki transcripts as a ground-truth reference, and reconciles the two transcript sources to filter out mismatched clips.
  • It optionally provides a small local web UI for hand-reviewing problematic clips, and then builds manifests and trains a local OmniVoice TTS model.
  • The author emphasizes that the kit does not bundle Valve audio, extracted clips, transcripts, samples, checkpoints, or trained weights; all generated outputs stay in git-ignored local data/ directories.
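The conversion step exists because, per the post, the game ships these voice lines as MP3 data wrapped in a WAV (RIFF) container. The sketch below is a minimal, standard-library-only illustration of how one might tell such files apart from plain PCM by reading the format tag in the `fmt ` chunk (0x0055 is MPEG Layer 3, 0x0001 is PCM). The helper names `read_wav_format` and `needs_conversion`, and the assumption that the 24 kHz mono target is 16-bit, are hypothetical and not taken from the repo.

```python
import struct
from typing import NamedTuple

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_MPEGLAYER3 = 0x0055  # MP3 payload stored inside a RIFF/WAVE container


class WavFormat(NamedTuple):
    format_tag: int
    channels: int
    sample_rate: int
    bits_per_sample: int


def read_wav_format(data: bytes) -> WavFormat:
    """Walk the RIFF chunks and decode the first 16 bytes of the 'fmt ' chunk."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            if size < 16:
                raise ValueError("fmt chunk too short")
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return WavFormat(tag, channels, rate, bits)
        # RIFF chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    raise ValueError("no fmt chunk found")


def needs_conversion(fmt: WavFormat, target_rate: int = 24_000) -> bool:
    """True unless the file is already mono PCM at the target rate.

    Assumption: the target sample depth is 16-bit; the post only says
    "24 kHz mono PCM".
    """
    return not (
        fmt.format_tag == WAVE_FORMAT_PCM
        and fmt.channels == 1
        and fmt.sample_rate == target_rate
        and fmt.bits_per_sample == 16
    )
```

Detecting the format only tells you which files need work; the actual decode and resample still need a real decoder. One common route is `ffmpeg -i in.wav -ac 1 -ar 24000 -c:a pcm_s16le out.wav`, though the repo may handle this step differently.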

I put together a repo for training a local GLaDOS-style TTS voice from your own installed copies of Portal and Portal 2:

https://github.com/JoeHelbing/glados-tts-build-kit

Writeup: https://www.joehelbing.net/post/glados-tts

The important bit: this does not include Valve audio, extracted clips, transcripts, samples, checkpoints, or trained weights. It's just the pipeline. You provide your own local game files, and everything generated stays under ignored local data/ paths.

What it does:

  • Extracts the GLaDOS voice lines from local Portal / Portal 2 VPKs
  • Converts the Source MP3-in-WAV files into clean 24 kHz mono PCM
  • Transcribes the clips with Cohere Transcribe through CohereX
  • Scrapes Portal Wiki transcripts as a ground-truth reference
  • Reconciles the two transcript sources and filters bad/mismatched clips
  • Optionally gives you a little local web UI to hand-review messy clips
  • Builds manifests and trains a local OmniVoice TTS model
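The reconcile-and-filter step can be pictured as comparing each ASR transcript with its matching Portal Wiki line at the word level and bucketing clips by similarity. This is a hypothetical illustration using Python's standard `difflib`, not the repo's actual logic: the `reconcile` helper, the 0.9 / 0.6 thresholds, and the choice to train on the wiki text when the two sources agree are all assumptions.

```python
import difflib
import re
from dataclasses import dataclass
from typing import Optional

_NON_WORD = re.compile(r"[^a-z0-9' ]+")


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(_NON_WORD.sub(" ", text.lower()).split())


def similarity(asr: str, reference: str) -> float:
    """Word-level similarity in [0, 1] between two transcripts."""
    a, b = normalize(asr).split(), normalize(reference).split()
    if not a and not b:
        return 1.0
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


@dataclass
class Decision:
    clip_id: str
    score: float
    status: str  # "keep", "review", or "drop"
    text: str    # transcript to use if the clip is kept


def reconcile(
    clip_id: str,
    asr: str,
    reference: Optional[str],
    keep_at: float = 0.9,    # assumption: illustrative threshold
    review_at: float = 0.6,  # assumption: illustrative threshold
) -> Decision:
    if reference is None:
        # No wiki line to compare against, so a human has to decide.
        return Decision(clip_id, 0.0, "review", asr)
    score = similarity(asr, reference)
    if score >= keep_at:
        status = "keep"
    elif score >= review_at:
        status = "review"
    else:
        status = "drop"
    # Assumption: treat the wiki text as ground truth for training labels.
    return Decision(clip_id, score, status, reference)
```

Comparing words rather than characters means a single misheard word in a five-word line scores 0.8, which lands in the "review" bucket instead of being silently kept or dropped; that middle bucket is the natural input for a hand-review UI.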

Basically, I wanted something reproducible where someone who already owns the games could run the pipeline locally instead of downloading somebody else's dataset or model weights.

Credit where due: I got the original game-file extraction idea from systemofapwne/piper-de-glados, then built this version around a full source-only training pipeline.

submitted by /u/Mr_International