Created a SillyTavern extension that brings NPCs to life in any game

Reddit r/LocalLLaMA / 3/24/2026


Key Points

  • A developer created a SillyTavern extension that enables NPC roleplay “in any game” by using SillyTavern as the RP backend and a small game mod as a bridge.
  • The approach feeds each game’s wiki/lore into SillyTavern so NPCs can maintain individual knowledge, relationships, and opinions, while a voice-cloning step maps spoken audio from the game’s files to NPCs.
  • Everything runs locally: an RP-tuned model (Cydonia) handles the roleplay and narration, while Qwen 3.5 0.8B acts as the game master in a second-pass action-mapping step, which keeps latency low.
  • After free-form RP, a smaller structured-output model reads each message and triggers the matching in-game action exposed by the mod; for example, when the player's message says they shoot at an NPC and the NPC narrates shooting back, the NPC is made to fire at the player in-game.
  • The author argues that highly specialised, fine-tuned RP models combined with action mapping greatly increase NPC depth, even in older games, and suggests the concept is overlooked because most people don't realise how well such models perform compared to base models.

Using SillyTavern as the backend for all the RP means it can work with almost any game, with just a small mod acting as a bridge between them. Right now I’m using Cydonia as the RP model and Qwen 3.5 0.8B as the game master. Everything is running locally.

The idea is that you can take any game, download its entire wiki, and feed it into SillyTavern. Then every character has their own full lore, relationships, opinions, etc., and can respond appropriately. On top of that, every voice is automatically cloned using the game’s files and mapped to each NPC. The NPCs can also be fed as much information per turn as you want about the game world - like their current location, player stats, player HP, etc.
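
A minimal sketch of how that per-turn information might be assembled before it is handed to the RP model. The `WorldState` fields and the `format_world_state` helper are hypothetical names chosen for illustration; the post does not describe the extension's actual data format or how SillyTavern receives it.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WorldState:
    """Hypothetical snapshot the game mod could send each turn."""
    npc_name: str
    location: str
    player_hp: int
    player_max_hp: int
    extra: Dict[str, str] = field(default_factory=dict)  # any other facts


def format_world_state(state: WorldState) -> str:
    """Render the snapshot as plain text lines for the NPC's context."""
    lines = [
        f"{state.npc_name} is currently at: {state.location}",
        f"Player HP: {state.player_hp}/{state.player_max_hp}",
    ]
    # Sort extra facts so the same state always produces the same text.
    lines += [f"{key}: {value}" for key, value in sorted(state.extra.items())]
    return "\n".join(lines)


snapshot = WorldState("Guard", "North Gate", 40, 100, {"Weather": "rain"})
context_block = format_world_state(snapshot)
```

Keeping the output as short plain-text lines means the amount of state sent per turn can grow or shrink without changing anything on the model side.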

All RP happens inside SillyTavern, and the model is never even told it’s part of a game world. Paired with a locally run RP-tuned model like Cydonia, this gives great results with low latency, as well as strong narration of physical actions.

A second pass is then run over each message using a small model (currently Qwen 3.5 0.8B) with structured output. This maps responses to actual in-game actions exposed by your mod. For example, in this video I approached an NPC and only sent “shoots at you”. The NPC then narrated themselves shooting back at me. Qwen 3.5 reads this conversation and decides that the correct action is for the NPC to shoot back at the player.
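
The second pass can be sketched roughly as follows, assuming the small model is asked to reply with a JSON object constrained to the actions the mod exposes. The action names, the schema shape, and the `parse_action` helper are all assumptions made for illustration and are not taken from the extension; the call to the model itself is left out.

```python
import json

# Hypothetical list of actions a game mod might expose.
ACTIONS = ("none", "attack_player", "flee", "follow_player")


def build_action_schema(actions=ACTIONS) -> dict:
    """JSON Schema that a structured-output request could be constrained to."""
    return {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": list(actions)},
            "target": {"type": "string"},
        },
        "required": ["action"],
        "additionalProperties": False,
    }


def parse_action(raw: str, actions=ACTIONS) -> dict:
    """Parse the small model's reply; fall back to 'none' if it is invalid."""
    fallback = {"action": "none", "target": None}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict) or data.get("action") not in actions:
        return fallback
    target = data.get("target")
    return {
        "action": data["action"],
        "target": target if isinstance(target, str) else None,
    }


# Example reply for the "shoots at you" exchange described above.
decision = parse_action('{"action": "attack_player", "target": "player"}')
```

Falling back to `none` on any malformed reply keeps a bad generation from the small model from triggering an unintended action in the game.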

Essentially, the tiny model acts as a game master, deciding which actions should map to which functions in-game. This means the RP can flow freely without being constrained to a strict structure, which leads to much better results.
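
One way the mapping from a chosen action to an in-game function could look is a small registry on the mod side. This is only a sketch: the handler names and the command strings they return are hypothetical stand-ins for whatever calls a real game mod would make.

```python
from typing import Callable, Dict, Optional

# Handler signature: (npc_id, target) -> command string for the game.
Handler = Callable[[str, Optional[str]], str]
HANDLERS: Dict[str, Handler] = {}


def action(name: str):
    """Decorator that registers a handler under an action name."""
    def register(fn: Handler) -> Handler:
        HANDLERS[name] = fn
        return fn
    return register


@action("attack_player")
def attack_player(npc_id: str, target: Optional[str]) -> str:
    # A real mod would call into the game here; this returns a placeholder.
    return f"{npc_id}:attack:{target or 'player'}"


@action("flee")
def flee(npc_id: str, target: Optional[str]) -> str:
    return f"{npc_id}:flee"


def dispatch(npc_id: str, decision: dict) -> Optional[str]:
    """Run the handler for the decided action, or do nothing if unknown."""
    handler = HANDLERS.get(decision.get("action", "none"))
    if handler is None:
        return None
    return handler(npc_id, decision.get("target"))


command = dispatch("guard_01", {"action": "attack_player", "target": None})
```

Because unknown or `none` actions simply return `None`, the RP text can stay unconstrained and only messages that clearly map to an exposed function have any effect in the game.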

In older games, this could add a lot more life even without the conversational aspect. NPCs simply reacting to your actions adds a ton of depth.

Not sure why this isn’t more popular. My guess is that most people don’t realise how good highly specialised, fine-tuned RP models can be compared to base models. I was honestly blown away when I started experimenting with them while building this.

submitted by /u/goodive123