Prompt vocabulary matters more than prompt quality & other lessons from generating 400 game sprites overnight

Reddit r/LocalLLaMA / 3/27/2026


Key Points

  • The author reports building an AI image pipeline to generate ~400 game sprite assets and shares non-obvious failure modes encountered during production.
  • They found that exact prompt phrasing (“sparse tint maps overlays”) can unlock different model behavior more reliably than being generally descriptive.
  • The article notes that Gemini image editing may silently ignore base64-encoded reference images, requiring the reference to be uploaded and passed via a hosted URL.
  • For background removal, BiRefNet can output visually “valid” but fully transparent PNGs (e.g., 334 bytes), so robust validation (file size and alpha-channel mean) is necessary.
  • At scale, practical batching strategies (e.g., generating multiple icons/sprite layers in a single call) reduced API call counts and improved consistency.

Spent the last few weeks building an AI image pipeline to generate ~400 assets (unit sprites, icons, terrain tiles) for an open source Civ game as part of my job. Sharing the specific failure modes because a few of them were genuinely non-obvious.

The thing that surprised me most: exact phrasing unlocks entirely different model behavior

I needed sparse tint overlay masks. These are images where only certain pixels are colored, showing where team colors appear on a sprite. Every reasonable prompt produced solid silhouette fills. "Color masks," "tint layers," "overlay maps" — all solid fills. The phrase that worked was "sparse tint maps overlays." That exact string. Other phrasings produced wrong outputs every time. I don't have a good mental model for why this one works, but it does consistently.

Same thing with layout. Asking for a horizontal 3-panel image with 16:9 aspect ratio produced vertical stacks. Switching to 1:1 + "horizontal layout" in the prompt fixed it.

Base64 data URIs are silently ignored by Gemini image editing

If you're passing a reference image as base64, the model is probably ignoring it and generating from text alone. Found this after producing 40 images that were all identical regardless of what reference I sent. Fix is to upload to CDN storage first and pass the hosted URL. Not documented prominently.
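One way to stop this from failing silently is to reject data URIs before the request is ever built. A minimal sketch, assuming references are passed around as plain strings: `require_hosted_url` is a hypothetical helper, not part of any Gemini SDK, and the upload step is left out because it depends on your storage provider.

```python
from urllib.parse import urlparse


def require_hosted_url(reference: str) -> str:
    """Return the reference unchanged if it is an http(s) URL, else raise.

    Hypothetical guard: it only inspects the string and calls no API.
    """
    parsed = urlparse(reference)
    if parsed.scheme == "data":
        # Base64 data URI: per the post, this gets ignored without any error.
        raise ValueError("reference is a data URI; upload it and pass the hosted URL")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"reference is not a hosted URL: {reference[:40]!r}")
    return reference
```

Calling this on every reference right before the API call turns a silent text-only generation into a loud error at the call site.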

BiRefNet's failure mode is sneaky

Used BiRefNet for background removal. It occasionally returns a valid-looking PNG of exactly 334 bytes that is entirely transparent: correct headers, correct format, zero foreground. A simple non-zero file size check doesn't catch it. The right check is size > 5000 bytes AND alpha channel mean > 0.1 (magick f -channel A -separate -format '%[fx:mean]' info:, where f is the output file). A blank output has mean 0.0.
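The two-part check can be wrapped in a small validator. This is a sketch, not the author's actual script: the function names are made up for illustration, it assumes ImageMagick 7 (`magick`) is on the PATH, and it assumes the output PNG has an alpha channel. The thresholds and the ImageMagick arguments are the ones quoted above.

```python
import os
import subprocess

MIN_BYTES = 5000       # size threshold from the post
MIN_ALPHA_MEAN = 0.1   # alpha-mean threshold from the post


def alpha_mean(path: str) -> float:
    """Mean of the alpha channel in [0, 1], using the magick command from the post."""
    result = subprocess.run(
        ["magick", path, "-channel", "A", "-separate",
         "-format", "%[fx:mean]", "info:"],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def passes_thresholds(size_bytes: int, mean: float) -> bool:
    """Both conditions must hold: big enough AND not fully transparent."""
    return size_bytes > MIN_BYTES and mean > MIN_ALPHA_MEAN


def is_valid_cutout(path: str) -> bool:
    """Hypothetical entry point: cheap size check first, magick only if needed."""
    size = os.path.getsize(path)
    if size <= MIN_BYTES:
        return False  # e.g. the 334-byte blank PNG; no need to shell out
    return passes_thresholds(size, alpha_mean(path))
```

Keeping the threshold logic in `passes_thresholds` separate from the subprocess call makes it testable without ImageMagick installed.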

Batching that actually worked at scale

  • Icons: 3×3 grid (9 vanilla icons → one API call → crop back to 9). 9× reduction in calls across 365 icons.
  • Sprites with tint layers: pack all 3 PNG layers into one horizontal triptych, generate in a single call. Separate calls produced inconsistent results because the model never saw all layers together.
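Cropping the batched outputs back apart is simple box arithmetic. The sketch below only computes the crop boxes and the call count; the 1024×1024 output size is an assumption for illustration, and the boxes use the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def grid_boxes(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Crop boxes for a rows x cols grid, in row-major order.

    Integer division on cell edges means the cells tile the image exactly,
    even when the size is not divisible by the grid dimensions.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


def calls_needed(n_assets: int, per_call: int) -> int:
    """API calls required when per_call assets are packed into each image."""
    return math.ceil(n_assets / per_call)


icon_boxes = grid_boxes(1024, 1024, rows=3, cols=3)   # 3x3 icon sheet
layer_boxes = grid_boxes(1024, 1024, rows=1, cols=3)  # horizontal triptych
print(calls_needed(365, 9))  # 41 calls instead of 365
```

The same helper covers both cases: a 3×3 grid for icons and a 1×3 grid for the sprite-plus-tint-layers triptych.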

Happy to share more specifics on any of these if useful. The prompt vocabulary thing is the one I'd most want to know going in. You really need to focus on hitting whatever phrase the model was trained on, rather than being more descriptive or clearer.

We're still experimenting with sprite sheet generation, so if anyone has more tips I'd be very curious to hear them!

submitted by /u/Low-Cook-3544