Trials and tribulations fine-tuning & deploying Gemma-4 [P]

Reddit r/MachineLearning / 4/19/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The ML team documented multiple integration issues they encountered when fine-tuning and deploying Gemma-4, especially around PEFT/LoRA not recognizing Gemma-4’s custom projection layers.
  • They found that TRL’s SFTTrainer silently fails training because it hardcodes use_cache=False, which breaks Gemma 4’s KV-sharing attention and prevents loss from converging.
  • DeepSpeed ZeRO-3 produced seemingly good training metrics but saved broken LoRA adapters with zero-element tensors for some layers, effectively nullifying fine-tuning.
  • For serving, existing runtimes like vLLM and SGLang may not support Gemma-4 runtime LoRAs immediately, requiring manual weight merging and state-dict key remapping before deployment.
  • The post points readers to a detailed Oxen AI blog entry that outlines the fixes and practical pipeline lessons for a smoother Gemma-4 fine-tuning/deployment workflow.

Hey all,

Our ML team spent some time this week getting training and deployment working for Gemma-4, and we wanted to document everything we ran into along the way.

  • PEFT doesn't recognize Gemma 4's custom layers. Google wrapped vision/audio projections in a new ClippableLinear class that doesn't inherit from nn.Linear, so PEFT refuses to attach LoRA, even for text-only fine-tuning. Fix: unwrap the wrappers after loading weights but before calling PEFT.
  • SFTTrainer killed training silently. TRL hardcodes use_cache=False, which breaks Gemma 4's KV-sharing attention. Loss never converges and there's no error, just garbage gradients. Fixed upstream in transformers v5.5.2+.
  • DeepSpeed ZeRO-3 saves half-empty adapters. Training loss looks perfect, but the saved LoRA file has zero-element tensors for half the layers. The model acts like it was never fine-tuned. Workaround: don't use DeepSpeed for LoRA on Gemma 4.
  • No runtime LoRA serving anywhere. It can take a minute for vLLM and SGLang to support runtime LoRAs for Gemma 4's multimodal architecture. Until then, you have to merge weights and remap state dict keys manually before serving.
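The PEFT fix above is a small module-tree walk: swap each wrapper for its inner `nn.Linear` before calling `get_peft_model`. A minimal sketch — the `ClippableLinear` class here is a stand-in for illustration (the real one lives in transformers' Gemma 4 modeling code), and its `linear` attribute name is an assumption:

```python
import torch.nn as nn

# Stand-in for Gemma 4's wrapper. The real ClippableLinear wraps an
# inner nn.Linear but does not inherit from it, so PEFT skips it.
class ClippableLinear(nn.Module):
    def __init__(self, inner: nn.Linear):
        super().__init__()
        self.linear = inner  # attribute name is an assumption

    def forward(self, x):
        return self.linear(x).clamp(-10.0, 10.0)

def unwrap_clippable(model: nn.Module) -> nn.Module:
    """Replace every ClippableLinear with its inner nn.Linear so PEFT
    can match target modules. Run after loading weights, before
    get_peft_model."""
    for name, child in model.named_children():
        if isinstance(child, ClippableLinear):
            setattr(model, name, child.linear)
        else:
            unwrap_clippable(child)
    return model
```

Note this drops whatever the wrapper's forward adds (here, clamping), which is fine for text-only fine-tuning where those projections are frozen anyway.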
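For the ZeRO-3 issue, a cheap guard is to validate the saved adapter before deploying it: load the state dict and flag any zero-element LoRA tensors. A sketch, assuming adapter weight keys contain "lora" (to read a `.safetensors` file off disk you'd load it with `safetensors.torch.load_file` first):

```python
import torch

def find_empty_lora_tensors(state_dict):
    """Return keys of LoRA tensors saved with zero elements --
    the symptom of the ZeRO-3 bug described above."""
    return [
        key for key, tensor in state_dict.items()
        if "lora" in key and tensor.numel() == 0
    ]

def assert_adapter_ok(state_dict):
    """Refuse to ship an adapter with any empty tensors."""
    empty = find_empty_lora_tensors(state_dict)
    if empty:
        raise ValueError(f"Broken adapter, empty tensors: {empty}")
```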
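For serving, the manual path is to merge the adapter into the base weights (e.g. with PEFT's `merge_and_unload`) and then rename state dict keys to whatever the runtime expects. The rename step is just string surgery; a sketch — the example rules are hypothetical, since the real prefixes depend on your runtime's checkpoint format:

```python
def remap_state_dict_keys(state_dict, rules):
    """Apply ordered (old, new) substring rewrites to every key."""
    out = {}
    for key, value in state_dict.items():
        for old, new in rules:
            key = key.replace(old, new)
        out[key] = value
    return out

# Hypothetical rules: strip PEFT's wrapper prefix, rename a submodule.
RULES = [
    ("base_model.model.", ""),
    ("language_model.", "model."),
]
```

Rules apply in order, so put the most specific prefixes first.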

Much more detail in the blog, but hopefully it's helpful in your Gemma-4 journey as well!

submitted by /u/FallMindless3563