Instruction-Tuned, but Not More Verifiable Instruction-Following: A Cross-Task Diagnosis for LoRA Adapters
arXiv cs.LG · March 25, 2026
Key Points
- The paper tests whether “nominal” training labels for LoRA adapters (e.g., instruction-tuned) reliably predict realized cross-task capability gains when the same adapter is evaluated across tasks.
- Using IFEval as a strict, automatically verifiable target for instruction following, the authors find that nominal labels often fail to forecast improvements: results are sensitive to adapter configuration, with some gains near zero or even negative.
- In a controlled instruction-versus-numeric comparison, an instruction-tuned adapter dramatically improves performance on an off-target numeric benchmark yet fails to improve verifiable instruction following on IFEval, illustrating a "capability drift" mismatch between label and realized gains.
- The mismatch is observable in the raw cross-task performance matrix, and the authors use a drift score only as a compact summary rather than introducing a new formal metric.
- Results on broader instruction-following benchmarks are mixed and benchmark-dependent. The practical recommendation: run routine cross-task evaluation before deployment, and do not treat nominal training labels as dependable proxies for capability.
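The drift diagnosis described above can be sketched as a small computation over a cross-task gain matrix. Everything below is illustrative: the task names, numbers, and the particular drift-score formula (mean off-target gain minus on-target gain) are assumptions for exposition, not the paper's exact definitions.

```python
import numpy as np

# Hypothetical cross-task evaluation: rows = adapters (indexed by their
# nominal training label), columns = evaluation tasks. Entries are score
# deltas vs. the base model. Numbers are made up for illustration.
tasks = ["ifeval", "numeric", "summarization"]
labels = ["ifeval", "numeric"]  # nominal label of each adapter
gains = np.array([
    [0.01, 0.12, 0.03],  # "instruction-tuned" adapter: large off-target numeric gain
    [0.00, 0.08, 0.01],  # "numeric" adapter: gain lands on its own task
])

def drift_score(gains_row: np.ndarray, label_idx: int) -> float:
    """Compact summary statistic: mean off-target gain minus on-target gain.
    Positive values flag capability drift (off-target gains dominate)."""
    on_target = gains_row[label_idx]
    off_target = np.delete(gains_row, label_idx).mean()
    return float(off_target - on_target)

for row, label in zip(gains, labels):
    score = drift_score(row, tasks.index(label))
    print(f"{label}: drift = {score:+.3f}")
```

With these toy numbers, the instruction-tuned adapter gets a positive drift score (its gains land mostly off-target) while the numeric adapter gets a negative one, mirroring the mismatch the paper reads off the raw cross-task matrix.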