AI Navigate

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

arXiv cs.AI / 3/12/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper provides the first comprehensive evaluation of multi-task parameter-efficient fine-tuning (PEFT) for code analysis across tasks and model architectures, showing that a single PEFT module can match or exceed full multi-task fine-tuning.
  • It demonstrates that multi-task PEFT achieves a favorable accuracy-cost trade-off, delivering accuracy close to single-task fine-tuning while dramatically reducing trainable parameters and compute requirements, including a storage reduction proportional to the number of tasks and up to 85% lower computation.
  • The results indicate that performance with multi-task PEFT is sensitive to task grouping and is shaped by factors such as task stability, model architecture, task complementarity, asymmetry, and dataset quality.
  • Even a 1B-parameter model with multi-task PEFT outperforms prompting of open-source LLMs (DeepSeek, Qwen, Mistral, CodeLlama, and StarCoder) on code-analysis tasks.
  • These findings inform practice by highlighting when to prefer PEFT over prompting and how task design and dataset quality influence co-fine-tuning outcomes.
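The storage arithmetic behind the second point can be made concrete with back-of-the-envelope numbers. The sketch below uses a LoRA-style adapter as a representative PEFT method; the hidden size, rank, and task count are hypothetical illustrations, not figures from the paper:

```python
# Illustrative arithmetic only: dimensions, rank, and task count are
# hypothetical, not taken from the paper.

def full_ft_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter W + B @ A,
    with A of shape (r, d_in) and B of shape (d_out, r)."""
    return rank * (d_in + d_out)

d_in = d_out = 2048   # hypothetical hidden size
rank = 8              # hypothetical LoRA rank
n_tasks = 4           # hypothetical number of code-analysis tasks

full = full_ft_params(d_in, d_out)
adapter = lora_params(d_in, d_out, rank)

# Single-task PEFT: one adapter per task must be trained and stored.
single_task_storage = n_tasks * adapter
# Multi-task PEFT: one shared adapter serves all tasks.
multi_task_storage = adapter

print(f"full fine-tuning params per layer: {full:,}")        # 4,194,304
print(f"LoRA adapter params per layer:     {adapter:,}")     # 32,768
print(f"storage, one adapter per task:     {single_task_storage:,}")
print(f"storage, one shared adapter:       {multi_task_storage:,}")
print(f"storage reduction factor:          {single_task_storage // multi_task_storage}x")
```

Under these toy numbers the adapter is over 100x smaller than the full weight matrix, and sharing one adapter across tasks cuts stored parameters by a further factor equal to the task count, which is the "storage reduction proportional to the number of tasks" the paper reports.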

Abstract

Large language models have recently surpassed specialized systems on code generation, yet their effectiveness on other code-analysis tasks remains less clear. At the same time, multi-task learning offers a way to unify diverse objectives within a single model, but fully fine-tuning LLMs across tasks is computationally prohibitive. Parameter-efficient fine-tuning (PEFT) mitigates this cost by updating only a small fraction of weights. Although PEFT has proven effective in single-task settings, its potential for multi-task learning has not yet been systematically explored. We present the first comprehensive evaluation of multi-task PEFT for code analysis, comparing several methods across diverse tasks and model architectures. Our experiments show that a single PEFT module shared across tasks can match, and in some cases surpass, full multi-task fine-tuning, confirming that the benefits of PEFT extend beyond isolated tasks. When comparing single-task and multi-task setups, we find that multi-task PEFT achieves a favorable performance-efficiency trade-off: it delivers accuracy close to single-task fine-tuning while reducing storage requirements, cutting the number of trainable parameters by a factor of the task count, and lowering computation costs by as much as 85%. At the same time, multi-task gains remain sensitive to task grouping. Through task-pairing experiments, we identify key factors shaping outcomes: task stability, model architecture, task complementarity, asymmetry, and dataset quality determine the success of co-fine-tuning. Finally, we benchmark efficient multi-task PEFT against direct prompting of open-source general-purpose LLMs, including DeepSeek, Qwen, Mistral, CodeLlama, and StarCoder. Despite their strong performance in code generation, these models underperform on analysis tasks, where even a 1B-parameter model with multi-task PEFT achieves significantly better results.
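The core idea of a "single PEFT module shared across tasks" can be sketched in a few lines. The snippet below is a toy, dependency-free illustration of one LoRA-style adapter attached to a frozen weight matrix, with task-specific heads reusing the same adapted backbone; all names, shapes, and the two task labels are hypothetical, and the paper's actual PEFT methods and architectures may differ:

```python
# Minimal sketch of one LoRA-style adapter shared across tasks.
# Shapes, task names, and initialisation are illustrative assumptions.
import random

random.seed(0)

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]

d, r = 6, 2                        # hypothetical hidden size and LoRA rank
W = rand_matrix(d, d)              # frozen pretrained weight (not trained)
A = rand_matrix(r, d)              # trainable LoRA down-projection
B = [[0.0] * r for _ in range(d)]  # trainable up-projection, zero-initialised

def adapted_forward(x):
    """h = W x + B (A x): frozen backbone plus the shared adapter."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + dl for b, dl in zip(base, delta)]

# Task-specific output heads all reuse the SAME adapted backbone,
# so only one adapter (A, B) is trained and stored for every task.
heads = {"clone_detection": rand_matrix(2, d),
         "defect_prediction": rand_matrix(2, d)}

x = [1.0] * d
h = adapted_forward(x)
for task, head in heads.items():
    print(task, matvec(head, h))
```

Because B starts at zero, the adapted model initially reproduces the frozen backbone exactly (the standard LoRA zero-initialisation trick); multi-task training would then update only A and B on a mixture of task data while W stays frozen.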