A Parameter-Efficient Transfer Learning Approach through Multitask Prompt Distillation and Decomposition for Clinical NLP

arXiv cs.CL / 4/9/2026


Key Points

  • The paper proposes a multitask prompt distillation and decomposition framework that learns one shared metaprompt from 21 clinical NLP tasks and transfers it to new tasks.
  • It adapts to unseen target tasks using fewer than 0.05% trainable parameters, aiming to reduce compute and storage overhead versus task-by-task prompt tuning.
  • Across five clinical NLP categories (NER, relation extraction, QA, NLI, and summarization) on 10 held-out datasets and three backbone models (LLaMA 3.1 8B, Meditron3 8B, gpt-oss 20B), the method outperforms LoRA by 1.5–1.7% while using orders of magnitude fewer parameters.
  • Compared with single-task prompt tuning, it improves performance by 6.1–6.6%, with gpt-oss 20B showing the best overall results, especially on clinical reasoning tasks.
  • Strong zero-shot and few-shot results suggest the shared prompt representation transfers effectively across diverse clinical tasks.

Abstract

Existing prompt-based fine-tuning methods typically learn task-specific prompts independently, imposing significant compute and storage overhead at scale when deploying multiple clinical natural language processing (NLP) systems. We present a multitask prompt distillation and decomposition framework that learns a single shared metaprompt from 21 diverse clinical source tasks and adapts it to unseen target tasks with fewer than 0.05% trainable parameters. Evaluated across five clinical NLP task types (named entity recognition, relation extraction, question answering, natural language inference, and summarization) on 10 held-out target datasets using three backbone models (LLaMA 3.1 8B, Meditron3 8B, gpt-oss 20B), our framework consistently outperforms LoRA by 1.5–1.7% despite using orders of magnitude fewer parameters, and exceeds single-task prompt tuning by 6.1–6.6%. The gpt-oss 20B model achieves the highest overall performance, particularly on clinical reasoning tasks. The strong zero- and few-shot performance indicates that the shared prompt representation transfers effectively across diverse clinical tasks.
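The parameter budget can be sanity-checked with a minimal sketch. The summary does not specify the exact decomposition, so the details below are assumptions for illustration: a frozen shared metaprompt of 100 soft tokens, a rank-8 task-specific low-rank delta as the only trainable component, and an 8B-parameter backbone with hidden size 4096 (roughly matching the LLaMA 3.1 8B scale).

```python
import numpy as np

# Hypothetical sizes (assumptions, not from the paper):
hidden_dim = 4096        # backbone hidden size
prompt_len = 100         # number of soft-prompt tokens
rank = 8                 # rank of the task-specific decomposition
backbone_params = 8e9    # ~8B-parameter backbone

# Shared metaprompt, distilled once from the source tasks (frozen per target task).
meta_prompt = np.random.randn(prompt_len, hidden_dim).astype(np.float32)

# Task-specific low-rank factors: the only trainable parameters for a new task.
A = np.zeros((prompt_len, rank), dtype=np.float32)
B = np.random.randn(rank, hidden_dim).astype(np.float32) * 0.01

# Effective prompt for the target task = shared metaprompt + low-rank delta.
task_prompt = meta_prompt + A @ B

trainable = A.size + B.size  # 100*8 + 8*4096 = 33,568 parameters
fraction = trainable / backbone_params
print(f"trainable params per task: {trainable:,} ({fraction:.6%} of backbone)")
```

Even with these generous assumed sizes, the per-task trainable count stays well under the 0.05% budget the paper reports, which is the core storage argument against maintaining a full prompt (or LoRA adapter) per clinical task.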