GIFT: Guided Fine-Tuning and Transfer for Enhancing Instruction-Tuned Language Models

arXiv cs.CL / 5/5/2026


Key Points

  • GIFT (Guided Fine-Tuning and Transfer) is proposed as a framework to adapt instruction-tuned language models by using the instruction model to actively guide training rather than only merging at the end.
  • The method fine-tunes a low-rank adapter on a pretrained base model, using confidence signals extracted from the instruction-tuned model to steer task adaptation.
  • After training, the learned adapter is merged into the instruction-tuned model to produce task-specialized models that retain strong instruction-following behavior.
  • Experiments on mathematics and knowledge-intensive benchmarks across multiple model families and sizes show GIFT consistently beats direct fine-tuning and several transfer-based baselines.
  • GIFT also maintains robust generalization and benefits from favorable scaling behavior at test time.
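The pipeline in the points above can be sketched in a toy form. This is our own illustrative reconstruction, not the paper's exact formulation: the model shapes, the squared-error loss, and the `confidence` function (max softmax probability of the instruction model's output) are all assumptions. It shows the three moving parts: a LoRA-style low-rank adapter trained on the base weights, per-step weighting by the instruction model's confidence, and a final merge of the learned delta into the instruction-tuned weights.

```python
import numpy as np

# Toy GIFT-style sketch (shapes, loss, and "confidence" are illustrative
# assumptions, not the paper's exact method):
#   1) train a low-rank (LoRA-style) adapter on the BASE model,
#   2) weight each update by the instruction model's confidence,
#   3) merge the learned low-rank delta into the instruction-tuned weights.

rng = np.random.default_rng(0)
d, r = 8, 2                                      # model dim, adapter rank

W_base = rng.normal(size=(d, d))                 # pretrained base weights
W_inst = W_base + 0.1 * rng.normal(size=(d, d))  # instruction-tuned weights

# LoRA-style adapter: delta = B @ A, with B initialized to zero
A = 0.1 * rng.normal(size=(r, d))
B = np.zeros((d, r))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def confidence(W, x):
    # Max softmax probability of the instruction model's output: a simple
    # stand-in for the confidence signals described in the paper.
    return softmax(W @ x).max()

# One (input, target) pair standing in for a task example
x = rng.normal(size=d)
y = rng.normal(size=d)
err0 = np.linalg.norm(W_base @ x - y)            # error before adaptation

lr = 0.01
for _ in range(200):
    err = (W_base + B @ A) @ x - y               # adapter lives on the base model
    w = confidence(W_inst, x)                    # instruction model guides the step
    B -= lr * w * np.outer(err, A @ x)           # gradients of 0.5 * w * ||err||^2
    A -= lr * w * np.outer(B.T @ err, x)

# Merge: transfer the learned delta into the instruction-tuned model
W_merged = W_inst + B @ A
```

Because the adapter is trained against the base weights but merged into the instruction weights, the merged model picks up the task update while its instruction-tuned behavior is left intact outside the low-rank subspace.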

Abstract

A promising paradigm for adapting instruction-tuned language models is to learn task-specific updates on a pretrained base model and subsequently merge them into the instruction-tuned model. However, existing approaches typically treat the instruction-tuned model as a passive target that is only involved at the final merging stage, without guiding the training process. We propose GIFT (Guided Fine-Tuning and Transfer), a simple and efficient framework that incorporates guidance from the instruction model into task adaptation. GIFT fine-tunes a low-rank adapter on the pretrained base model using confidence signals derived from the instruction-tuned model. The learned adapter is then merged into the instruction-tuned model, yielding task-specialized models that preserve general instruction-following behavior. We evaluate GIFT on mathematical and knowledge-intensive benchmarks across multiple model families and scales. Results show that GIFT consistently outperforms direct fine-tuning and representative transfer-based baselines, while maintaining robust generalization and favorable test-time scaling behavior.