Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks

arXiv cs.CV / 4/3/2026


Key Points

  • The paper argues that language-pretrained and vision-pretrained models have substantially different parameter “outlier” patterns, making language-to-vision transfer inherently harder than same-modality cross-domain adaptation.

Abstract

The ratio of outlier parameters in language pre-trained models and vision pre-trained models differs significantly, making cross-modality (language-to-vision) transfer inherently more challenging than cross-domain adaptation. As a result, many prior studies have focused on cross-domain transfer rather than attempting to bridge the language and vision modalities, assuming that language pre-trained models are unsuitable for downstream visual tasks due to their disparate parameter spaces. Contrary to this assumption, we show that adding a bridge training stage as a modality adaptation learner can effectively align Large Language Model (LLM) parameters with vision tasks. Specifically, we propose a simple yet powerful solution, random-label bridge training, which requires no manual labeling and helps LLM parameters adapt to vision foundation tasks. Moreover, our findings reveal that partial bridge training is often advantageous, as certain layers in LLMs exhibit strong foundational properties that remain beneficial even without fine-tuning for visual tasks. This surprising discovery opens new avenues for leveraging language pre-trained parameters directly within vision models and highlights partial bridge training as a practical pathway to cross-modality adaptation.
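
The abstract does not spell out the training recipe, but the core idea can be sketched roughly: wrap a language-pretrained transformer with a patch embedding and a classification head, train it on images with randomly assigned labels (so no manual annotation is needed), and keep some LLM blocks frozen to exploit their "foundation" properties (partial bridge training). In the minimal sketch below, the backbone choice (GPT-2 as a stand-in LLM), patch size, number of random classes, and which blocks are frozen are all illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of random-label bridge training with partial freezing.
# All hyperparameters here (patch size, class count, frozen block indices) are
# assumptions for illustration, not the paper's actual configuration.
import torch
import torch.nn as nn
from transformers import GPT2Model  # language-pretrained backbone (stand-in LLM)


class BridgedVisionModel(nn.Module):
    def __init__(self, num_random_classes=1000, frozen_blocks=range(0, 6)):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")                 # hidden size 768
        self.patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)   # 16x16 patches -> tokens
        self.head = nn.Linear(768, num_random_classes)

        # Partial bridge training: leave some language-pretrained blocks frozen,
        # on the premise that they already carry useful foundational structure.
        for i, block in enumerate(self.backbone.h):
            if i in frozen_blocks:
                for p in block.parameters():
                    p.requires_grad = False

    def forward(self, images):
        tokens = self.patch_embed(images).flatten(2).transpose(1, 2)      # (B, N, 768)
        hidden = self.backbone(inputs_embeds=tokens).last_hidden_state
        return self.head(hidden.mean(dim=1))                              # mean-pool, then classify


# Bridge stage: labels are sampled at random, so no manual labeling is required;
# the objective only serves to pull LLM parameters toward vision-task statistics.
model = BridgedVisionModel()
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
images = torch.randn(8, 3, 224, 224)            # stand-in image batch
random_labels = torch.randint(0, 1000, (8,))    # randomly assigned labels
loss = nn.functional.cross_entropy(model(images), random_labels)
loss.backward()
optimizer.step()
```

After this bridge stage, the adapted backbone would presumably be fine-tuned or evaluated on real vision foundation tasks; the sketch only covers the label-free adaptation step described in the abstract.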