AVION: Aerial Vision-Language Instruction from Offline Teacher to Prompt-Tuned Network
arXiv cs.CV / 3/16/2026
Key Points
- AVION proposes a knowledge distillation framework for adapting vision-language models (VLMs) to remote sensing imagery. It targets two weaknesses of existing adaptation methods: limited semantic coverage in the text representations and insufficiently adaptable visual features.
- The teacher module generates semantically rich textual prototypes by collecting descriptions from a large language model and validating them with remote sensing image features.
- The student module introduces lightweight, learnable prompts in both the vision and language encoders, guided by the teacher to align embeddings and cross-modal relationships; inference uses the trained student with no teacher.
- Experiments on six optical remote sensing benchmarks show improved few-shot classification and base-class accuracy while preserving generalization to novel categories. Mean recall for cross-modal retrieval also improves, all with minimal trainable parameters.
- The adaptation adds few trainable parameters, indicating practical benefit for remote sensing VLM deployment.
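
The teacher and student roles described in the key points can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`build_prototype`, `distill_loss`), the top-k selection rule, the cosine alignment term, the KL term, and the temperature `tau` are all assumptions chosen only to make the idea concrete. The first function stands in for the teacher validating LLM-generated descriptions against image features; the second stands in for a loss that pulls the student's embeddings and image-to-text relationships toward the teacher's.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def build_prototype(desc_emb, img_emb, keep=2):
    """Hypothetical teacher step for one class.

    desc_emb: (num_descriptions, dim) text embeddings of LLM descriptions.
    img_emb:  (num_images, dim) image embeddings of that class.
    Keeps the `keep` descriptions most similar to the mean image feature
    and averages them into a single unit-length textual prototype.
    """
    desc_emb = l2_normalize(desc_emb)
    img_mean = l2_normalize(img_emb.mean(axis=0))
    scores = desc_emb @ img_mean                 # cosine similarity per description
    top = np.argsort(scores)[::-1][:keep]        # indices of best-matching descriptions
    return l2_normalize(desc_emb[top].mean(axis=0))

def distill_loss(student_img, student_txt, teacher_img, teacher_txt, tau=0.07):
    """Hypothetical student objective (assumed form, not taken from the paper).

    Term 1: mean (1 - cosine) between student and teacher text embeddings.
    Term 2: KL(teacher || student) over image-to-text similarity distributions.
    """
    s_img, s_txt = l2_normalize(student_img), l2_normalize(student_txt)
    t_img, t_txt = l2_normalize(teacher_img), l2_normalize(teacher_txt)
    align = np.mean(1.0 - np.sum(s_txt * t_txt, axis=-1))
    p_t = softmax(t_img @ t_txt.T / tau)
    p_s = softmax(s_img @ s_txt.T / tau)
    relation = np.mean(
        np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    )
    return align + relation
```

In this sketch the teacher quantities are computed once, offline, and only the student's outputs change during training, which mirrors the summary's point that inference uses the trained student with no teacher.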