AVION: Aerial Vision-Language Instruction from Offline Teacher to Prompt-Tuned Network

arXiv cs.CV / 3/16/2026

Key Points

AVION is a knowledge distillation framework for adapting vision-language models to remote sensing imagery. It targets two obstacles: limited semantic coverage in textual representations and insufficiently adaptable visual features.
The teacher module generates semantically rich textual prototypes by collecting descriptions from a large language model and validating them with remote sensing image features.
The student module introduces lightweight, learnable prompts in both the vision and language encoders, guided by the teacher to align embeddings and cross-modal relationships; inference uses the trained student with no teacher.
Experiments on six optical remote sensing benchmarks show improved few-shot classification and base-class accuracy without degrading generalization to novel categories.
AVION also raises mean recall for cross-modal retrieval while adding few trainable parameters, which suggests practical benefit for remote sensing VLM deployment.
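
The teacher step described above (collect LLM descriptions, then validate them against remote sensing image features) might look roughly like the following. This is a minimal sketch under stated assumptions, not the paper's procedure: the summary does not give the validation rule, so the mean cosine similarity score, the `threshold` value, the mean pooling, and the function name `build_prototype` are all hypothetical. Embeddings are plain Python lists standing in for encoder outputs.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def build_prototype(description_embs, image_embs, threshold=0.2):
    """Filter one class's description embeddings by image agreement,
    then average the survivors into a single text prototype.

    Assumption: the threshold test and mean pooling are illustrative
    choices, not the rule used by AVION.
    """
    kept = []
    for desc in description_embs:
        # Mean similarity of this description to the class's images.
        score = sum(cosine(desc, img) for img in image_embs) / len(image_embs)
        if score >= threshold:
            kept.append(l2_normalize(desc))
    if not kept:
        # Fall back to all descriptions rather than return nothing.
        kept = [l2_normalize(d) for d in description_embs]
    dim = len(kept[0])
    mean = [sum(v[i] for v in kept) / len(kept) for i in range(dim)]
    return l2_normalize(mean), len(kept)

# Toy usage: the third description points away from the images.
descs = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
images = [[1.0, 0.1], [0.8, 0.0]]
proto, n_kept = build_prototype(descs, images)
```

In this toy case the third description disagrees with both image embeddings and is filtered out, so the prototype is built from the remaining two.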

Abstract

Adapting vision-language models to remote sensing imagery remains challenging due to two key factors: limited semantic coverage in textual representations and insufficient adaptability of visual features. These issues are particularly significant in aerial scenes, which involve various visual appearances and fine-grained object distinctions. We propose AVION, a knowledge distillation framework tailored for remote sensing adaptation of vision-language models. The teacher module constructs semantically rich textual prototypes by collecting descriptions from a large language model and verifying validity using remote sensing image features. The student module integrates lightweight and learnable prompts into both vision and language encoders, guided by the teacher to align embeddings and their cross-modal relationships. Once trained, the student operates independently during inference. Experiments on six optical remote sensing benchmarks show that AVION improves few-shot classification and base-class accuracy without degrading generalization to novel categories. It also enhances mean recall for cross-modal retrieval, with minimal additional trainable parameters.
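
One way to picture "guided by the teacher to align embeddings and their cross-modal relationships" is a two-term loss: one term pulls the student's text embeddings toward the teacher's prototypes, the other matches the student's image-to-text similarity distribution to the teacher's. The sketch below is an assumption for illustration only. The abstract does not state the loss, so the cosine distance term, the KL term, the `temperature` value, and the name `distillation_loss` are hypothetical. All embeddings are assumed to be L2-normalized already.

```python
import math

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(s_img, s_txt, t_img, t_txt, temperature=0.5):
    """Hypothetical teacher-to-student loss for one image.

    s_img, t_img: student and teacher image embeddings.
    s_txt, t_txt: per-class student text embeddings and teacher prototypes.
    """
    # Embedding alignment: mean cosine distance between matching classes.
    embed_term = sum(1.0 - dot(s, t) for s, t in zip(s_txt, t_txt)) / len(t_txt)
    # Relation alignment: match class distributions across modalities.
    s_probs = softmax([dot(s_img, c) for c in s_txt], temperature)
    t_probs = softmax([dot(t_img, c) for c in t_txt], temperature)
    relation_term = kl_divergence(t_probs, s_probs)
    return embed_term + relation_term

# Toy usage: a student identical to the teacher, and one with swapped classes.
teacher_txt = [[1.0, 0.0], [0.0, 1.0]]
image = [1.0, 0.0]
loss_same = distillation_loss(image, teacher_txt, image, teacher_txt)
loss_swapped = distillation_loss(image, [[0.0, 1.0], [1.0, 0.0]], image, teacher_txt)
```

The loss is zero when the student reproduces the teacher exactly and grows as the student's embeddings or class distribution drift away. Because the teacher appears only in the loss, dropping it after training leaves the student usable on its own, as the abstract states.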

