LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration

arXiv cs.CV / 3/27/2026

📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

LLaVA-LE は、月面・月地下の地形特性を画像と文章から推論・記述することを目的にしたビジョン言語モデルとして提案され、惑星科学へのVLM活用を後押しする狙いがある。
月の実画像と科学的な説明を対応付けた大規模データセット LUCID（高解像度画像96k＋詳細キャプション、さらに約20k画像から派生したQA81k）を新たに整備し、学習基盤を提供する。
学習は2段階カリキュラム（領域特化のコンセプトアライメント→指示追従型VQA）でLLaVAをファインチューニングし、月の地形解析に対応した評価ベンチマークも設計している。
GPT/Geminiのジャッジ比較で、Base LLaVAに対し全体性能3.3倍、Stage1モデルに対し2.1倍を達成し、ドメイン特化データと指示チューニングの有効性を示している。
実装コードは公開リポジトリ（GitHub）で提供されている。

Abstract

Recent advances in multimodal vision-language models (VLMs) have enabled joint reasoning over visual and textual information, yet their application to planetary science remains largely unexplored. A key hindrance is the absence of large-scale datasets that pair real planetary imagery with detailed scientific descriptions. In this work, we introduce LLaVA-LE (Large Language-and-Vision Assistant for Lunar Exploration), a vision-language model specialized for lunar surface and subsurface characterization. To enable this capability, we curate a new large-scale multimodal lunar dataset, LUCID (LUnar Caption Image Dataset) consisting of 96k high-resolution panchromatic images paired with detailed captions describing lunar terrain characteristics, and 81k question-answer (QA) pairs derived from approximately 20k images in the LUCID dataset. Leveraging this dataset, we fine-tune LLaVA using a two-stage training curriculum: (1) concept alignment for domain-specific terrain description, and (2) instruction-tuned visual question answering. We further design evaluation benchmarks spanning multiple levels of reasoning complexity relevant to lunar terrain analysis. Evaluated against GPT and Gemini judges, LLaVA-LE achieves a 3.3x overall performance gain over Base LLaVA and 2.1x over our Stage 1 model, with a reasoning score of 1.070, exceeding the judge's own reference score, highlighting the effectiveness of domain-specific multimodal data and instruction tuning to advance VLMs in planetary exploration. Code is available at https://github.com/OSUPCVLab/LLaVA-LE.

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 3/27DailyView insight →

Black Hat Asia

AI Business

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

I shipped Google's TurboQuant as a vLLM plugin 72 hours after the paper — here's what nobody else tested

Dev.to

We built a governance layer for AI-assisted development (with runtime validation and real system)

Dev.to

No AI system using the forward inference pass can ever be conscious.

Reddit r/artificial

LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration

Key Points

Abstract

💡 Insights using this article

Related Articles

Black Hat Asia

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

I shipped Google's TurboQuant as a vLLM plugin 72 hours after the paper — here's what nobody else tested

We built a governance layer for AI-assisted development (with runtime validation and real system)

No AI system using the forward inference pass can ever be conscious.

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer