Frontier models can't run on satellites. Here's an end-to-end wildfire detection pipeline using a 450M on-board Vision-Language Model (Sentinel-2 + LFM2.5-VL)

Reddit r/LocalLLaMA / 5/4/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The article argues that running frontier vision-language models directly on satellites is constrained less by model quality than by bandwidth, since downlinking full multispectral imagery per orbit is not scalable.
  • It presents an end-to-end wildfire detection pipeline that uses a ~450M on-board vision-language model with Sentinel-2, performing inference in space and downlinking only a structured JSON “risk profile.”
  • The pipeline combines RGB bands (B4-B3-B2) with SWIR bands (B12-B8-B4), emphasizing SWIR as the key signal for vegetation moisture stress, which is linked to wildfire fuel conditions.
  • A local PoC is simulated using “SimSat” to emulate orbital operations and fetch real Sentinel-2 tiles from an AWS Element84 STAC catalog, while the VLM runs locally (via llama-server) and stores results in SQLite for visualization with a Streamlit app.
  • The author notes that achieving strong accuracy will likely require additional data collection, labeling, evaluations, and fine-tuning beyond the out-of-the-box performance of the 450M VLM.

Sharing a project I've been building: a full end-to-end wildfire prevention pipeline that runs a Vision-Language Model directly on a satellite, using Sentinel-2 imagery.

The interesting design constraint isn't model quality. It's bandwidth. Running a frontier model on the ground means downlinking massive multispectral image arrays every orbit, which doesn't scale. A 450M VLM small enough to run on-board flips the problem: do inference in space and downlink only a compact JSON risk profile.
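To make the bandwidth argument concrete, here is roughly what the downlinked payload looks like. The field names below are illustrative, not the cookbook's exact schema:

```python
import json

# Hypothetical risk profile (field names are illustrative, not the cookbook's exact schema).
# The point: this payload is a few hundred bytes, versus hundreds of MB for the raw
# multi-band Sentinel-2 imagery a ground-based frontier model would need downlinked.
risk_profile = {
    "tile_id": "T34SGH",
    "timestamp": "2026-05-04T10:32:00Z",
    "risk_level": "high",
    "breakdown": {
        "vegetation_moisture_stress": "severe",
        "burned_area_visible": False,
        "smoke_plume_visible": False,
    },
}

payload = json.dumps(risk_profile).encode("utf-8")
print(len(payload), "bytes to downlink")  # a few hundred bytes per observation
```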

The pipeline pairs RGB (B4-B3-B2) tiles with SWIR (B12-B8-B4) tiles. SWIR is the key signal: it captures vegetation moisture stress, which is the actual fuel indicator for fires. The VLM gets holistic scene understanding instead of just pixel stats, and outputs a structured risk_level plus a breakdown.
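If you haven't built Sentinel-2 composites before, here's a minimal sketch of how the two image pairs can be assembled. It assumes the band rasters are read with rasterio and already resampled to a common resolution; the file paths and the 2–98 percentile stretch are my own choices, and the cookbook's preprocessing may differ:

```python
import numpy as np
import rasterio  # assumption: bands are read with rasterio; the cookbook may fetch them differently

def read_band(path: str) -> np.ndarray:
    """Read a single Sentinel-2 band as a float32 reflectance array."""
    with rasterio.open(path) as src:
        return src.read(1).astype(np.float32)

def to_composite(bands: list[np.ndarray]) -> np.ndarray:
    """Stack three bands into an 8-bit false-color image using a 2-98 percentile stretch."""
    stack = np.stack(bands, axis=-1)
    lo, hi = np.percentile(stack, [2, 98])
    stack = np.clip((stack - lo) / (hi - lo + 1e-6), 0.0, 1.0)
    return (stack * 255).astype(np.uint8)

# Paths are illustrative; the actual band files come from the STAC items the satellite observes.
b02, b03, b04 = (read_band(f"tile/B{n:02d}.jp2") for n in (2, 3, 4))
b08, b12 = read_band("tile/B08.jp2"), read_band("tile/B12.jp2")

rgb_image  = to_composite([b04, b03, b02])   # true color: B4-B3-B2
swir_image = to_composite([b12, b08, b04])   # SWIR composite: B12-B8-B4, highlights moisture stress
```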

For the PoC I'm simulating the on-board pipeline locally:

  • SimSat (Docker) simulates orbit and serves real Sentinel-2 from the AWS Element84 STAC catalog
  • LFM2.5-VL-450M runs locally via llama-server
  • A watch loop polls position, fetches the image pair, runs inference, writes to SQLite (sketched after this list)
  • Streamlit app on top to visualize predictions across 22 fire-prone locations (Attica, Angeles National Forest, Borneo, etc.)
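Here's a rough sketch of that watch loop. The SimSat position/image endpoints below are my own placeholders (its actual API is defined in the cookbook), and it assumes llama-server is running with the model's multimodal projector so its OpenAI-compatible /v1/chat/completions endpoint accepts image inputs:

```python
import base64, json, sqlite3, time
import requests

SIMSAT_URL = "http://localhost:8080"                        # placeholder: SimSat HTTP API
LLAMA_URL  = "http://localhost:8081/v1/chat/completions"    # llama-server OpenAI-compatible API

def to_data_uri(png_bytes: bytes) -> str:
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode()

def classify(rgb_png: bytes, swir_png: bytes) -> dict:
    """Send the RGB/SWIR pair to the local VLM and parse its JSON answer.

    Assumes the model is prompted to return valid JSON; a production loop
    would validate and retry instead of parsing blindly.
    """
    body = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Assess wildfire risk. Reply with JSON: risk_level, breakdown."},
                {"type": "image_url", "image_url": {"url": to_data_uri(rgb_png)}},
                {"type": "image_url", "image_url": {"url": to_data_uri(swir_png)}},
            ],
        }],
        "temperature": 0.0,
    }
    resp = requests.post(LLAMA_URL, json=body, timeout=120)
    return json.loads(resp.json()["choices"][0]["message"]["content"])

db = sqlite3.connect("predictions.db")
db.execute("CREATE TABLE IF NOT EXISTS predictions (ts REAL, location TEXT, risk_level TEXT, raw TEXT)")

while True:
    # Placeholder endpoint names: poll the simulated orbit, act when over a target location.
    pos = requests.get(f"{SIMSAT_URL}/position").json()
    if pos.get("over_target"):
        rgb  = requests.get(f"{SIMSAT_URL}/image", params={"composite": "rgb"}).content
        swir = requests.get(f"{SIMSAT_URL}/image", params={"composite": "swir"}).content
        result = classify(rgb, swir)
        db.execute(
            "INSERT INTO predictions VALUES (?, ?, ?, ?)",
            (time.time(), pos.get("location", "unknown"), result["risk_level"], json.dumps(result)),
        )
        db.commit()
    time.sleep(30)
```

The Streamlit app then just reads the predictions table and renders it per location.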

This post covers the problem framing and system design. The next ones will cover data collection and labelling, evals, and fine-tuning, because out of the box a 450M VLM is not Opus-tier and you need to close that gap deliberately.

Code's in the Liquid AI Cookbook (link below). Curious what people think about on-device / edge inference for this kind of geospatial use case. Anyone doing similar work with constrained-bandwidth deployments?

Full write-up: https://github.com/Liquid4All/cookbook/tree/main/examples/wildfire-prevention

Code: https://github.com/Liquid4All/cookbook/tree/main/examples/wildfire-prevention

submitted by /u/PauLabartaBajo