From Articles to Canopies: Knowledge-Driven Pseudo-Labelling for Tree Species Classification using LLM Experts

arXiv cs.CV / 4/20/2026


Key Points

  • The paper tackles hyperspectral tree species classification challenges caused by limited/imbalanced labels, spectral mixing, and ecological variability by combining biological and structural vegetation information rather than using spectral signatures alone.
  • It proposes a biologically informed semi-supervised deep learning framework that fuses hyperspectral imaging (HSI) and airborne laser scanning (ALS) and performs pseudo-labelling on a precomputed canopy graph to reduce training cost.
  • The method encodes ecological knowledge as species cohabitation priors: large language models (LLMs) derive co-occurrence likelihoods from reliable sources and represent them as a cohabitation matrix.
  • Experiments on a real forest dataset show a 5.6% improvement over the best reference method, and expert review indicates the cohabitation priors are accurate, with differences no larger than 15%.
  • Overall, the work demonstrates how LLM-derived ecological priors can be integrated into a deep learning pseudo-labelling pipeline to improve classification robustness under data scarcity.
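To make the idea of a cohabitation matrix concrete, here is a minimal sketch of how pairwise co-occurrence likelihoods could be stored and compared against expert values. Everything in it is an assumption made for illustration: the species names, the numbers, the symmetric layout, the diagonal of 1.0, and the use of the maximum absolute difference as the comparison measure. The summary does not state how the paper builds the matrix or how the 15% figure is computed.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_cohabitation_matrix(
    species: List[str],
    pair_likelihood: Dict[Tuple[str, str], float],
) -> Matrix:
    """Build a symmetric matrix of co-occurrence likelihoods.

    Assumption: cohabitation is symmetric and a species always
    co-occurs with itself (diagonal = 1.0).
    """
    index = {name: i for i, name in enumerate(species)}
    n = len(species)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), value in pair_likelihood.items():
        i, j = index[a], index[b]
        matrix[i][j] = value
        matrix[j][i] = value
    return matrix


def max_abs_difference(m1: Matrix, m2: Matrix) -> float:
    """Largest element-wise gap between two matrices of equal shape."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


# Hypothetical species and likelihoods, not taken from the paper.
species = ["oak", "beech", "pine"]
llm_matrix = build_cohabitation_matrix(
    species,
    {("oak", "beech"): 0.8, ("oak", "pine"): 0.3, ("beech", "pine"): 0.4},
)
expert_matrix = build_cohabitation_matrix(
    species,
    {("oak", "beech"): 0.7, ("oak", "pine"): 0.35, ("beech", "pine"): 0.3},
)

gap = max_abs_difference(llm_matrix, expert_matrix)
print(f"largest LLM-vs-expert gap: {gap:.2f}")
```

With these made-up numbers, the LLM-derived and expert matrices never differ by more than 0.15 in any cell, which is one plausible reading of the "no larger than 15%" claim.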

Abstract

Hyperspectral tree species classification is challenging due to limited and imbalanced class labels, spectral mixing (overlapping light signatures from multiple species), and ecological heterogeneity (variability among ecological systems). Addressing these challenges requires methods that integrate biological and structural characteristics of vegetation, such as canopy architecture and interspecific interactions, rather than relying solely on spectral signatures. This paper presents a biologically informed, semi-supervised deep learning method that integrates multi-sensor Earth observation data, specifically hyperspectral imaging (HSI) and airborne laser scanning (ALS), with expert ecological knowledge. The approach relies on biologically inspired pseudo-labelling over a precomputed canopy graph, yielding accurate classification at low training cost. In addition, ecological priors on species cohabitation are automatically derived from reliable sources using large language models (LLMs) and encoded as a cohabitation matrix with likelihoods of species occurring together. These priors are incorporated into the pseudo-labelling strategy, effectively introducing expert knowledge into the model. Experiments on a real-world forest dataset demonstrate a 5.6% improvement over the best reference method. Expert evaluation of the cohabitation priors reveals high accuracy, with differences no larger than 15%.
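The abstract describes pseudo-labelling over a precomputed canopy graph with cohabitation priors folded in, but gives no algorithmic detail. The sketch below shows one simple way such a step could work and should not be read as the paper's method: the function name `pseudo_label`, the multiplicative use of the prior, the confidence threshold, and all numbers are assumptions. Each unlabelled canopy node takes its classifier probabilities, reweights them by the cohabitation likelihood with every labelled neighbour, and receives a pseudo-label only if the renormalised score is confident enough.

```python
from typing import Dict, List, Optional


def pseudo_label(
    probs: Dict[int, List[float]],     # node -> classifier class probabilities
    labels: Dict[int, Optional[int]],  # node -> known class index, or None
    edges: Dict[int, List[int]],       # precomputed canopy graph (adjacency lists)
    cohab: List[List[float]],          # cohab[a][b]: likelihood that a and b co-occur
    threshold: float = 0.8,            # assumed confidence cut-off
) -> Dict[int, int]:
    """Return pseudo-labels for unlabelled nodes that pass the threshold."""
    n_classes = len(cohab)
    new_labels: Dict[int, int] = {}
    for node, p in probs.items():
        if labels.get(node) is not None:
            continue  # already labelled
        neighbour_labels = [
            labels[m] for m in edges.get(node, []) if labels.get(m) is not None
        ]
        # Reweight each class score by its cohabitation likelihood
        # with every labelled neighbour (assumed multiplicative prior).
        scores = list(p)
        for c in range(n_classes):
            for nl in neighbour_labels:
                scores[c] *= cohab[c][nl]
        total = sum(scores)
        if total == 0:
            continue  # prior rules out every class; leave unlabelled
        posterior = [s / total for s in scores]
        best = max(range(n_classes), key=posterior.__getitem__)
        if posterior[best] >= threshold:
            new_labels[node] = best
    return new_labels


# Toy example with two hypothetical classes and three canopy nodes.
cohab = [[1.0, 0.2], [0.2, 1.0]]
labels = {0: 0, 1: None, 2: None}
probs = {0: [0.9, 0.1], 1: [0.6, 0.4], 2: [0.5, 0.5]}
edges = {0: [1], 1: [0], 2: []}

result = pseudo_label(probs, labels, edges, cohab)
print(result)
```

In this toy graph, node 1 is only weakly predicted as class 0 by the classifier (0.6), but its labelled neighbour makes class 1 unlikely under the prior, so the reweighted score clears the threshold. Node 2 has no labelled neighbours and stays unlabelled. Because the graph is computed once up front, a step like this needs only lookups, which is consistent with the low training cost the abstract mentions.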