Quantifying the human visual exposome with vision language models

arXiv cs.CV / 5/6/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The study tackles the lack of direct, objective quantification of the visual environment’s role in mental health by moving beyond coarse location proxies and self-reports.
  • It combines ecological momentary assessment with vision-language models (VLMs) to estimate the semantic “richness” of daily visual experience from participant photos.
  • Across 2,674 participant-generated photographs, VLM-derived greenness estimates robustly predicted both momentary affect and chronic stress, consistent with established benchmarks (a minimal rating sketch follows this list).
  • The authors build a semi-autonomous LLM-driven pipeline that mines over seven million scientific publications to extract nearly 1,000 environment-related features linked to mental health.
  • On real-world imagery, up to 33% of the VLM-derived context ratings correlated significantly with affect and stress, supporting scalable visual exposomics (a correlation-screen sketch follows the Abstract).
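
The paper summarized here doesn't include its prompting code, but the core measurement step, asking a VLM to score a single photo on a single environmental feature such as greenness, is straightforward to picture. Below is a minimal Python sketch assuming an OpenAI-compatible chat endpoint; the model name, prompt wording, and 0-100 scale are illustrative assumptions, not the authors' protocol.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rate_greenness(image_path: str) -> float:
    """Ask a VLM for a 0-100 greenness rating of one photo.

    Model choice, prompt, and scale are illustrative assumptions,
    not the paper's actual protocol.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical VLM choice
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Rate the amount of visible green vegetation in "
                          "this photo on a 0-100 scale. Reply with a single "
                          "number.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    # A production version would validate the reply instead of trusting it.
    return float(resp.choices[0].message.content.strip())
```

Running this once per photo yields per-image feature scores that can then be related to the ecological momentary assessment reports of affect and stress.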

Abstract

The visual environment is a fundamental yet unquantified determinant of mental health. While the concept of the environmental exposome is well established, current methods rely on coarse geospatial proxies or biased self-reports, failing to capture the first-person visual context of daily life. We addressed this gap by coupling ecological momentary assessment with vision-language models (VLMs) to quantify the semantic richness of human visual experience. Across 2,674 participant-generated photographs, VLM-derived estimates of greenness robustly predicted momentary affect and chronic stress, consistent with established benchmarks. We then developed a semi-autonomous large language model (LLM)-based pipeline that mined over seven million scientific publications to extract nearly 1,000 environmental features empirically linked to mental health. When applied to real-world imagery, up to 33% of VLM-extracted context ratings significantly correlated with affect and stress. These findings establish a scalable, objective paradigm for visual exposomics, enabling high-throughput decoding of how the visible world is associated with mental health.
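
The literature-mining pipeline is described only at a high level, but its unit operation, an LLM reading one abstract and emitting candidate environment features, can be sketched. Everything below (model, prompt, JSON output contract) is an assumption for illustration, not the authors' implementation.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY; model and prompt are illustrative

PROMPT = (
    "From the abstract below, list physical-environment features that the "
    "study empirically links to mental health. Respond ONLY with a JSON "
    'array of short feature names, e.g. ["street greenery", "traffic noise"]. '
    "Respond with [] if the abstract is not relevant."
)

def extract_features(abstract: str) -> list[str]:
    """One mining step of a hypothetical literature pipeline: abstract -> features."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{abstract}"}],
    )
    try:
        return json.loads(resp.choices[0].message.content)
    except json.JSONDecodeError:
        return []  # a real pipeline would log and retry unparseable replies
```

Deduplicating and normalizing the outputs across millions of abstracts would then yield a feature lexicon on the order of the paper's nearly 1,000 entries.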
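
The headline figure, that up to 33% of context ratings correlated significantly with affect and stress, implies a mass-univariate screen across features. A naive version of such a screen is sketched below, assuming one rating per photo per feature and Benjamini-Hochberg FDR control; the authors' actual statistical model (for instance, how repeated measures per participant are handled) is not specified in this summary.

```python
import numpy as np
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

def screen_features(ratings: np.ndarray, outcome: np.ndarray, alpha: float = 0.05):
    """Correlate each column of ratings (n_photos x n_features) with an outcome
    such as momentary affect, controlling the false discovery rate across features.

    Returns the fraction of features surviving correction and the corrected p-values.
    """
    pvals = np.array([pearsonr(ratings[:, j], outcome)[1]
                      for j in range(ratings.shape[1])])
    reject, pvals_fdr, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    return reject.mean(), pvals_fdr
```

The fraction returned by this screen is the analogue of the paper's "up to 33%" figure.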