Generalist Multimodal LLMs Gain Biometric Expertise via Human Salience
arXiv cs.CV / 3/19/2026
Key Points
- The paper investigates whether general-purpose multimodal large language models (MLLMs) can perform iris presentation attack detection (PAD) under strict privacy constraints, using human expert knowledge to augment prompts.
- Pre-trained vision transformers in MLLMs inherently cluster iris attack types in their embeddings, even without explicit training for PAD.
- Structured prompts that incorporate human salience (verbal indicators supplied by human subjects) help the models resolve ambiguous cases and improve detection accuracy.
- On an IRB-restricted dataset of 224 iris images spanning seven attack types, evaluated only through university-approved services or locally hosted models, Gemini with expert-informed prompts outperforms both a CNN-based baseline and human examiners, while Llama 3.2-Vision achieves near-human performance.
- The results suggest MLLMs deployable within institutional privacy constraints offer a viable path for iris PAD, addressing data-sharing and privacy challenges while maintaining high accuracy.
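The expert-informed prompting idea above can be sketched as a simple prompt-assembly step: salience cues from human examiners are folded into a structured instruction alongside the candidate attack taxonomy. This is a minimal illustration, not the paper's actual prompt; the cue wording, attack-type names, and function name below are all hypothetical.

```python
# Hypothetical sketch of expert-informed prompt construction for iris PAD.
# The attack types and salience cues here are illustrative placeholders,
# not the seven categories or cues used in the paper.

def build_pad_prompt(salience_cues, attack_types):
    """Assemble a structured PAD prompt embedding human salience cues."""
    lines = [
        "You are an expert iris examiner performing presentation attack detection.",
        "Candidate attack types: " + ", ".join(attack_types) + ".",
        "Expert salience cues to check in the image:",
    ]
    lines += [f"- {cue}" for cue in salience_cues]
    lines.append(
        "Classify the image as 'bona fide' or one attack type, "
        "and state which cues support your decision."
    )
    return "\n".join(lines)

# Example usage with made-up cues (illustrative only):
prompt = build_pad_prompt(
    salience_cues=[
        "regular dot pattern suggesting a printed texture",
        "sharp circular boundary typical of a textured contact lens",
    ],
    attack_types=["printed iris", "textured contact lens", "artificial eye"],
)
print(prompt)
```

The resulting prompt string would then be sent to the MLLM together with the iris image; the paper's contribution lies in showing that this kind of human-salience scaffolding, rather than model fine-tuning, closes the gap to (and beyond) specialist baselines.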