EffiMiniVLM: A Compact Dual-Encoder Regression Framework
arXiv cs.CV / 4/6/2026
Key Points
- EffiMiniVLM is proposed as a compact dual-encoder vision-language regression framework for predicting product quality in cold-start settings, where no user history is available, from product images and textual metadata.
- The approach combines an EfficientNet-B0 image encoder and a MiniLM-based text encoder with a lightweight regression head, aiming to reduce computational cost compared with larger vision-language models.
- A weighted Huber loss is introduced to improve training sample efficiency by emphasizing more reliable samples using rating-count information.
- The model is trained on only 20% of the Amazon Reviews 2023 dataset, uses 27.7M parameters and 6.8 GFLOPs, and reports a CES score of 0.40 with the lowest resource cost in the benchmark.
- The authors report strong scalability: increasing the training data to 40% lets EffiMiniVLM outperform methods that rely on larger models and external datasets.
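The weighted Huber loss described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact formulation: the `log(1 + count)` reliability weighting, the mean-normalization of weights, and `delta=1.0` are all assumptions chosen to show the idea of emphasizing samples backed by more ratings.

```python
import math

def weighted_huber_loss(preds, targets, rating_counts, delta=1.0):
    """Huber loss with per-sample weights derived from rating counts.

    A sketch of the weighted-Huber idea: items with more ratings are
    treated as more reliable regression targets and weighted up. The
    log-based weighting and delta=1.0 are illustrative assumptions.
    """
    # Per-sample reliability weight: log(1 + count), normalized to mean 1
    # so the weighting reshapes, rather than rescales, the loss.
    raw = [math.log1p(c) for c in rating_counts]
    mean_w = sum(raw) / len(raw)
    weights = [w / mean_w for w in raw]

    total = 0.0
    for p, t, w in zip(preds, targets, weights):
        r = abs(p - t)
        # Standard Huber: quadratic near zero, linear in the tails.
        if r <= delta:
            loss = 0.5 * r * r
        else:
            loss = delta * (r - 0.5 * delta)
        total += w * loss
    return total / len(preds)
```

With equal rating counts this reduces to the ordinary Huber loss; with unequal counts, errors on heavily rated (more reliable) items contribute more to the average.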