FashionStylist: An Expert Knowledge-enhanced Multimodal Dataset for Fashion Understanding

arXiv cs.CV · April 13, 2026

📰 News · Models & Research

Key Points

  • The paper introduces FashionStylist, an expert-annotated multimodal benchmark aimed at holistic fashion understanding that combines visual perception with style and rationale reasoning.
  • The dataset is built via a dedicated fashion-expert annotation pipeline and includes professionally grounded labels at both item and full-outfit levels.
  • FashionStylist supports three tasks—outfit-to-item grounding, outfit completion, and outfit evaluation—covering complex item recovery (layering/accessories), compatibility-aware composition (beyond co-occurrence), and expert scoring of style/season/occasion/coherence.
  • Experiments show that FashionStylist serves both as a unified evaluation benchmark and, when used as training data, improves MLLM-based fashion systems on grounding, completion, and outfit-level semantic evaluation.
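To make the three tasks concrete, the sketch below models one plausible record shape per task. The paper does not publish its schema here, so all field names, score axes, and the 1-5 rating scale are illustrative assumptions, not the dataset's actual format:

```python
# Hypothetical record shapes for the three FashionStylist tasks.
# Field names and score ranges are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class GroundingExample:
    """Outfit-to-item grounding: locate each item in the outfit image."""
    outfit_image: str
    item_boxes: dict  # item name -> (x, y, w, h) bounding box, assumed layout


@dataclass
class CompletionExample:
    """Outfit completion: pick the compatible missing item from candidates."""
    partial_outfit: list  # image paths of the items already in the outfit
    candidates: list      # image paths of candidate items
    answer_index: int     # index of the expert-chosen compatible item


@dataclass
class EvaluationExample:
    """Outfit evaluation: expert scores along semantic axes."""
    outfit_image: str
    scores: dict          # axis ("style", "season", ...) -> assumed 1-5 rating
    rationale: str        # expert's free-text justification


sample = EvaluationExample(
    outfit_image="outfit_001.jpg",
    scores={"style": 4, "season": 5, "occasion": 4, "coherence": 4},
    rationale="Layered neutrals suit a cool-weather casual setting.",
)
mean_score = sum(sample.scores.values()) / len(sample.scores)
print(mean_score)
```

A unified loader over records like these would let one model train on all three tasks, which is the multi-task use the paper reports.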

Abstract

Fashion understanding requires both visual perception and expert-level reasoning about style, occasion, compatibility, and outfit rationale. However, existing fashion datasets remain fragmented and task-specific, often focusing on item attributes, outfit co-occurrence, or weak textual supervision, and thus provide limited support for holistic outfit understanding. In this paper, we introduce FashionStylist, an expert-annotated benchmark for holistic and expert-level fashion understanding. Constructed through a dedicated fashion-expert annotation pipeline, FashionStylist provides professionally grounded annotations at both the item and outfit levels. It supports three representative tasks: outfit-to-item grounding, outfit completion, and outfit evaluation. These tasks cover realistic item recovery from complex outfits with layering and accessories, compatibility-aware composition beyond co-occurrence matching, and expert-level assessment of style, season, occasion, and overall coherence. Experimental results show that FashionStylist serves not only as a unified benchmark for multiple fashion tasks, but also as an effective training resource for improving grounding, completion, and outfit-level semantic evaluation in MLLM-based fashion systems.