Deep Networks Favor Simple Data

arXiv cs.AI / 4/2/2026

💬 Opinion / Ideas & Deep Analysis / Models & Research

Key Points

  • The study presents a framework that separates the trained model from the density estimator built on top of it, enabling a generalizable analysis of the phenomenon in which deep models assign higher estimated density, i.e. treat as "more typical", simple (low-complexity) out-of-distribution (OOD) samples — the OOD anomaly.
  • Two families of density estimators are proposed: Jacobian-based estimators built from representations/outputs, and autoregressive self-estimators. This makes the same analysis applicable to a wide range of models, including iGPT, PixelCNN++, Glow, score-based diffusion models, DINOv2, and I-JEPA.
  • In experiments, estimated densities consistently follow the ordering "lower complexity → higher density, higher complexity → lower density", both within a test set and across OOD pairs such as CIFAR-10/SVHN, and this ordering is highly reproducible across independently trained models.
  • Spearman rank correlation shows agreement not only across models but also with external complexity metrics; moreover, models trained only on the lowest-density (most complex) samples — or even on a single such sample — are reported to still rank simple images as higher density.
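The Jacobian-based idea in the second point can be illustrated with the standard change-of-variables formula: if an invertible map f sends data to a base space with a known density, then log p(x) = log p_Z(f(x)) + log |det J_f(x)|. The sketch below is not the paper's estimator — it is a minimal 1-D toy, with a hand-picked scaling map and a finite-difference Jacobian, purely to show the mechanics:

```python
import math

def base_logpdf(z):
    # standard normal log-density in the base space
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def f(x, s=2.0):
    # toy invertible "encoder": divide by a scale s (assumption for illustration)
    return x / s

def logdensity_change_of_variables(x, eps=1e-5):
    # log p(x) = log p_Z(f(x)) + log |df/dx|; Jacobian via central differences
    jac = (f(x + eps) - f(x - eps)) / (2 * eps)
    return base_logpdf(f(x)) + math.log(abs(jac))

# sanity check against the analytic answer: p(x) should be N(x; 0, s^2) with s = 2
x = 1.3
analytic = -0.5 * (x / 2.0) ** 2 - 0.5 * math.log(2 * math.pi) - math.log(2.0)
print(logdensity_change_of_variables(x), analytic)
```

For a linear map the finite-difference Jacobian is exact up to floating-point error, so the two printed values agree; real estimators of this kind work with high-dimensional Jacobians of learned networks rather than a scalar derivative.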

Abstract

Estimated density is often interpreted as indicating how typical a sample is under a model. Yet deep models trained on one dataset can assign higher density to simpler out-of-distribution (OOD) data than to in-distribution test data. We refer to this behavior as the OOD anomaly. Prior work typically studies this phenomenon within a single architecture, detector, or benchmark, implicitly assuming certain canonical densities. We instead separate the trained network from the density estimator built from its representations or outputs. We introduce two estimators, Jacobian-based estimators and autoregressive self-estimators, making density analysis applicable to a wide range of models. Applying this perspective to a range of models, including iGPT, PixelCNN++, Glow, score-based diffusion models, DINOv2, and I-JEPA, we find the same striking regularity that goes beyond the OOD anomaly: lower-complexity samples receive higher estimated density, while higher-complexity samples receive lower estimated density. This ordering appears within a test set and across OOD pairs such as CIFAR-10 and SVHN, and remains highly consistent across independently trained models. To quantify these orderings, we use Spearman rank correlation and find striking agreement both across models and with external complexity metrics. Even when trained only on the lowest-density (most complex) samples, or even a single such sample, the resulting models still rank simpler images as higher density. These observations lead us beyond the original OOD anomaly to a more general conclusion: deep networks consistently favor simple data. Our goal is not to close this question, but to define and visualize it more clearly. We broaden its empirical scope and show that it appears across architectures, objectives, and density estimators.