Perceptual misalignment of texture representations in convolutional neural networks

arXiv cs.CV / 4/3/2026

Key Points

  • The paper studies whether CNN-based texture representations—specifically Gram matrices of convolutional features—match human perceptual texture content.
  • The authors evaluate a diverse pool of CNNs, measuring how well each model's feature correlations capture human texture perception and comparing that against the model's Brain-Score alignment with the mammalian visual system. They find no relationship between conventional measures of a CNN's quality as a visual-system model and the human-likeness of its texture representation.
  • The findings suggest that human texture perception relies on mechanisms distinct from those captured by CNNs trained on object recognition.
  • The authors hypothesize that contextual integration may play a key role in human texture perception that is not adequately reflected by current CNN feature-correlation texture models.

Abstract

Mathematical modeling of visual textures traces back to Julesz's intuition that texture perception in humans is based on local correlations between image features. An influential approach for texture analysis and generation generalizes this notion to linear correlations between the nonlinear features computed by convolutional neural networks (CNNs), compiled into Gram matrices. Given that CNNs are often used as models for the visual system, it is natural to ask whether such "texture representations" spontaneously align with the textures' perceptual content, and in particular whether those CNNs that are regarded as better models for the visual system also possess more human-like texture representations. Here we compare the perceptual content captured by feature correlations computed for a diverse pool of CNNs, and we compare it to the models' perceptual alignment with the mammalian visual system as measured by Brain-Score. Surprisingly, we find that there is no connection between conventional measures of CNN quality as a model of the visual system and its alignment with human texture perception. We conclude that texture perception involves mechanisms that are distinct from those that are commonly modeled using approaches based on CNNs trained on object recognition, possibly depending on the integration of contextual information.
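
To make the "feature correlations compiled into Gram matrices" concrete, here is a minimal NumPy sketch of the computation for a single layer. It is an illustration under stated assumptions, not the authors' pipeline: the function name `gram_matrix`, the normalization by the number of spatial positions, and the random array standing in for real CNN activations are all choices made for this example.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-by-channel feature correlations for one layer.

    features: array of shape (C, H, W), the activation maps of one layer.
    Returns a (C, C) matrix whose entry (i, j) is the spatial average of
    the product of channel i and channel j.
    Dividing by H * W is an assumed normalization; conventions vary.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # one row per channel
    return flat @ flat.T / (h * w)      # average over spatial positions


rng = np.random.default_rng(0)

# Stand-in for real CNN activations (assumption): random maps passed
# through a ReLU-like nonlinearity, 8 channels on a 16x16 grid.
feats = np.maximum(rng.standard_normal((8, 16, 16)), 0.0)
G = gram_matrix(feats)

# Apply the same spatial shuffle to every channel: the Gram matrix is
# unchanged, because it only averages products over positions.
perm = rng.permutation(16 * 16)
shuffled = feats.reshape(8, -1)[:, perm].reshape(8, 16, 16)
G_shuffled = gram_matrix(shuffled)

print(G.shape)
print(np.allclose(G, G_shuffled))
```

The shuffle check shows why this statistic is treated as a texture descriptor: it records which features co-occur, averaged over the image, and discards where they occur.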