
[D] Single-artist longitudinal fine art dataset spanning 5 decades now on Hugging Face — potential applications in style evolution, figure representation, and ethical training data

Reddit r/MachineLearning / 3/22/2026


Key Points

  • An artist has published a longitudinal fine art dataset on Hugging Face: 3,000 to 4,000 images so far, with more to come, depicting the human figure across five decades and multiple media.
  • It includes full structured metadata (catalog number, title, year, medium, dimensions, collection, view type) and source materials, licensed under CC-BY-NC-4.0.
  • The longitudinal nature enables computational study of stylistic drift and cross-domain style analysis, with potential for representation learning research.
  • Because it is published directly by the artist with full provenance and clear licensing, the dataset is relevant to ongoing discussions about ethical training data sourcing.
  • The dataset has already attracted over 2,500 downloads in its first week on Hugging Face, signaling strong early interest.

I am a figurative artist based in New York with work in the collections of the Metropolitan Museum of Art, MoMA, SFMOMA, and the British Museum. I recently published my catalogue raisonné as an open dataset on Hugging Face.

Dataset overview:

  • 3,000 to 4,000 images currently, with approximately twice that number still to be added as scanning continues
  • Single artist, single primary subject: the human figure across five decades
  • Media spans oil on canvas, works on paper, drawings, etchings, lithographs, and digital works
  • Full structured metadata: catalog number, title, year, medium, dimensions, collection, view type
  • Source material: 4x5 large format transparencies, medium format slides, high resolution photography
  • License: CC-BY-NC-4.0
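
For anyone who wants to work with the metadata, here is a minimal Python sketch of how the fields listed above might be modeled and grouped by decade. The field names, their types, and the sample rows are all assumptions made for illustration; the actual column names and value formats in the dataset may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtworkRecord:
    # Hypothetical field names mirroring the metadata list above;
    # the real dataset columns may be named or typed differently.
    catalog_number: str
    title: str
    year: int
    medium: str
    dimensions: str
    collection: str
    view_type: str


def decade_of(year: int) -> int:
    """Return the first year of the decade containing `year`."""
    return (year // 10) * 10


def group_by_decade(records):
    """Map each decade to the list of records dated within it."""
    groups = defaultdict(list)
    for record in records:
        groups[decade_of(record.year)].append(record)
    return dict(groups)


# Invented placeholder rows, not real catalog entries.
sample = [
    ArtworkRecord("X-001", "Untitled A", 1978, "oil on canvas", "unknown", "unknown", "front"),
    ArtworkRecord("X-002", "Untitled B", 1983, "etching", "unknown", "unknown", "front"),
    ArtworkRecord("X-003", "Untitled C", 1988, "lithograph", "unknown", "unknown", "front"),
    ArtworkRecord("X-004", "Untitled D", 2015, "digital", "unknown", "unknown", "front"),
]

by_decade = group_by_decade(sample)
for decade in sorted(by_decade):
    print(decade, len(by_decade[decade]))
```

Grouping by decade is only one possible slicing; the same pattern would work for medium or collection by swapping the key function.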

Why it might be interesting for deep learning research:

The longitudinal nature of the dataset is unusual. Five decades of work by a single artist on a consistent subject creates a rare opportunity to study stylistic drift and evolution computationally. The human figure as a sustained subject across radically different periods and media also offers interesting ground for representation learning and cross-domain style analysis.
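
One simple way to make "stylistic drift" concrete, assuming each image has already been mapped to a feature vector by some image encoder (the choice of encoder is left open here), is to average the vectors per decade and measure the cosine distance between consecutive decade centroids. The sketch below uses tiny invented vectors and plain Python. It is an illustration of the idea under those assumptions, not a method the dataset prescribes.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; raises ValueError for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def drift_by_decade(items):
    """Given (year, embedding) pairs, return a list of
    (earlier_decade, later_decade, distance) for consecutive decades
    that actually contain data."""
    buckets = defaultdict(list)
    for year, vector in items:
        buckets[(year // 10) * 10].append(vector)
    decades = sorted(buckets)
    centroids = {d: centroid(buckets[d]) for d in decades}
    return [
        (prev, cur, cosine_distance(centroids[prev], centroids[cur]))
        for prev, cur in zip(decades, decades[1:])
    ]


# Toy 2-D "embeddings", invented purely for illustration.
toy = [
    (1975, [1.0, 0.0]),
    (1979, [1.0, 0.0]),
    (1984, [1.0, 1.0]),
    (1995, [0.0, 1.0]),
]

for prev, cur, dist in drift_by_decade(toy):
    print(f"{prev}s -> {cur}s: {dist:.3f}")
```

Centroid distance is a coarse summary: it hides variation within a decade and depends heavily on the encoder, so it is best read as a starting point for comparison rather than a measurement of style itself.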

The dataset is also one of the few fine art image datasets published directly by the artist with full provenance and proper licensing, which makes it relevant to ongoing conversations about ethical training data sourcing.

It has had over 2,500 downloads in its first week on Hugging Face.

I am not a researcher or developer. I am the artist. I am interested in connecting with anyone using it or considering it for research.

Dataset: huggingface.co/datasets/Hafftka/michael-hafftka-catalog-raisonne

submitted by /u/hafftka