Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

arXiv cs.AI / 4/29/2026

Key Points

  • The paper argues that conventional “Fat Row” sequence pre-materialization in deep learning recommendation systems causes major storage and I/O bottlenecks as sequence length scales to ultra-long user histories.
  • It proposes a “versioned late materialization” approach that stores user interaction histories once in a normalized, immutable layer and reconstructs sequences on-the-fly during training using lightweight, versioned pointers (see the sketch after this list).
  • The method includes an O2O (online-to-offline) consistency protocol designed to prevent future data leakage across both streaming and batch training workflows.
  • To maintain high throughput despite just-in-time reconstruction, the system uses read-optimized immutable storage, multi-dimensional projection pushdown for different model tenants, and pipelined I/O prefetching with data-affinity optimizations.
  • Deployed on production DLRMs, the approach reduces data infrastructure resource usage, enables aggressive sequence-length scaling that improves model quality, and serves as the data foundation for architectures such as HSTU and ULTRA-HSTU.
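
The core storage idea can be pictured with a minimal sketch: each user's history lives once in an append-only, normalized store, and every training example carries only a small versioned pointer that is resolved into a sequence at training time. All names below (`UIHStore`, `ExamplePointer`, `materialize`) are illustrative assumptions for this summary, not the paper's actual API.

```python
# Minimal sketch of "versioned late materialization" (names are illustrative,
# not the paper's API): histories are stored once, rows store only pointers.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Event:
    item_id: int
    timestamp: int  # event time, useful for leakage-safe cutoffs

@dataclass
class UIHStore:
    """Normalized, append-only tier: each user's interaction history is stored once."""
    histories: Dict[int, List[Event]] = field(default_factory=dict)

    def append(self, user_id: int, event: Event) -> int:
        log = self.histories.setdefault(user_id, [])
        log.append(event)
        return len(log)  # the new version is the number of events logged so far

@dataclass
class ExamplePointer:
    """What a 'thin' training row carries instead of a pre-materialized sequence."""
    user_id: int
    version: int  # how much history was visible when this example was generated
    label: float

def materialize(store: UIHStore, ptr: ExamplePointer, max_len: int) -> List[int]:
    """Reconstruct the input sequence just-in-time, truncated to a tenant's budget."""
    log = store.histories.get(ptr.user_id, [])
    visible = log[: ptr.version]  # only events already logged at example time
    return [e.item_id for e in visible[-max_len:]]

# Usage: two tenants resolve the same pointer against the same stored history.
store = UIHStore()
v = store.append(user_id=7, event=Event(item_id=42, timestamp=100))
ptr = ExamplePointer(user_id=7, version=v, label=1.0)
long_seq = materialize(store, ptr, max_len=8192)   # [42]
short_seq = materialize(store, ptr, max_len=512)   # [42]
```

Because tenants with different length budgets resolve the same pointer against the same normalized history, the redundancy of pre-materialized "fat rows" disappears while each model still sees the sequence it needs. The `version` cutoff also illustrates why leakage prevention matters here: reconstruction must only see events that were already logged when the example was created, which is the property the paper's O2O consistency protocol is designed to guarantee across streaming and batch training.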

Abstract

Modern Deep Learning Recommendation Models (DLRMs) follow scaling laws with sequence length, driving the frontier toward ultra-long User Interaction History (UIH). However, the industry-standard "Fat Row" paradigm, which pre-materializes these sequences into every training example, creates a storage and I/O wall in which data infrastructure usage outstrips GPU training capacity; the underlying data redundancy is further amplified in multi-tenant environments where models with vastly different sequence-length requirements share a union dataset. We present a "versioned late materialization" paradigm that eliminates this redundancy by storing UIH once in a normalized, immutable tier and reconstructing sequences just-in-time during training via lightweight versioned pointers. The system ensures Online-to-Offline (O2O) consistency through a bifurcated protocol that prevents future leakage across both streaming and batch training, while a read-optimized immutable storage layer provides multi-dimensional projection pushdown for heterogeneous model tenants. Disaggregated data preprocessing with pipelined I/O prefetching and data-affinity optimizations masks the latency of training-time sequence reconstruction, keeping training throughput compute-bound on GPUs. Deployed on production DLRMs, the system reduces training data infrastructure resource usage while enabling aggressive sequence-length scaling that delivers significant model quality gains, and it serves as the foundational data infrastructure for modern recommendation model architectures, including HSTU and ULTRA-HSTU.
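
The abstract's "multi-dimensional projection pushdown for heterogeneous model tenants" can be pictured as each tenant declaring the columns and sequence budget it needs, so the read-optimized storage layer decodes only that subset. The sketch below expresses the pushdown with PyArrow's dataset API; the tenant registry, column names, and Parquet layout are assumptions made for illustration, not the paper's actual schema.

```python
# Hedged sketch: per-tenant projection pushdown over a shared, normalized UIH
# table stored as Parquet. The tenant registry and column names are invented
# for illustration; only the PyArrow calls reflect a real API.
import pyarrow as pa
import pyarrow.dataset as ds

TENANTS = {
    # Each model tenant declares the feature columns and sequence budget it needs.
    "long_seq_ranking": {"columns": ["user_id", "item_id", "ts", "action"], "max_len": 8192},
    "short_seq_retrieval": {"columns": ["user_id", "item_id"], "max_len": 512},
}

def read_for_tenant(uih_path: str, tenant: str) -> pa.Table:
    cfg = TENANTS[tenant]
    dataset = ds.dataset(uih_path, format="parquet")
    # Projection pushdown: only the requested columns are decoded from storage,
    # so a short-sequence tenant never pays I/O for dimensions it will not use.
    return dataset.to_table(columns=cfg["columns"])
```

Sequence-length truncation (the `max_len` budget) would then be applied during just-in-time reconstruction, and the disaggregated preprocessing tier described in the abstract would prefetch these reads in a pipeline so that training remains compute-bound on the GPUs.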