Is Sliding Window All You Need? An Open Framework for Long-Sequence Recommendation

arXiv cs.LG / 4/15/2026


Key Points

The paper argues that long-sequence recommendation training is practical under real-world memory and latency constraints, contradicting the view that it is generally impractical.
It releases an end-to-end open framework (data processing, training, and evaluation scripts) implementing sliding-window long-sequence training with “industrial-style” rigor.
The authors add a runtime-aware ablation study to map the accuracy–compute tradeoff across window sizes and strides, enabling more informed configuration choices.
They propose a novel k-shift embedding layer designed to support million-scale vocabularies on commodity GPUs with minimal accuracy impact, especially in low-resource settings.
Experimental results show competitive retrieval quality (e.g., up to +6.04% MRR and +6.34% Recall@10 on Retailrocket) with about 4× training-time overhead, supported by reliable training on modest university clusters.
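
To make the window-size and stride knobs concrete, here is a minimal sketch of how one user's interaction history might be cut into training examples. It is not taken from the released framework: the function name `sliding_windows`, the next-item target convention, and the toy item IDs are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    history: Sequence[int], window: int, stride: int
) -> List[Tuple[List[int], int]]:
    """Cut one user's history into (context, target) pairs.

    Hypothetical helper: each pair holds `window` consecutive items as
    context and the item immediately after them as the prediction target.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")

    pairs = []
    # The last valid start leaves room for `window` context items plus one target.
    last_start = len(history) - window - 1
    for start in range(0, last_start + 1, stride):
        context = list(history[start:start + window])
        target = history[start + window]
        pairs.append((context, target))
    return pairs


# Toy history of seven item IDs (illustrative values only).
history = [10, 11, 12, 13, 14, 15, 16]
pairs = sliding_windows(history, window=3, stride=2)
dense_pairs = sliding_windows(history, window=3, stride=1)
```

In this sketch a smaller stride yields more overlapping examples per user (four with stride 1 versus two with stride 2 here), which is one way training cost can grow with the windowing configuration.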

Abstract

Long interaction histories are central to modern recommender systems, yet training with long sequences is often dismissed as impractical under realistic memory and latency budgets. This work demonstrates that it is not only practical but also effective at academic scale. We release a complete, end-to-end framework that implements industrial-style long-sequence training with sliding windows, including all data processing, training, and evaluation scripts. Beyond reproducing prior gains, we contribute two capabilities missing from earlier reports: (i) a runtime-aware ablation study that quantifies the accuracy-compute frontier across windowing regimes and strides, and (ii) a novel k-shift embedding layer that enables million-scale vocabularies on commodity GPUs with negligible accuracy loss. Our implementation trains reliably on modest university clusters while delivering competitive retrieval quality (e.g., up to +6.04% MRR and +6.34% Recall@10 on Retailrocket) with roughly 4× training-time overhead. By packaging a robust pipeline, reporting training time costs, and introducing an embedding mechanism tailored for low-resource settings, we transform long-sequence training from a closed, industrial technique into a practical, open, and extensible methodology for the community.
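
For readers unfamiliar with the retrieval metrics quoted above, the following sketch shows one common way MRR and Recall@k are computed for next-item prediction with a single held-out target per example. The function names and toy rankings are assumptions; the paper's evaluation scripts may differ, for instance by applying a rank cutoff to MRR.

```python
from typing import Sequence, Tuple


def reciprocal_rank(ranked: Sequence[int], target: int) -> float:
    """Return 1/rank of the target in the ranked list, or 0.0 if absent."""
    for position, item in enumerate(ranked, start=1):
        if item == target:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked: Sequence[int], target: int, k: int) -> float:
    """Return 1.0 if the single target appears in the top k, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def evaluate(
    rankings: Sequence[Sequence[int]], targets: Sequence[int], k: int = 10
) -> Tuple[float, float]:
    """Average both metrics over all evaluation examples."""
    n = len(targets)
    mrr = sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets)) / n
    recall = sum(recall_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return mrr, recall


# Three toy examples: the target is ranked 2nd, ranked 1st, and missing.
rankings = [[5, 3, 9], [7, 2, 4], [1, 8, 6]]
targets = [3, 7, 0]
mrr, recall = evaluate(rankings, targets, k=2)
```

With a single target per example, Recall@k reduces to a hit rate: the fraction of examples whose target appears in the top k.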