[D] Building a demand forecasting system for multi-location retail with no POS integration, architecture feedback wanted

Reddit r/MachineLearning / 3/27/2026


Key Points

  • The post describes building a lightweight weekly demand-forecasting system for multi-location retail using only manually entered operational signals (e.g., revenue, covers, waste, mix, and simple contextual flags) with no POS or external data feeds.
  • The proposed architecture starts with a statistical baseline for the first 30 days (day-of-week decomposition plus trend) and then plans to introduce a light global model after day 30+, leveraging shared patterns across venues while predicting per entity.
  • It emphasizes pre-training outlier handling by flagging and excluding corrupted signal days before training, rather than attempting to correct them after the fact.
  • The author seeks feedback on three unresolved issues: whether global modeling beats local statistical models when each venue has low history (under 10 venues and under 90 days), how to handle outliers in sparse time series, and how to generate prediction confidence that non-technical operators can interpret as “high/low confidence.”
  • For confidence intervals, the author is considering conformal prediction or quantile regression and wants lightweight, calibrated methods suitable for short tabular time series.

We’re building a lightweight demand forecasting engine on top of manually entered operational data. No POS integration, no external feeds. Deliberately constrained by design.

The setup: operators log 4 to 5 signals daily (revenue, covers, waste, category mix, contextual flags like weather or local events). The engine outputs a weekly forward-looking directive. What to expect, what to prep, what to order. With a stated confidence level.

Current architecture thinking:

Days 1 to 30: statistical baseline only (day-of-week decomposition + trend). No ML.
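A minimal numpy sketch of such a baseline (function name and horizon are illustrative, not from the post): fit a linear trend by least squares, average the detrended residuals per weekday, then extrapolate trend plus the matching weekday offset.

```python
import numpy as np

def baseline_forecast(y, horizon=7):
    """Fit a linear trend plus day-of-week offsets to a daily series
    and extrapolate `horizon` days ahead. Assumes y[0] is day 0 and
    the weekday cycle aligns with t % 7."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    # Linear trend via least squares
    slope, intercept = np.polyfit(t, y, 1)
    detrended = y - (slope * t + intercept)
    # Mean residual for each position in the weekly cycle
    dow = np.array([detrended[t % 7 == d].mean() for d in range(7)])
    # Forecast: trend extrapolation + matching weekday offset
    tf = np.arange(len(y), len(y) + horizon)
    return slope * tf + intercept + dow[tf % 7]
```

With 30 days of history this gives four-plus observations per weekday, which is about the minimum for the offsets to be meaningful.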

Day 30+: light global model across entities (similar venues train together, predict individually).
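The "train together, predict individually" idea can be sketched as one pooled least-squares problem with shared weekday effects and a per-venue level and trend. Everything below (function names, feature layout) is an assumed illustration, not the author's implementation:

```python
import numpy as np

def fit_global_model(histories):
    """histories: dict venue_id -> daily series (floats, day 0 aligned).
    Pools all venues into one design matrix: per-venue intercept,
    per-venue trend, and weekday dummies shared across venues."""
    venues = sorted(histories)
    n = len(venues)
    rows, targets = [], []
    for vi, v in enumerate(venues):
        for t, val in enumerate(histories[v]):
            x = np.zeros(2 * n + 7)
            x[vi] = 1.0            # per-venue intercept
            x[n + vi] = t          # per-venue trend
            x[2 * n + t % 7] = 1.0 # shared day-of-week effect
            rows.append(x)
            targets.append(val)
    # lstsq returns the minimum-norm solution, which handles the
    # intercept/weekday collinearity without dropping a dummy
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return venues, coef

def predict_venue(venues, coef, venue, t):
    """Point forecast for one venue at day index t."""
    n = len(venues)
    vi = venues.index(venue)
    x = np.zeros(2 * n + 7)
    x[vi] = 1.0
    x[n + vi] = t
    x[2 * n + t % 7] = 1.0
    return float(x @ coef)
```

The shared weekday coefficients are where pooling pays off at small N: every venue's Saturdays inform every other venue's Saturday estimate, while levels and trends stay local.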

Outlier flagging before training, not after. Corrupted signal days excluded from the model entirely.
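One common way to do this pre-training flagging is a robust z-score based on the median absolute deviation (MAD), which marks days for exclusion rather than correcting them. A hedged sketch (threshold and the 0.6745 normal-consistency constant are conventional defaults, not from the post); in practice you might apply it per weekday so a genuinely busy Saturday isn't flagged against quiet Tuesdays:

```python
import numpy as np

def flag_outliers(y, threshold=3.5):
    """Return a boolean mask of days to exclude from training.
    Uses the robust z-score: 0.6745 * (y - median) / MAD."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    mad = np.median(np.abs(y - med))
    if mad == 0:
        # Series is (near-)constant; nothing can be flagged robustly
        return np.zeros(len(y), dtype=bool)
    robust_z = 0.6745 * (y - med) / mad
    return np.abs(robust_z) > threshold
```

Median/MAD are preferred over mean/std here because a single data-entry error would inflate the standard deviation enough to hide itself.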

Confidence scoring surfaced to the end user, not hidden.

Three specific questions:

  1. Global vs local model at small N: With under 10 venues and under 90 days of history per venue, is a global model (train on all, predict per entity) actually better than fitting a local statistical model per venue? Intuition says global wins due to shared day-of-week patterns, but it's unclear at this data volume.
  2. Outlier handling in sparse series: What's best practice for flagging and excluding anomalous days before training, especially when you can't distinguish a real demand spike from a data entry error without external validation? Do you model outliers explicitly, or mask and interpolate?
  3. Confidence intervals that operators will trust: Looking for a lightweight implementation that produces calibrated prediction intervals on short tabular time series. Considering conformal prediction or quantile regression. Open to alternatives.

Context: output is consumed by non-technical operators. Confidence needs to be interpretable as “high confidence” vs “low confidence”, not a probability distribution.
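Split conformal prediction is about as lightweight as calibrated intervals get: hold out recent days, take a quantile of the absolute residuals, and widen the point forecast by that amount. The interval width can then be thresholded into the requested high/low label. A sketch under assumptions (function names, the 15% tightness threshold, and the two-level mapping are all illustrative choices, not the author's spec):

```python
import numpy as np

def conformal_interval(residuals, point_forecast, alpha=0.2):
    """Split conformal: the (1 - alpha) quantile of absolute holdout
    residuals (with finite-sample correction) gives a symmetric
    interval with roughly (1 - alpha) coverage."""
    n = len(residuals)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q = np.quantile(np.abs(residuals), min(q_level, 1.0))
    return point_forecast - q, point_forecast + q

def confidence_label(lo, hi, point_forecast, tight=0.15):
    """Map relative interval width to the two-level label operators see."""
    width = (hi - lo) / max(abs(point_forecast), 1e-9)
    return "high confidence" if width <= tight else "low confidence"
```

The appeal for this use case: conformal calibration is distribution-free and wraps around whatever point model is in service that week, and the operator only ever sees the label, not the interval arithmetic. The caveat at under 90 days is that the holdout residual pool is small, so coverage guarantees are loose; the finite-sample correction above at least keeps them honest.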

submitted by /u/Automation_storm