Hey everyone,
I'm working on a time series backcasting problem and I'm running into a fairly stubborn issue. I'd really appreciate any insights from people who have worked on similar setups.
Problem setup
I have daily-issued forecasts with multiple horizons:
- At each date D, I have forecasts for D+1, ..., D+14
- Data spans 2020–2026
- Each row is a unique (forecast_date, horizon) pair
Toy example:
| forecast_date | horizon | target_date | forecast | actual | normal |
|---|---|---|---|---|---|
| 2023-01-01 | 1 | 2023-01-02 | 20 | 18 | 19 |
| 2023-01-01 | 2 | 2023-01-03 | 21 | 20 | 19 |
| ... | ... | ... | ... | ... | ... |
| 2023-01-01 | 14 | 2023-01-15 | 25 | 23 | 20 |
Important:
- forecast_date, actual, and normal are identical across the 14 horizons
- Only horizon, target_date, and forecast vary
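For reference, a minimal sketch (with made-up values; the wide column names h1, h2 are an assumption) of how this long format can be built from per-date forecast vectors:

```python
import pandas as pd

# Hypothetical wide input: one row per issue date, one column per horizon.
wide = pd.DataFrame(
    {"forecast_date": pd.to_datetime(["2023-01-01", "2023-01-02"]),
     "h1": [20.0, 21.0],
     "h2": [21.0, 22.0]}
)

# Melt to one row per unique (forecast_date, horizon) pair.
long = wide.melt(id_vars="forecast_date", var_name="horizon", value_name="forecast")
long["horizon"] = long["horizon"].str.lstrip("h").astype(int)
long["target_date"] = long["forecast_date"] + pd.to_timedelta(long["horizon"], unit="D")
long = long.sort_values(["forecast_date", "horizon"]).reset_index(drop=True)
```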
Objective
I want to backcast forecast errors before 2020.
Target:
target = forecast − actual(target_date)
So if forecast = 20 and actual = 18, then target = +2.
Before modeling, I also transform the target to remove annual seasonality, long-term trend, and level scaling.
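A minimal sketch of computing this target, assuming a long-format frame and a lookup of actuals keyed by observation date (toy values):

```python
import pandas as pd

# Toy long-format rows (values made up, in the spirit of the example above).
df = pd.DataFrame(
    {"forecast_date": pd.to_datetime(["2023-01-01", "2023-01-01"]),
     "horizon": [1, 2],
     "target_date": pd.to_datetime(["2023-01-02", "2023-01-03"]),
     "forecast": [20.0, 21.0]}
)

# Actual measurements keyed by the date they were observed.
actuals = pd.Series(
    {pd.Timestamp("2023-01-02"): 18.0, pd.Timestamp("2023-01-03"): 20.0},
    name="actual",
)

# target = forecast - actual(target_date)
df["actual_at_target"] = df["target_date"].map(actuals)
df["target"] = df["forecast"] - df["actual_at_target"]
```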
Features
- forecast, horizon
- actual, normal
- anomaly = actual − normal
- lagged anomalies
- rolling stats (mean, std, quantiles)
- target encoding (e.g. horizon × month)
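The feature list above can be sketched as follows (simulated data; lag choices and the 14-day window are illustrative assumptions, not the exact setup). The shift before the rolling window keeps every statistic strictly prior to the issue date:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=60, freq="D")
df = pd.DataFrame({"forecast_date": dates,
                   "actual": 15 + rng.normal(0, 3, 60),
                   "normal": 15.0})

# Anomaly: deviation of the measurement from climatology.
df["anomaly"] = df["actual"] - df["normal"]

# Lagged anomalies (known at issue time, so no leakage).
for lag in (1, 2, 7):
    df[f"anomaly_lag{lag}"] = df["anomaly"].shift(lag)

# Rolling stats over a trailing window, shifted so the window
# ends strictly before the issue date.
roll = df["anomaly"].shift(1).rolling(14)
df["anomaly_roll_mean"] = roll.mean()
df["anomaly_roll_std"] = roll.std()
df["anomaly_roll_q90"] = roll.quantile(0.9)

# Target encoding (e.g. mean target per horizon-x-month cell) must be
# fit on training folds only to avoid leakage; omitted here for brevity.
```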
Model
Random Forest:
- max_depth: 10–15
- min_samples_leaf: 10
- max_features: sqrt
- n_estimators: 300
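With those hyperparameters, the model looks roughly like this (simulated stand-in data; max_depth=12 picks the middle of the 10–15 range tried). Note that tree averaging itself is one source of shrinkage toward the mean:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))              # stand-in feature matrix
y = X[:, 0] * 0.5 + rng.normal(0, 1, 500)  # noisy target centered on 0

model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    min_samples_leaf=10,
    max_features="sqrt",
    n_jobs=-1,
    random_state=0,
)
model.fit(X, y)
pred = model.predict(X)

# Each leaf predicts a leaf mean over >= 10 samples, and the ensemble
# averages those means, so predictions are less dispersed than y.
```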
Validation
- Time-based splits with the usual order reversed: train on later years, validate on earlier years, to match the backcasting setting
- No leakage (checked carefully)
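My reading of the reversed splits, as a sketch (the year-level granularity is an assumption): each fold trains on the most recent years and validates on the year just before them, walking backward in time.

```python
def backcast_splits(years):
    """Reversed expanding splits: train on later years, validate on an
    earlier year, mimicking backcasting into the past."""
    years = sorted(years)
    for i in range(1, len(years)):
        train_years = years[i:]   # e.g. 2021-2025
        val_year = years[i - 1]   # e.g. 2020
        yield train_years, val_year

splits = list(backcast_splits(range(2020, 2026)))
```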
Main issue
Predictions are under-dispersed and collapse toward 0 (the target mean):
- Very low variance
- Poor estimation of tails (q10 / q90)
- Even for horizon = 1, performance is close to predicting constant 0 (in MAE)
MAE increases with horizon (expected), but overall performance remains weak.
Diagnostics
- std(predictions) / std(target) ≈ 0.4 at best
- This ratio decreases with horizon
So the model is clearly under-dispersed.
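The dispersion ratio above can be tracked per horizon; a minimal sketch on simulated data (predictions constructed to reproduce the ~0.4 ratio observed):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5000  # simulated rows

df = pd.DataFrame({
    "horizon": rng.integers(1, 15, n),
    "target": rng.normal(0.0, 2.0, n),
})
# Simulated under-dispersed predictions (std ratio around 0.4).
df["pred"] = 0.4 * df["target"] + rng.normal(0.0, 0.1, n)

# Overall dispersion ratio.
ratio = df["pred"].std() / df["target"].std()

# Same ratio broken out per horizon.
grp = df.groupby("horizon")
per_horizon = grp["pred"].std() / grp["target"].std()
```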
Interpretation
At this point I suspect:
- either the signal is very weak
- or the model is too conservative and fails to capture amplitude
Any help, feedback, or ideas to explore would be greatly appreciated.
Thanks a lot.