Hi there!

I’m not-so-new to Bayesian modeling but fairly new to time series modeling. I’m writing here for advice on using the credible intervals of a time series model as a measure of model uncertainty (the main motivation for using a Bayesian model for our current project). The goal of our project is to use historical data to forecast values for the coming week – and also to report the uncertainty of these predictions.

#### Data (time series)

y_{obs}: daily sales of a retail store for (almost) every day from Jan 2nd, 2017 onwards; with weekly and yearly seasonality and a holiday effect.

Some data points are missing (see model).

Data are standardized before given to the model (but rescaled in the analysis plots below).

#### Model

I constructed the following state-space model drawing from many different resources (e.g., [1], [2], [2a], [2b], [3]):

```
data {
int<lower=1> n_obs; // number of observed values
int<lower=0> n_miss; // number of missing values
int<lower=0> n_predict; // number of values to predict
int<lower = 1, upper = n_obs + n_miss> indices_obs[n_obs]; // indices of observed values
int<lower = 1, upper = n_obs + n_miss> indices_miss[n_miss]; // indices of missing values
real y_obs[n_obs]; // observed values
vector[n_obs + n_miss + n_predict] holiday; // dummy coded holiday vector (0 or 1)
}
transformed data {
int<lower=0> n = n_obs + n_miss; // length of historical data (incl. missing values)
int<lower=n> predict_start; // start of prediction period
int<lower=n> predict_end; // end of prediction period
int<lower=1, upper=365> yday[(n + n_predict)]; // day-of-the-year feature vector
int<lower=1, upper=7> wday[(n + n_predict)]; // day-of-the-week feature vector
// fill seasonality feature vectors (1-365, 1-7)
for (i in 1:(n + n_predict)) {
yday[i] = i % 365 + 1;
}
for (i in 1:(n + n_predict)) {
wday[i] = i % 7 + 1;
}
predict_start = n + 1; // index (aka timepoint) at which to start the prediction
predict_end = n + n_predict; // index to end the prediction
}
parameters {
real x0; // first value
real drift; // drift
real<lower=-1, upper=1> phi; // controls to what extent the random walk reverts to the mean, see https://nwfsc-timeseries.github.io/atsa-labs/sec-stan-ar1.html
vector[predict_end - 1] state_error; // noise of state
real<lower=0> sigma_state; // standard deviation for state noise
real<lower=0> sigma_irreg; // standard deviation for irregular component
real y_miss[n_miss]; // vector of missing values
vector[6] seas_w; // weekly seasonality
vector[364] seas_y; // yearly seasonality
real h; // holiday parameter
}
transformed parameters {
vector[predict_end] estimation; // estimated outcome
vector[predict_end] state; // latent state of the time series model
// zero-mean seasonal terms
vector[7] seasonal_week = append_row(seas_w, -sum(seas_w));
vector[365] seasonal_year = append_row(seas_y, -sum(seas_y));
state[1] = x0; // first state
for(i in 2:predict_end) {
state[i] = phi * (state[i-1] - drift) + drift + sigma_state * state_error[i-1];
}
// the central model equation:
// the estimated outcome is the state + yearly/weekly seasonality + the holiday feature (only relevant if holiday != 0)
estimation = state + seasonal_week[wday] + seasonal_year[yday] + h * holiday;
}
model {
// create vector that contains observed and missing values
real y[n];
y[indices_obs] = y_obs;
y[indices_miss] = y_miss;
// missing values are assumed to be normally distributed
y_miss ~ normal(0, 1);
// prior distributions of parameters
x0 ~ normal(0, 1);
drift ~ normal(0, 1);
h ~ normal(0, 1);
sigma_state ~ exponential(2);
state_error ~ normal(0, 1);
sigma_irreg ~ exponential(2);
// likelihood
y ~ normal(estimation[1:n], sigma_irreg);
}
generated quantities {
vector[n] yhat; // posterior predictive to check whether the model has fitted the data
vector[n_predict] prediction; // predicted values
for(i in 1:n) {
yhat[i] = normal_rng(estimation[i], sigma_irreg);
}
for (i in predict_start:predict_end) {
// sample with a random generator, cf. with the likelihood
prediction[i - n] = normal_rng(estimation[i], sigma_irreg);
}
}
```

I’m using `pystan`

to compile this model and to hand over the data to the model:

```
model = pystan.StanModel(file = 'model.stan')
data = {'n_obs': number_observations,
'n_miss': len(indices_missing),
'indices_obs': indices_observations,
'indices_miss': indices_missing,
'holiday': pd.concat([df.holiday, df_prediction.holiday]),
'n_predict': days_to_predict,
'y_obs': df[df.yobs.notnull()].yobs
}
fit = model.sampling(data = data, verbose = True)
```

#### Model fit

In general, the model captures the data quite well. To test the model on unknown data (and to simulate the actual production environment which requires to forecast the coming week), we have the the following training/testing pipeline:

- fit the model on data from January 2017 - December 2018
- predict first week of January 2019
- add first week of January 2019 to training set and re-fit the model
- predict second week of January 2019
- add second week of January 2019 to training set and re-fit the model
- predict third week of January 2019
- etc. until (almost) the end of 2019

This results in a year of predictions that we can compare to actually known data (but unknown to the model).

Here is a plot of actual data (unknown to the model) and the model’s mean outcome (i.e., the means of the `prediction`

vector), including credible intervals as computed with `arviz`

(`arviz.hpd(prediction_df.values, credible_interval=0.95)`

).

Although this fit looks quite okay, the prediction error is still too high in my opinion. Here is a corresponding plot of the absolute prediction error:

#### Problem

Now, here’s the problem I need your help with:

We would like to use the model

- to predict/forecast the next week and
- to deliver an uncertainty about these predictions.

I thought that the (width of the) credible intervals of the `prediction`

vector contains the uncertainty of the model, following this logic: A larger error should come with a larger model uncertainty, indicated by larger width of the credible intervals.

It turns out that this is not the case. Here is the same error plot from above together with the width of the CIs:

The CI width is relatively large in comparison to the error, however it shows way less variation. Most importantly, the varation in CI widths does not correlate with the magnitude of the error.

In an attempt to identify a potential time-dependent systemacity between error and CI widths, here are the 10 largest errors vs. CI widths:

These plots do look the same for ca. 50 similar datasets. I couldn’t find a dependency between CI widths and model errors (as conformed by close to zero correlation between these two variables).

So, I’m stuck here. My ideas would be to reduce the overall model error (as they indicate substantial trends in the data not captured by the model) by improving the model in general (e.g., adding a slope component, rethinking the 365-day yearly seasonality, …). I hope to see a better correlation of model error and CI width with a better fitting model. Another idea is to add a Kalman filter to the model, because I read this somewhere. However, I actually didn’t yet get the general idea of a Kalman filter to be able to tell if this is a meaningful idea at all …

Or am I actually missing a different point here? Potentially my model is misspecified somehow? Am I computing the wrong intervals? Is it actually possible to use the CIs as an indicator of prediction uncertainty? Any help is appreciated!

Thanks,

kartoffelsalat