Annual time series data with external regressor

I am trying to fit a model to annual (time series) data with a time-dependent external regressor.

In my simple/simplistic scenario (no autoregressive terms, no moving-average terms, no seasonality), changes in the response y are driven exclusively by changes in the temperature T, which in turn is expected to have some linear dependence on time t. My goal is to fit the model to the data, and then predict y into the future based on future predictions for T.

In the context of ARIMA models, this would correspond to an ARIMA(0, 0, 0) model, with the single external regressor T.
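For comparison only, I believe that classical model could be fit with the forecast package roughly as follows (just a sketch; it uses the y and T vectors from the data list further down, and T_future is a hypothetical vector of assumed future temperatures):

library(forecast)

# ARIMA(0, 0, 0) with the single external regressor T:
# y = intercept + b * T + white noise
fit_arima <- Arima(data$y, order = c(0, 0, 0), xreg = data$T)

# forecasts of y for assumed/predicted future temperatures (T_future is hypothetical)
forecast(fit_arima, xreg = T_future)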

I had expressed the model in the following form:

\begin{aligned}
y &\sim N(\mu_y, \sigma_y) \\
T &\sim N(\mu_T, \sigma_T) \\
\mu_y &= \alpha + \mu_T \\
\mu_T &= \beta_0 + \beta_1 t
\end{aligned}

The likelihoods for y and T are coupled through \mu_y = \alpha + \mu_T.

Translating this into Stan and fitting the model results in errors of the form

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: normal_lpdf: Location parameter[1] is nan, but must be finite! (in ‘model90841be2e428_8f83da75be386a1006ae4f72518f8aef’ at line 37)

I am not sure I understand what’s going on. I had hoped to fit simultaneously T to the temperature data and y to the response data. In terms of forecasting, sampling from T for future time points and using these samples to draw samples for y would allow me to carry through uncertainties in the regressor T to samples of y.

Would somebody be able to point out where I’m going wrong?


Reproducible R code example:

library(rstan)

model_code <- "
data {
  int<lower=1> N;                   // Number of time points
  vector[N] t;                      // Time
  vector[N] y;                      // Response
  vector[N] T;                      // Temperature
}

parameters {
  real<lower=0> sigma_y;
  real<lower=0> sigma_T;
  real alpha;
  real beta0;
  real beta1;
}

transformed parameters {
  vector[N] mu_y;
  vector[N] mu_T;

  mu_y = alpha + mu_T;
  mu_T = beta0 + beta1 * t;

}

model {

  // Priors
  sigma_y ~ cauchy(0, 2.5);
  sigma_T ~ cauchy(0, 2.5);
  alpha ~ normal(mean(y), 10);
  beta0 ~ std_normal();
  beta1 ~ normal(0, 10);

  // Likelihood
  T ~ normal(mu_T, sigma_T);
  y ~ normal(mu_y, sigma_y);

}
"

data <- list(
    N = 13L,
    t = c(
        2007, 2008, 2009, 2010, 2011, 2012, 2013,
        2014, 2015, 2016, 2017, 2018, 2019),
    y = c(
        49.6417910447761, 46.2388059701492, 55.1940298507463, 54.4776119402985, 55.910447761194, 45.3432835820895, 52.5074626865672, 55.1940298507463, 44.6268656716418, 52.865671641791, 55.910447761194, 51.7910447761194, 59.1343283582089),
    T = c(
        39.3, 34.7, 40.1, 39.2, 37.55, 34.8, 40.85,
        40.1, 36.8, 39, 41.35, 40, 41.2))


fit <- stan(model_code = model_code, data = data, chains = 1)

The init values in Stan default to between -2 and +2 (on the unconstrained scale). So it's saying those aren't working for the model you specified. You can set your init values to something that won't cause your log probability to go all sideways.
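Something along these lines, for example (just a sketch; the numbers are only a rough guess at the scale of your data):

# a function returning one named list of initial values per chain
init_fun <- function() {
  list(alpha = 10, beta0 = 0, beta1 = 0,
       sigma_y = 5, sigma_T = 2)
}

fit <- stan(model_code = model_code, data = data, chains = 1, init = init_fun)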

Apologies for not being clear.

I don’t understand why there would be issues with the init values. The normal likelihoods have infinite support, and there are weakly informative priors on the means, so I don’t see how the location parameter (i.e. the mean) can be NaN.


Stan executes the statements in the order they are written. You need to define mu_T before using it.

  mu_T = beta0 + beta1 * t;
  mu_y = alpha + mu_T;

Ah you’ve saved the day @nhuurre. Many thanks. So obvious now…

There is still something very odd going on. The fit to the data is very poor; I had thought/hoped that the linear dependence of T on t would be captured through the explicit likelihood for T. That does not seem to be the case though. In fact, I’ve now confused myself about the effect of including or not including the likelihood for T. More explicitly, consider the difference between these two models:

Model 1

// data and parameters blocks as above
transformed parameters {
  vector[N] mu_y;
  vector[N] mu_T;
  mu_T = beta0 + beta1 * t;
  mu_y = alpha + mu_T;
}

model {
  // Likelihood
  T ~ normal(mu_T, sigma_T);
  y ~ normal(mu_y, sigma_y);
}

Model 2

// data and parameters blocks as above
transformed parameters {
  vector[N] mu_y;
  mu_y = alpha + beta0 + beta1 * t;
}

model {
  // Likelihood
  y ~ normal(mu_y, sigma_y);
}

The second model is just a simple linear model y \sim N(\mu_y, \sigma_y) with \mu_y = \alpha + \beta_0 + \beta_1 t. I had thought that the first model would let me characterise the dependence of the temperature T on time t first, and then use that T-t dependence to characterise the dependence of the response y on the temperature T. But this doesn't seem to be the right model for that.

The linear model has a “year zero problem”: it assumes the temperature was around zero degrees back in the year zero and has been increasing linearly ever since. Now, two thousand years later, the measured temperature is around 40 degrees, so the rate of increase must be quite small.
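To put rough numbers on it: with \beta_0 pinned near zero by its std_normal() prior, the slope has to come out around \beta_1 \approx 40 / 2013 \approx 0.02 degrees per year, so across the 13 observed years \mu_T can only move by roughly 0.25 degrees, far less than the 6-7 degree spread in the actual temperature data.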
A properly normalized model (e.g. one centered so that the intercept corresponds to about 40 degrees in 2015) gives a more reasonable fit.

transformed parameters {
  vector[N] mu_y;
  vector[N] mu_T;
  mu_T = 40 + beta0 + beta1 * (t-2015);
  mu_y = alpha + mu_T;
}
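For the forecasting goal mentioned at the start (carrying the uncertainty in T through to y), a generated quantities block along these lines is one way to do it (a sketch only; N_new and t_new would be extra data entries holding the future time points, and the _new names are just placeholders):

generated quantities {
  vector[N_new] mu_T_new;
  vector[N_new] T_new;
  vector[N_new] y_new;
  for (n in 1:N_new) {
    // temperature trend at the future time point
    mu_T_new[n] = 40 + beta0 + beta1 * (t_new[n] - 2015);
    // simulated future temperature, including observation noise
    T_new[n] = normal_rng(mu_T_new[n], sigma_T);
    // simulated future response, as implied by mu_y = alpha + mu_T
    y_new[n] = normal_rng(alpha + mu_T_new[n], sigma_y);
  }
}

Note that in this model y depends on T only through the trend \mu_T, so the y forecasts pick up the posterior uncertainty in \alpha, \beta_0 and \beta_1 rather than the observation noise in T itself.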

Thanks again @nhuurre, that was it. Appreciate your help.