Hierarchical AR(K) Models

I have various time-series of different lengths (usually called panel data). I want to run AR(K=3) regression on each of them but some of them are too short for a meaningful regression so I decided to use a Bayesian Hierarchical model in which the 3 multipliers for each series are sampled from a common multivariate normal distribution, so far so good, the model fits just fine and there are no divergent transitions and all Rhat values are extremely close to 1.

My question is: my Stan code effectively doesn’t model data (and parameters) for any time-series that has less than 4 observations, is that the right thing to do? I think I have no choice but to skip the series with just 1 observation but what about series with 2 or 3 observations?

Below is my Stan code:

data {
  int<lower=1> N; // total number of observations
  int<lower=1> K; // number of lagged factor
  int<lower=1> numG; // number of groups
  vector[N] y; // data for Hierarchical AR(K)
  int sizes[numG]; // sizes of observations across groups
parameters {
  real<lower = 0> local_sigma; 
  vector<lower = 0>[K] global_sigma;
  vector[K] z[numG];
  vector<lower = -2, upper = 2>[K] global_beta; // global AR(K) multipliers
transformed parameters {
  vector[K] group_betas[numG]; // group level AR(K) multipliers
  for(i in 1:numG) {
    group_betas[i] = global_beta + global_sigma .* z[i]; // NCP formulation, no correlations
model {
  int pos;
  pos = 1;
  for(i in 1:numG) {  // loop over groups
    int local_N;
    vector[sizes[i]] local_y;
    local_N = sizes[i];
    local_y = segment(y, pos, local_N);
    for (n in (K + 1):local_N) { // loop over observations
      real mu;
      mu = 0;
      for (k in 1:K) { // loop over lags
        mu = mu + group_betas[i][k] * local_y[n - k];
      y[n] ~ normal(mu, local_sigma);
    z[i] ~ normal(0, 1);
    pos = pos + local_N;
  // hyperpriors
  local_sigma ~ cauchy(0, 2);
  global_sigma ~ cauchy(0, 2);
  global_beta ~ cauchy(0, 2);

Reposting my reply in Google Groups:

It is not unusual to employ separate models for the first few observations
in a time series, e.g.

 p(y) = p(y[1])
        * p(y[2] | y[1])
        * p(y[3] | y[1:2])
        * PROD_{n > 3} p(y[n] | y[n-3:n-1]) ...

You just need to formulate models for the first three terms.

Thanks Bob, that makes sense.

Another option is to declare the previous values as missing values (see the missing data section of the manual - you need to mix missing values and data) and apply the one AR(3) model to all of them.

So here you would need y[0], y[-1] and y[-2] declared as unknowns. (You obviously need to add 3 to all of your indices so they are declared to be positive).

Then you’ll get the posterior of the unobserved values given your model.


@JulianK: That sounds like an interesting approach, but it seems to beg the question of what priors to give the missing values given that the AR component of the model only models y[n] conditioned on y[{n-3, n-2, n-1}].

I’d code these special edge cases directly rather than following the manual’s missing data advice, as it’s expensive to convert all the observations y to variables that get differentiated.

Is it typical to model these as missing data in Bayesian time series model? If so, I should add a discussion to the manual chapter.

I beg to differ, there is no missing data, for the shorter series the data simply doesn’t exist. We don’t want to model childhood data for babies as missing. I think it is better to have separate model for shorter series.

1 Like

Hi Bob,

I have seen examples of treating the pre-series terms as missing in order to have a consistent AR(k) model for the whole series, and to avoid having a separate model for the first few terms.

Specifically, I have seen advice when using Gibbs sampling to simulate the process backwards using the inverse relationship, then run it forwards again with the normal Gibbs sampling. Or you can just use Gibbs sampling including the missing values, noting that there is no data constraint, so you end up with a “tail” that wiggles around on the series. If there are more than a few terms then the mixing for these points is pretty slow with Gibbs for the usual reasons around slow mixing for Gibbs sampling in time series.

In particular, P(y_1 | y_2, y_3, y_4) is proportional to P(y_4 | y_1, y_2, y_3) x P(y_1). As you note, you need a prior for the missing parameters, but if the series is stationary then improper flat priors usually pose no issues.

Here’s a reference for you for this technique: http://as.wiley.com/WileyCDA/WileyTitle/productCd-0471920835.html

Another option is to simulate the missing values from the unconditional long-term mean and variance.


@Vishv_Jeet yes whether you can do this is context dependent and whether it makes sense. For financial time series the series usually exists before the start point but you choose not to include it in the model because you want a finite history.

Any references that aren’t embedded in $60 ebooks.

Stan requires a joint density, not a Gibbs sampling order, but that’s not a big deal here.

If you have improper flat priors, does that change the posterior distribution of the other variables? Sorry, I’m just too lazy to work it all out and I figured someone already knew the answer.

I thought most financial data had a non-stationary trend.

@Bob_Carpenter do you know of any examples that show this (AR models with separate models for the first n-1 observations) in action? I’m struggling a bit as to what this model would look like in Stan code.

Thanks for sharing this Vishv, I’ve found your example a really useful reference from which to build my AR panel model.

I’m finding it hard to understand some aspects of your code, and I was wondering if you were able to provide some clarification.

Right now, I’m struggling to understand the line:

  y[n] ~ normal(mu, local_sigma);

It seems like n will only ever vary between K + 1 and the size of your local group (local_N). So it will never be a number larger than the largest size of any group. Where as y, your observations, is of length N (your total number of observations). Won’t this mean you won’t iterate over every possible observation there is to model?

I would have expected something like:

  y[pos + n - 1] ~ normal(mu, local_sigma);

I feel your understanding of Stan is a few levels above mine so I expect I am missing something here.

Hi Willte,

That is a really old thread!!

I think there is a typo there. It should have been:
local_y[n] ~ normal(mu, local_sigma)

I believe you can also work with y but I have to think about the index a little bit more. Perhaps yours is right.

1 Like

I know right! But I can’t find any other good examples out there for AR panel data in Stan.

Thanks for your input either way.

Do you have your final code for the project? The one that also modeled the first few observations within the time-series?

I will have to look.

1 Like