Data Structure for 2-level multilevel VAR

Hi all,

I’m attempting to fit a VAR based on code found here, but adding another layer. Essentially, I have individual time series for regions which are nested within countries, and would like to pool on countries. I’m running into trouble setting up the data structure for this. The model code is:

  int N; //# observations
  int K; //# dimensions of Y
  int C; //# of countries
  int R; //# of regions
  int T; //# of time periods in the panel
  int<lower = 1, upper=C> country[R]; //country id for each region
  int<lower = 1, upper=T> time[N]; //time period id for each obs
  int<lower = 1, upper=R> region[N]; //region id for each obs

  matrix[T,K] Y[R]; //the outcome array - each variable's time series stacked by region

  //individual level
  vector<lower = 0>[K] tau[C]; //scale for residuals
  matrix[K, K] z_beta[C]; //untransformed betas 
  vector[K] z_alpha[C]; //untransformed intercepts
  //hierarchical parameters
  corr_matrix[K] Omega[C]; //country level correlation matrix
  vector[K] tau_loc; //mean for variance scaling factor
  vector<lower=0>[K] tau_scale; //scale for tau
  matrix[K, K] bhat_location; //mean for prior on beta
  matrix<lower = 0>[K, K] bhat_scale; //scale for prior on beta
  vector[K] ahat_location; //means for prior on intercepts
  vector<lower = 0>[K] ahat_scale; //variance for intercept prior

transformed parameters{
  matrix[K, K] beta[C]; //VAR(1) coefficients, country specific
  vector[K] alpha[C]; //country specific intercepts

  for(c in 1:C){
    //recentering random effects
    alpha[c] = ahat_location + ahat_scale .*z_alpha[c];
    beta[c] = bhat_location + bhat_scale*z_beta[c];

  tau_loc ~ cauchy(0,1);
  tau_scale ~ cauchy(0,1);
  ahat_location ~ normal(0,1);
  ahat_scale ~ cauchy(0, 1); 
  to_vector(bhat_location) ~ normal(0, 0.5);
  to_vector(bhat_scale) ~ cauchy(0, 0.5);

  //hierarchical priors
  for(c in 1:C){
    //non-centered paramaterization to avoid convergence issues
    z_alpha[c] ~ normal(0, 1);
    to_vector(z_beta[c]) ~ normal(0, 1);
    tau[c] ~ normal(tau_loc, tau_scale);
    Omega[c] ~ lkj_corr(5);
  for(r in 1:R){
    matrix[T, K] y_temp;
    y_temp = Y[1:T, 1:K, r];
    for(t in 1:T){
      if(t > 1){
        y_temp[t] ~ multi_normal(alpha[country[r]] + beta[country[r]]*Y[t-1]',
        quad_form_diag(Omega[country[r]], tau[country[r]]));

The problem I’m running into is with the sampling statement. The goal here is to split out the time series for each region, and sample use the coefficient for the country for each region. In my head, the easiest way to do this is to create a local temporary matrix that contains only the Y for each region (y_temp) above. However, I can’t figure out how to subset an array in that way - the subset that I attempt creates a vector, which can’t be assigned to a matrix.

This leads to two questions:

  1. How can I properly subset the array in order to run my sampling loop?
  2. Is there a better way to set-up the data structure (particularly with respect to model speed)?

Hey! Sorry for the delay. Questions like this are always a bit complicated, because it’s kind of heard to keep track of indexing, especially if the model is so complex.

I could be completely wrong, but since you declared your outcome like this

… I think Y[r] (without more indexes) will return the rth T by K matrix in the array. So this

should be matrix[T, K] y_temp = Y[r].

Again, I could be completely wrong on this one. :/

You can also put print() statements in your Stan program to see which shape your data, variables, parameters etc. have (best with one chain and a handful of iterations, otherwise you’ll be buried in numbers). If all of this doesn’t work, try to make it work for a simpler model and see if you can get the shape correctly.

Some more resources (you might already know) which could be useful:

Hope this helps.


Thanks for slogging through the model, that did it! On to making the model sample correctly and maybe eventually, in a reasonable amount of time.

1 Like