Reduce_sum with hierarchical vector auto-regression

I’m trying to use reduce_sum with a multilevel VAR, and I’m having trouble figuring out how to wrangle my data structure to work with reduce_sum (or whether it’s even possible).

My units are regions, nested within countries. The model partially pools at the country level (country-level random intercepts and slopes). The data are structured as an 11x5x120 array (time periods x variables x regions), where each 11x5 matrix is one region's time series.
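In other words (writing the likelihood in Stan-style notation, with Y, alpha, beta, Omega, and tau as declared in the code further down), for region r in country c = country[r] and each time point tt > 1:

to_vector(Y[tt, :, r]) ~ multi_normal(alpha[c] + beta[c] * to_vector(Y[tt-1, :, r]), quad_form_diag(Omega[c], tau[c]));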

I’m running into trouble with how to define the slice, given that it needs to slice an array and each region’s time series must be kept intact (for the lags to make sense). (This is my first time using reduce_sum, so I may also be misunderstanding how it works.)

This is the code I have so far:

functions{
  real partial_sum(real[] slice_y,
                   int start, int end,
                   matrix beta,
                   matrix alpha,
                   matrix Omega,
                   vector tau,
                   int t,
                   int country,
                   int R,
                   int K
                   ) {
    for(r in 1:R){
     matrix[t, K] y_temp = to_matrix(slice_y[:, :, r]);
    for(tt in 1:t){
      vector[K] lagvars; //lagged variables
     //linear predictor
      if(tt > 1){
      	lagvars = to_vector(y_temp[tt-1,:]);
      	vector[K] mu; //linear predictor
        mu = alpha[country[r]] + beta[country[r]]*lagvars;
        return multi_normal_lpdf(to_vector(y_temp[tt,:]) | mu, quad_form_diag(Omega[country[r]], tau[country[r]]));
      }
    }
    }
  }

}

data{
  
  int N; //# observations
  int K; //# dimensions of Y
  int C; //# of countries
  int R; //# of regions
  int t; //# of time periods in the panel
  
  int<lower = 1, upper=C> country[R]; //country id for each region
  int<lower = 1, upper=t> time[N]; //time period id for each obs
  int<lower = 1, upper=R> region[N]; //region id for each obs

  real Y[t, K, R]; //the outcome array - each variable's time series stacked by region
  int grainsize; //grainsize for reduce_sum
  
}

parameters{
  
  
  //individual level
  vector<lower = 0>[K] tau[C]; //scale for residuals
  matrix[K, K] z_beta[C]; //untransformed betas 
  vector[K] z_alpha[C]; //untransformed intercepts
  
  //hierarchical parameters
  corr_matrix[K] Omega[C]; //country level correlation matrix
  vector[K] tau_loc; //mean for variance scaling factor
  vector<lower=0>[K] tau_scale; //scale for tau
  matrix[K, K] bhat_location; //mean for prior on beta
  matrix<lower = 0>[K, K] bhat_scale; //scale for prior on beta
  vector[K] ahat_location; //means for prior on intercepts
  vector<lower = 0>[K] ahat_scale; //variance for intercept prior
  
  
}

transformed parameters{
  
  matrix[K, K] beta[C]; //VAR(1) coefficients, country specific
  vector[K] alpha[C]; //country specific intercepts

  
  for(c in 1:C){
    //recentering random effects
    alpha[c] = ahat_location + ahat_scale .*z_alpha[c];
    beta[c] = bhat_location + bhat_scale .* z_beta[c];
  }
 
}

model{
  
  //hyperpriors
  tau_loc ~ cauchy(0,1);
  tau_scale ~ cauchy(0,1);
  ahat_location ~ normal(0,1);
  ahat_scale ~ cauchy(0, 1); 
  to_vector(bhat_location) ~ normal(0, 0.5);
  to_vector(bhat_scale) ~ cauchy(0, 0.5);

  
  //hierarchical priors
  for(c in 1:C){
    //non-centered parameterization to avoid convergence issues
    z_alpha[c] ~ normal(0, 1);
    to_vector(z_beta[c]) ~ normal(0, 1);
    tau[c] ~ normal(tau_loc, tau_scale);
    Omega[c] ~ lkj_corr(5);
  }
  
  
  //reduce sum likelihood
  target += reduce_sum(partial_sum, Y, grainsize, beta, alpha,
                        Omega, tau, t, country, R, K);
  
}

Where I run into trouble is the matrix[t, K] y_temp = to_matrix(slice_y[:, :, r]) line. In the original model, this was designed to subset a particular region out of the array. I believe this is where the start and end integers need to go, but I’m not sure how to incorporate that while still respecting the multi-level nature of the data.
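(For reference, the original non-reduce_sum version computed this likelihood with a loop over regions in the model block, roughly like the following - variable names here are just for illustration:)

for(r in 1:R){
  matrix[t, K] y_r = to_matrix(Y[:, :, r]); //pull region r's series out of the array
  for(tt in 2:t){
    vector[K] lagvars = to_vector(y_r[tt-1, :]);
    vector[K] mu = alpha[country[r]] + beta[country[r]] * lagvars;
    to_vector(y_r[tt, :]) ~ multi_normal(mu, quad_form_diag(Omega[country[r]], tau[country[r]]));
  }
}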

I’ve looked at this post, but the nature and structure of their model are different enough from mine (and the code is complex enough) that I haven’t been able to find a clear way to adapt it to my model.

How can I get this model running with reduce_sum, if possible?

Thanks in advance for any help!


Can’t you slice by region?

Okay, I think maybe I’m misunderstanding how the slicing works. So would something like this enable me to slice on region? (Some variables would need to be dropped in the new setup, but this is the general idea.)

functions{
  real partial_sum(int[] r_slice,
                   int start, int end,
                   matrix Y,
                   matrix beta,
                   matrix alpha,
                   matrix Omega,
                   vector tau,
                   int t,
                   int country,
                   int R,
                   int K
                   ) {
    for(tt in 1:t){
        matrix[t, K] y_temp = to_matrix(Y[r_slice]);
        vector[K] lagvars; //lagged variables
        real mu[K]; //linear predictor
        if(tt > 1){
      	  lagvars = to_vector(y_temp[tt-1,:]);
          mu[t] = alpha[country[t]] + beta[country[t]]*lagvars;
          return multi_normal_lpdf(y[t]|mu,quad_form_diag(Omega[country[t]], tau[country[t]]))
      }
    }
  }

}

Slicing works in whatever way you define it.

This does not look like it will work

matrix[t, K] y_temp = to_matrix(Y[r_slice]);

This will select from Y as many rows as the slice has, but "t" is some fixed integer, so how is that supposed to fit together?
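For example (a standalone illustration, not code from your model - Ymat, r_slice, and y_sub are made-up names), multi-indexing a matrix with an integer array just picks out those rows:

matrix[10, 5] Ymat; //imagine some 10 x 5 matrix
int r_slice[3] = {4, 7, 9}; //a slice containing three region ids
matrix[3, 5] y_sub = Ymat[r_slice]; //three rows of Ymat, one per index - not a t x K block for one region

So the number of rows you get depends on the length of the slice, not on t.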

Can you write your existing model as a for loop over regions?

Every region has t observations on K variables, so if the slice is an entire region, that matrix should always be t x K. As I was understanding it, slicing by the region indicator would produce each region as a slice?

I originally did loop over regions in the model block, but that code was extremely slow. The problem is that each region is an independent time series, so both the region and the sequence of the data matter - i.e. a slice has to maintain an intact time series for each region. Maybe reduce_sum just isn’t ideal for this data setup?

No, if you can write the loop over the regions and each region is independent of the other regions, then that sounds like a good slicing unit to me.

You should probably start by writing a function which calculates the per-region log-lik. Then you write the model as a loop which just calls this function for each region. Once that works, you can start to use reduce_sum, where you just loop over region subsets (by using an index of 1:region which you slice over).
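For your setup that could look roughly like this - just an untested sketch, with the function name, signature, and the data layout from your first post as my assumptions:

functions{
  //log-lik of one region's t x K series, given its country's parameters
  real region_lpdf(matrix y_r, vector alpha_c, matrix beta_c,
                   matrix Omega_c, vector tau_c, int t, int K) {
    real lp = 0;
    matrix[K, K] Sigma = quad_form_diag(Omega_c, tau_c);
    for(tt in 2:t)
      lp += multi_normal_lpdf(to_vector(y_r[tt, :]) |
                              alpha_c + beta_c * to_vector(y_r[tt-1, :]), Sigma);
    return lp;
  }
}

and then in the model block the same model as before, written as one call per region:

for(r in 1:R)
  target += region_lpdf(to_matrix(Y[:, :, r]) | alpha[country[r]], beta[country[r]],
                        Omega[country[r]], tau[country[r]], t, K);

Once that reproduces the lp__ of your current model, the reduce_sum step is mostly mechanical.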

Makes sense?


Yep - thanks for the help!

Sorry, clarification question. When you say slice over an index of 1:region, do you mean that the slice argument to partial_sum() should be the region index, or that I should feed partial_sum already-subset data? I think I’m still struggling to understand how I control the slicing.

Right now, my partial sum function looks like this:

functions{
    real partial_sum(matrix y_slice, int start, int end, int K, vector r_time, matrix beta_r, vector alpha_r,
    matrix Omega_r, vector tau_r, int t) {
   
  matrix[K, t] mu; //linear predictor
  real lp = 0; //accumulated log-likelihood
  
    
  for(tt in 1:t){
    vector[K] lagvars;
    if(tt > 1){
      lagvars = to_vector(y_slice[tt-1,:]);
      mu[:,tt] = alpha_r + beta_r*lagvars;
    }
  }
  for(tt in 2:t){
    lp += multi_normal_lpdf(to_vector(y_slice[tt]) | mu[:,tt], quad_form_diag(Omega_r, tau_r));
  }
  return lp;
  
}
                 
}

And the relevant part of the model block looks like this, where I subset the data for each region and feed it to the function:

 for(r in 1:R){
   //define variables to give to function
   int r_country;
   matrix[t, K] y_func;
   r_country = country[r];
   y_func = to_matrix(Y[:,:,r]);
   
   target += reduce_sum(partial_sum, y_func, grainsize, //y for specific region
   K, //dimensions of output
   to_vector(time[:,r]), //time index for region
   to_matrix(beta[r_country]), // relevant VAR coefficients
   to_vector(alpha[r_country]), //relevant VAR intercepts
   to_matrix(Omega[:][r_country]), //relevant country-level correlation matrix 
   tau[:][r_country], //relevant country-level scale 
   t); //total number of time periods
   
  }

Should the slicing variable instead be the index that defines the loop (r, which runs from 1 to R = 120)?

(I’m guessing I’m on the wrong track, since the existing code throws an ill-typed arguments error: "Instead supplied arguments of incompatible type: (matrix, int, int, int, vector, matrix, vector, matrix, vector, int) => real, matrix, int, int, vector, matrix, vector, matrix, vector, int".)

Thanks for the help so far! I feel like this is one of those things where I’m missing something very obvious…

The first argument of the partial_sum function must be an array of whatever is being sliced.

To make progress in this case I suggest you do the following:

  1. Do NOT use reduce_sum to start!
  2. Define two functions. The first computes the log-lik of a single region - call it region_lpdf, for example.
  3. The second is a “partial_sum” function to which you pass the entire data and all parameters; it does the slicing of the large data set and passes each region in turn into region_lpdf - basically this is the body of the for loop which you show (see the sketch below).
  4. Write a for loop in the model block where you loop over 1:R and call the partial_sum function as partial_sum(r, r, …);
  5. Once that works, call it in the model not with a loop over single regions but in larger chunks, like

target += partial_sum(1, R/2, …);
target += partial_sum(R/2+1, R, …);

and only then plug in reduce_sum to drive it all.
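Building on the region_lpdf sketch above, a rough (untested) sketch of the partial_sum and the staged calls in the model block - region_idx and grainsize are illustrative names you would add in transformed data and data:

functions{
  //region_lpdf from the earlier sketch goes in this same block

  //sums region_lpdf over an arbitrary sub-slice of region indices
  real partial_sum(int[] region_slice, int start, int end,
                   real[,,] Y, int[] country,
                   vector[] alpha, matrix[] beta,
                   matrix[] Omega, vector[] tau,
                   int t, int K) {
    real lp = 0;
    for(i in 1:size(region_slice)){
      int r = region_slice[i];
      int c = country[r];
      lp += region_lpdf(to_matrix(Y[:, :, r]) | alpha[c], beta[c], Omega[c], tau[c], t, K);
    }
    return lp;
  }
}

transformed data{
  int region_idx[R]; //the 1:R index array that gets sliced
  for(r in 1:R) region_idx[r] = r;
}

model{
  //...priors as in your model block...

  //step 4: still an explicit loop, one region per call - same lp__ as before
  //for(r in 1:R)
  //  target += partial_sum({r}, r, r, Y, country, alpha, beta, Omega, tau, t, K);

  //step 5: two big chunks by hand - still the same lp__
  //target += partial_sum(region_idx[1:(R/2)], 1, R/2, Y, country, alpha, beta, Omega, tau, t, K)
  //        + partial_sum(region_idx[(R/2+1):R], R/2+1, R, Y, country, alpha, beta, Omega, tau, t, K);

  //final step: let reduce_sum choose and parallelize the splits
  target += reduce_sum(partial_sum, region_idx, grainsize,
                       Y, country, alpha, beta, Omega, tau, t, K);
}

Note that start and end are not needed inside partial_sum here, because country and the parameters are looked up directly through the region ids carried in region_slice.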

Please go in small steps. Only move on once you have a firm and correctly working model. Always check that the model is still the same model (by having some data and a parameter draw for which you compute “lp__”, which you can do easily in rstan).

Ah okay, I think I understand now. To confirm: if I define partial_sum to split the data by region, reduce_sum will likewise do so? I had confused myself into thinking that reduce_sum added additional splits on top of partial_sum. Is it correct (in a simplistic sense) to say that reduce_sum just parallelizes the splits created by partial_sum?

reduce_sum will split the data according to what makes sense given the problem size and the resources. The partial_sum function must be able to compute the likelihood contribution from an arbitrarily large sub-slice. With multiple cores available, the independent partial sums over the different slices are calculated in parallel, of course.


Great, thank you - it finally makes sense to me! I’ll have to rethink the data structure a bit to make that work, but very doable. Thanks so much for the help!