I’m fitting a hierarchical vector autoregression (VAR), with units nested within groups (the C countries below). Currently, the model takes about 6 hours to get through 1500 samples with only 1 lag, and I’d like to incorporate additional lags in the future.
data{
int N; //# observations
int K; //# dimensions of Y
int C; //# of countries
int T; //# of time periods in the panel
int<lower = 1, upper=C> country[N]; //country id for each obs
int<lower = 1, upper=T> time[N]; //time period id for each obs
matrix[N,K] Y; //the outcome matrix - each variable's time series stacked
}
parameters{
//individual level
corr_matrix[K] Omega_local[C]; //country-specific correlation of errors across the K outcomes
vector<lower = 0>[K] tau[C]; //scale for residuals
matrix[K, K] z_beta[C]; //untransformed betas
vector[K] z_alpha[C]; //untransformed intercepts
//hierarchical priors
real<lower = 0, upper = 1> rho; //pooling coefficient
corr_matrix[K] Omega_global;
vector[K] tau_loc; //mean for variance scaling factor
vector<lower=0>[K] tau_scale; //scale for tau
matrix[K, K] bhat_location; //mean for prior on beta
matrix<lower = 0>[K, K] bhat_scale; //scale for prior on beta
vector[K] ahat_location; //means for prior on intercepts
vector<lower = 0>[K] ahat_scale; //variance for intercept prior
}
transformed parameters{
matrix[K, K] beta[C]; //VAR(1) coefficients, country specific
vector[K] alpha[C]; //country specific intercepts
corr_matrix[K] Omega[C];
for(c in 1:C){
//recentering random effects
alpha[c] = ahat_location + ahat_scale .* z_alpha[c];
beta[c] = bhat_location + bhat_scale .* z_beta[c]; //elementwise scaling of the standardized betas
Omega[c] = rho*Omega_global + (1-rho)*Omega_local[c];
}
}
model{
//hyperpriors
rho ~ beta(2,2);
tau_loc ~ cauchy(0,1);
tau_scale ~ cauchy(0,1);
ahat_location ~ normal(0,1);
ahat_scale ~ cauchy(0, 1);
to_vector(bhat_location) ~ normal(0, 0.5);
to_vector(bhat_scale) ~ cauchy(0, 0.5);
Omega_global ~ lkj_corr(1);
//hierarchical priors
for(c in 1:C){
//non-centered paramaterization to avoid convergence issues
z_alpha[c] ~ normal(0, 1);
to_vector(z_beta[c]) ~ normal(0, 1);
tau[c] ~ normal(tau_loc, tau_scale);
Omega_local[c] ~ lkj_corr(10);
}
//likelihood
for(n in 1:N){
if(time[n] > 1){
Y[n] ~ multi_normal(alpha[country[n]] + beta[country[n]]*Y[n-1]',
quad_form_diag(Omega[country[n]], tau[country[n]]));
}
}
}
My initial guess is that the loop in the likelihood is probably what’s slowing things down, but I’m not sure whether it can actually be replaced with anything more efficient.
Any suggestions on performance improvements are much appreciated!
First, check out the optimizations in the Multivariate Priors for Hierarchical Models chapter of the Stan User’s Guide, specifically the optimizations presented later in the chapter. Then take a look at the reduced-computation-through-recognizing-redundancy trick I present here.
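Very roughly, the swap that chapter is pointing at looks like the toy model below. To be clear, this is a bare sketch with made-up priors and the VAR mean collapsed to a per-country intercept alpha[c], not your model; the point is just the cholesky_factor_corr / lkj_corr_cholesky / diag_pre_multiply / multi_normal_cholesky pattern replacing corr_matrix / lkj_corr / quad_form_diag / multi_normal.
data {
int N;
int K;
int C;
int<lower = 1, upper = C> country[N];
vector[K] y[N];
}
parameters {
vector[K] alpha[C]; // stand-in for the full VAR mean
cholesky_factor_corr[K] L_Omega[C]; // Cholesky factor of each country's error correlation
vector<lower = 0>[K] tau[C]; // residual scales
}
model {
for (c in 1:C) {
alpha[c] ~ normal(0, 1);
L_Omega[c] ~ lkj_corr_cholesky(10);
tau[c] ~ normal(0, 1);
}
for (n in 1:N)
y[n] ~ multi_normal_cholesky(alpha[country[n]],
diag_pre_multiply(tau[country[n]], L_Omega[country[n]]));
}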
Oh, and I’m not super familiar with VAR models, but you seem to have T observations per country, one per time point, and you don’t use time as a predictor at all. That suggests you could restructure your model to be more similar to the one presented in the manual chapter, where you end up with a likelihood of the form y ~ normal(something, measurement_noise). Framing it that way might help by itself, and it would also leave you prepared to use the new reduce_sum function coming in Stan 2.23.
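To give a flavor of reduce_sum, here is a bare-bones sketch on a toy univariate normal likelihood; partial_sum and grainsize are just placeholder names, and none of this is specific to your model. The likelihood is split into slices whose log densities are summed, which lets Stan evaluate the slices in parallel when the model is built with threading enabled.
functions {
real partial_sum(real[] y_slice, int start, int end, real mu, real sigma) {
return normal_lpdf(y_slice | mu, sigma);
}
}
data {
int N;
real y[N];
int grainsize; // e.g. 1 lets Stan pick the slice sizes
}
parameters {
real mu;
real<lower = 0> sigma;
}
model {
mu ~ normal(0, 1);
sigma ~ exponential(1);
target += reduce_sum(partial_sum, y, grainsize, mu, sigma); // replaces y ~ normal(mu, sigma)
}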
Thanks! The Cholesky factorization trick seems likely to work, though there’s no way to avoid calculating the variance for each observation since it depends on the observation’s country (unless I’m misunderstanding something in the example). I’ll give it a go.
After looking over the manual page again, I thought I’d figured out a way to avoid calculating the country-specific variance N times, but I’m getting “variable corr_matrix does not exist”. Am I misunderstanding how to define the local scope?
Whoops - sorry for all the notifications; I just realized I need to define the variance as a transformed parameter. It does in fact save a decent bit of time!
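For anyone following along, here is roughly what I mean, stripped down to a non-hierarchical sketch (the pooling and hyperpriors are omitted, the priors are placeholders, and Sigma_L is just my own name): the country-specific covariance Cholesky factor is built C times in transformed parameters instead of N times inside the likelihood loop.
data {
int N;
int K;
int C;
int T;
int<lower = 1, upper = C> country[N];
int<lower = 1, upper = T> time[N];
matrix[N, K] Y;
}
parameters {
vector[K] alpha[C];
matrix[K, K] beta[C];
cholesky_factor_corr[K] L_Omega[C];
vector<lower = 0>[K] tau[C];
}
transformed parameters {
matrix[K, K] Sigma_L[C]; //covariance Cholesky factor, computed once per country
for (c in 1:C)
Sigma_L[c] = diag_pre_multiply(tau[c], L_Omega[c]);
}
model {
for (c in 1:C) {
alpha[c] ~ normal(0, 1);
to_vector(beta[c]) ~ normal(0, 0.5);
L_Omega[c] ~ lkj_corr_cholesky(10);
tau[c] ~ normal(0, 1);
}
for (n in 1:N) {
if (time[n] > 1)
Y[n] ~ multi_normal_cholesky(alpha[country[n]] + beta[country[n]] * Y[n - 1]',
Sigma_L[country[n]]);
}
}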
If you’re following @James_Savage’s implementation at RPubs - An introduction to hierarchical VAR modeling, I’ve coded up a version that is about 30% faster (if I remember correctly). It pre-allocates mu and uses segment to vectorize the calls to multi_normal_cholesky. Hope this helps!
data {
int N; // number of observations (number of rows in the panel)
int K; // number of dimensions of the outcome variable
int I; // number of individuals
int T; // The greatest number of time periods for any individual
int<lower = 1, upper = I> individual[N]; // integer vector the same length
// as the panel, indicating which individual each row corresponds to
int<lower = 1, upper = T> time[N]; // integer vector with individual-time
// (not absolute time). Each individual starts at 1
matrix[N, K] Y; // the outcome matrix, each individual stacked on each
// other, observations ordered earliest to latest
vector[K] Y_[N - I]; // rows of Y as vectors, dropping each individual's first time period
int grp_size[I]; // number of likelihood rows per individual (T_i - 1); entries sum to N - I
}
parameters {
// individual-level parameters
cholesky_factor_corr[K] L_Omega_local[I]; // cholesky factor of correlation matrix of
// residuals each individual (across K outcomes)
vector<lower = 0>[K] tau[I]; // scale vector for residuals
matrix[K, K] z_beta[I];
vector[K] z_alpha[I];
// hierarchical priors (individual parameters distributed around these)
real<lower = 0, upper = 1> rho;
cholesky_factor_corr[K] L_Omega_global;
vector[K] tau_location;
vector<lower = 0>[K] tau_scale;
matrix[K,K] beta_hat_location;
matrix<lower = 0>[K,K] beta_hat_scale;
vector[K] alpha_hat_location;
vector<lower = 0>[K] alpha_hat_scale;
}
model {
// hyperpriors
rho ~ beta(2, 2);
tau_location ~ std_normal();
tau_scale ~ exponential(1);
alpha_hat_location ~ std_normal();
alpha_hat_scale ~ exponential(1);
to_vector(beta_hat_location) ~ normal(0, .5);
to_vector(beta_hat_scale) ~ exponential(1);
L_Omega_global ~ lkj_corr_cholesky(1);
{
int pos_ = 1;
int pos = 1;
vector[K] mu[N - I];
matrix[K, K] beta[I]; // VAR(1) coefficient matrix
vector[K] alpha[I]; // individual-specific intercepts
matrix[K, K] L_Omega[I];
matrix[K, K] Omega_global = multiply_lower_tri_self_transpose(L_Omega_global);
for (i in 1:I) {
alpha[i] = alpha_hat_location + alpha_hat_scale .* z_alpha[i];
beta[i] = beta_hat_location + beta_hat_scale .* z_beta[i];
L_Omega[i] = cholesky_decompose(rho * Omega_global + (1 - rho) * multiply_lower_tri_self_transpose(L_Omega_local[i]));
}
// pre-compute the conditional means, skipping each individual's first period
for (n in 1:N) {
if (time[n] > 1) {
mu[pos] = alpha[individual[n]] + beta[individual[n]] * Y[n - 1]';
pos += 1;
}
}
for(i in 1:I) {
matrix[K, K] varcov = diag_pre_multiply(tau[i], L_Omega[i]); // Cholesky factor of individual i's residual covariance
// non-centered parameterization
z_alpha[i] ~ std_normal();
to_vector(z_beta[i]) ~ std_normal();
tau[i] ~ normal(tau_location, tau_scale);
L_Omega_local[i] ~ lkj_corr_cholesky(5);
segment(Y_, pos_, grp_size[i]) ~ multi_normal_cholesky(segment(mu, pos_, grp_size[i]), varcov);
pos_ += grp_size[i];
}
}
}
generated quantities {
matrix[K, K] Omega_global = multiply_lower_tri_self_transpose(L_Omega_global);
}
Thank you for your response. I’m not very familiar with the Y_[N - I] declaration or the segment function in your variant implementation, but your code is a great starting point!
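For what it’s worth, here is my current guess at how Y_ and grp_size are meant to be constructed, written as a standalone transformed data block just to illustrate (in your code they are passed in as data, presumably built in R); please correct me if I’ve misread it.
data {
int N;
int K;
int I;
int<lower = 1, upper = I> individual[N];
int time[N];
matrix[N, K] Y;
}
transformed data {
vector[K] Y_[N - I]; // rows of Y with each individual's first period dropped
int grp_size[I] = rep_array(0, I); // likelihood rows per individual
{
int pos = 1;
for (n in 1:N) {
if (time[n] > 1) {
Y_[pos] = Y[n]'; // same ordering as mu is filled in the model block
grp_size[individual[n]] += 1;
pos += 1;
}
}
}
}
model {
}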