SUR and Hierarchical Model

Hi,
I’m trying to create a varying-slope, varying-intercept hierarchical model with twenty different groups (events in an event study where each event has 199 observations) but relax the assumption that each event is independent. Relaxing this assumption involves, I think (?), stacking the vectors and specifying a covariance matrix across events similar to a SUR model.

I’d appreciate any comments or thoughts on whether the specification below is correct as I’m sceptical of my own stan ability and each 1000 transitions take ~1000 seconds to complete suggesting I’ve violated the Folk theorem. Attached is the same code as below.

Thanks for anything you have to offer,
Ed
SUR_Hierarchical.stan (1.2 KB)

data {
  int<lower=1> N; // Number of observations 
  int<lower=0> L; // Number of events and also number of SUR equations
  vector[L] Y[N]; // Y variable, replicated L = 20 times
  real X[N]; // predictor variable
  int<lower=1, upper=L> id[N]; // id describing which event X and Y correspond to
 
}
parameters{
  vector[L] a[L]; // Intercept term
  vector[L] b[L]; // coefficient on X
  
  vector[L] mu_a; // hierarchical priors for the intercept
  vector<lower=0>[L] sigma_a;
  
  vector[L] mu_b; // hierarchical priors for the predictor
  vector<lower=0>[L] sigma_b;
  
  cholesky_factor_corr[L] L_Omega; // Covariance matrix for the events
  vector<lower=0>[L] L_sigma;
  
}
model {
  vector[L] y_hat[N];
  matrix[L, L] L_Sigma;
  
  mu_a ~ normal(0,5);
  mu_b ~ normal(0,5);
  
  sigma_a ~ normal(0,5);
  sigma_b ~ normal(0,5);
  
  for (l in 1:L){
  a[l] ~ normal(mu_a, sigma_a);
  b[l] ~ normal(mu_b, sigma_b);
  }
   
  
  for (i in 1:N){
    y_hat[i] = a[id[i]] + b[id[i]] * X[i];
  }
  
  L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  L_Omega ~ lkj_corr_cholesky(4);
  L_sigma ~ cauchy(0, 2.5);
  
  Y ~ multi_normal_cholesky(y_hat, L_Sigma);
}

[edit: escaped code]

Hi! I’ll just share a few ideas, maybe they are helpful…

Did you try a non-centered parameterization for a and b?

Also, you might want to think about explicitly modeling the correlation between a and b, since slopes and intercepts should be correlated.

With regards to the sigma for the events: Would it be appropiate in your case to just model it somethink like y ~ normal(y_hat, sigma[index_L]) and then just put a hierarchical prior on sigma?

Hi Max!
I’ll definitely look into non-centred parameterisation and explicitly modelling a and b - I’m not sure a hierarchical prior on sigma would fix my problem but I’ll give it some more thought.
Thanks a ton,
Ed

As @Max_Mantei suggests, the hierarchical prior is almost always better—the only case where it’s not is where the posterior’s very well identified.

Modeling the correlations can help, but it’s expensive. @Max_Mantei’s suggestion to just use a normal avoids modeling the correlations.

You can just declare-define L_Sigma up top. The Cauchy prior on L_sigma is not very constraining. This can cause problems if there’s not much data. Tightening it to normal(0, 5) or something if appropriate would be faster.

1 Like

Thanks @Bob_Carpenter and @Max_Mantei,

I’ve decided to go with the hierarchical prior on sigma - very grateful for both your comments.