# SUR and Hierarchical Model

Hi,
I’m trying to create a varying-slope, varying-intercept hierarchical model with twenty different groups (events in an event study where each event has 199 observations) but relax the assumption that each event is independent. Relaxing this assumption involves, I think (?), stacking the vectors and specifying a covariance matrix across events similar to a SUR model.

I’d appreciate any comments or thoughts on whether the specification below is correct as I’m sceptical of my own stan ability and each 1000 transitions take ~1000 seconds to complete suggesting I’ve violated the Folk theorem. Attached is the same code as below.

Thanks for anything you have to offer,
Ed
SUR_Hierarchical.stan (1.2 KB)

``````data {
int<lower=1> N; // Number of observations
int<lower=0> L; // Number of events and also number of SUR equations
vector[L] Y[N]; // Y variable, replicated L = 20 times
real X[N]; // predictor variable
int<lower=1, upper=L> id[N]; // id describing which event X and Y correspond to

}
parameters{
vector[L] a[L]; // Intercept term
vector[L] b[L]; // coefficient on X

vector[L] mu_a; // hierarchical priors for the intercept
vector<lower=0>[L] sigma_a;

vector[L] mu_b; // hierarchical priors for the predictor
vector<lower=0>[L] sigma_b;

cholesky_factor_corr[L] L_Omega; // Covariance matrix for the events
vector<lower=0>[L] L_sigma;

}
model {
vector[L] y_hat[N];
matrix[L, L] L_Sigma;

mu_a ~ normal(0,5);
mu_b ~ normal(0,5);

sigma_a ~ normal(0,5);
sigma_b ~ normal(0,5);

for (l in 1:L){
a[l] ~ normal(mu_a, sigma_a);
b[l] ~ normal(mu_b, sigma_b);
}

for (i in 1:N){
y_hat[i] = a[id[i]] + b[id[i]] * X[i];
}

L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
L_Omega ~ lkj_corr_cholesky(4);
L_sigma ~ cauchy(0, 2.5);

Y ~ multi_normal_cholesky(y_hat, L_Sigma);
}
``````

[edit: escaped code]

Hi! I’ll just share a few ideas, maybe they are helpful…

Did you try a non-centered parameterization for `a` and `b`?

Also, you might want to think about explicitly modeling the correlation between `a` and `b`, since slopes and intercepts should be correlated.

With regards to the sigma for the events: Would it be appropiate in your case to just model it somethink like `y ~ normal(y_hat, sigma[index_L])` and then just put a hierarchical prior on sigma?

Hi Max!
I’ll definitely look into non-centred parameterisation and explicitly modelling a and b - I’m not sure a hierarchical prior on sigma would fix my problem but I’ll give it some more thought.
Thanks a ton,
Ed

As @Max_Mantei suggests, the hierarchical prior is almost always better—the only case where it’s not is where the posterior’s very well identified.

Modeling the correlations can help, but it’s expensive. @Max_Mantei’s suggestion to just use a normal avoids modeling the correlations.

You can just declare-define `L_Sigma` up top. The Cauchy prior on `L_sigma` is not very constraining. This can cause problems if there’s not much data. Tightening it to `normal(0, 5)` or something if appropriate would be faster.

1 Like

Thanks @Bob_Carpenter and @Max_Mantei,

I’ve decided to go with the hierarchical prior on sigma - very grateful for both your comments.