Hello, I am developing a Hierarchical Model for Causal Inference in order to reproduce a famous paper in the Economics literature. I vectorized my model but I do not understand why all iterations either diverge or exceed the max tree depth. I was wondering if anyone could tell me what I am doing wrong.
This is the model I want to run:
\begin{align*}
\begin{cases}
& y_{n,j} \ | \ \beta_j, \ (\sigma_j)_{j=1}^J \ , \ \rho \ \sim \ \mathcal{N} \left( x_n \beta_j + w_n\theta \ , \ w_n\sigma_t + (1-w_n)\sigma_c \right) \ \ \ n = 1,...,N \ ; \ j=1,...,J \\
& \beta_j\ | \ \gamma_l \ (\sigma_j)_{j=1}^J \ , \ \rho \ \sim \ \mathcal{N} \left( (u \cdot \gamma)_{j} \ , \ \Sigma \right) \ \ \ j=1,...,J \ ; \ l= 1,...L \\
& \sigma_j \ \sim \ \mathcal{I}nv-\mathcal{G}amma(1, 0.01) \ \ \ j=1,...,J,c,t \\
& \gamma_l \ \sim \ \mathcal{N}(0,5) \\
& \rho \ \sim \ \mathcal{U}(-1,1) \\
& \theta \ \sim \ \mathcal{N}(0,100)
\end{cases}
\end{align*}
And here is how I coded it on Stan:
data {
int<lower=0> N; // num individuals
int<lower=1> K; // num ind predictors
int<lower=1> J; // num groups
int<lower=1> L; // num group predictors
int<lower=1,upper=J> g[N]; // group for individual
matrix[N, K] x; // individual predictors
matrix[J,L] u; // group predictors
vector[N] y; // outcomes
vector[N] w; // treatment assigned
}
parameters {
vector[K] beta[J]; // vector of school-specific coefficients
matrix[L,K] gamma; // school coefficients
real<lower=-1,upper=+1> rho; // school coefficients correlation
real<lower=0> sigma2[K];
real effect; // super-population average treatment effect
real<lower=0> sigma2_t; // residual SD for the treated
real<lower=0> sigma2_c; // residual SD for the control
}
transformed parameters{
matrix[J,K] u_gamma;
matrix[K,K] Sigma;
u_gamma = u * gamma;
for (s in 1:K) {
for (z in 1:K){
if (s==z)
Sigma[s,z] = sigma2[s]^1/2 * sigma2[z]^1/2;
else
Sigma[s,z] = sigma2[s]^1/2 * sigma2[z]^1/2 * rho;
}
}
}
model {
vector[N] m;
vector[N] d;
vector[N] uno;
vector[K] a[J];
// Here we assign priors to the variances and the correlation coefficient
// of the beta_j's
sigma2 ~ inv_gamma(1, 0.01);
rho ~ uniform(-1,1);
// Here we create the vector Beta (JK x 1) by stacking in a column vector all
// the vectors beta_j (K x 1) and its expected value
for (j in 1:J){
a[j] = row(u_gamma,j)';
}
// Finally we assign the prior to Beta
beta ~ multi_normal(a, Sigma);
// Here we model the likelihood of our outcome variable y
effect ~ normal(0, 100);
sigma2_c ~ inv_gamma(1, 0.01);
sigma2_t ~ inv_gamma(1, 0.01);
to_vector(gamma)~ normal(0, 5);
for (n in 1:N) {
m[n] = dot_product(beta[g[n]] , x[n]) + effect*w[n];
}
uno = rep_vector(1, N);
d = sigma2_t*w + sigma2_c*(uno-w);
y ~ normal(m, d);
}
Thanks in advance for all the help.