Hi all,
I’m attempting to fit a model that is similar to the one described here: Separate variances for different levels of a fixed effect
The data comes from a series of studies conducted at 5 different school sites. Individuals (n=675) at each site contribute multiple grade observations. The main experimental variables are treatment condition (2 levels) and ethnicity (2 levels). In rstanarm, the model looks like this:
y ~ Condition*Group + cov + (1|ID) + (Condition*Group + cov | site)
where cov is a pre-study covariate measurement.
The relatively simple version of what I would like is for the varying intercepts to be drawn from distributions with different variances:
\alpha_i \sim N(0, \sigma_j)
(as an aside, is it possible to have math rendered here?)
where there is one \sigma_j for each combination of Condition and Group. If I can get this working and producing sensible answers, I’d ideally like to extend it to each combination of Condition, Group, and site.
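To make that concrete, here is a toy simulation of what I mean by cell-specific intercept SDs (Python purely for illustration; the cell labels and SD values are invented, not from my data):

```python
import random

random.seed(1)

# Hypothetical 2 x 2 design: Condition x Group -> 4 cells
cells = [(c, g) for c in ("control", "treatment") for g in ("A", "B")]

# One intercept SD per cell (values invented for illustration)
sigma_j = {("control", "A"): 0.5, ("control", "B"): 1.0,
           ("treatment", "A"): 1.5, ("treatment", "B"): 2.0}

def draw_intercepts(person_cells):
    """Draw alpha_i ~ N(0, sigma_j), where j is person i's Condition x Group cell."""
    return [random.gauss(0.0, sigma_j[cell]) for cell in person_cells]

# Two toy people per cell
people = cells * 2
alphas = draw_intercepts(people)
```

The point is just that each person's intercept is scaled by the SD of whichever cell they belong to, rather than a single shared SD.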
As a first pass, I fit three models:
Model 1: as specified above, in rstanarm. That is, random intercepts for person, with everything else allowed to vary by site. There are no varying variances.
Model 2: Attempt to introduce varying variances for each combination of condition and group
Model 3: Attempt to introduce varying variances for each combination of condition, group, and site.
Below is a plot of the posterior means and 95% uncertainty intervals:
As you can see, something crazy is going on with models 2 (varying_variances1) and 3 (varying_variances2).
My Stan code for the latter two models is below:
// generated with brms 1.10.2
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];  // person indicator
  int<lower=1> N_1;  // number of people = 675
  int<lower=1> M_1;  // number of reff intercept groups: 4 (just the 4 cells) or 20 (4 cells * 5 sites)
  int<lower=1> j_1[N_1];  // indicator for reff intercept groups
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  vector[N] Z_2_5;
  int<lower=1> NC_2;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, K - 1] Xc;  // centered version of X
  vector[K - 1] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real temp_Intercept;  // temporary intercept
  real<lower=0> sigma;  // residual SD
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations, one per cell
  vector[N_1] z_1;  // unscaled group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // unscaled group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_2] L_2;
}
transformed parameters {
  vector[N_1] r_1_1;
  // group-level effects
  matrix[N_2, M_2] r_2 = (diag_pre_multiply(sd_2, L_2) * z_2)';
  vector[N_2] r_2_1 = r_2[, 1];
  vector[N_2] r_2_2 = r_2[, 2];
  vector[N_2] r_2_3 = r_2[, 3];
  vector[N_2] r_2_4 = r_2[, 4];
  vector[N_2] r_2_5 = r_2[, 5];
  // group-level effects: each person's intercept scaled by their cell's SD
  for (n in 1:N_1) {
    r_1_1[n] = sd_1[j_1[n]] * z_1[n];
  }
}
model {
  vector[N] mu = Xc * b + temp_Intercept;
  for (n in 1:N) {
    mu[n] = mu[n] + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n]
            + r_2_2[J_2[n]] * Z_2_2[n] + r_2_3[J_2[n]] * Z_2_3[n]
            + r_2_4[J_2[n]] * Z_2_4[n] + r_2_5[J_2[n]] * Z_2_5[n];
  }
  // priors including all constants
  target += student_t_lpdf(sigma | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += student_t_lpdf(sd_1 | 3, 0, 10)
            - M_1 * student_t_lccdf(0 | 3, 0, 10);  // one lccdf correction per element of sd_1
  target += normal_lpdf(z_1 | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 10)
            - M_2 * student_t_lccdf(0 | 3, 0, 10);
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  target += normal_lpdf(to_vector(z_2) | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
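For completeness, this is how I intend j_1 to work: each person gets the (1-based) index of their Condition × Group cell. A minimal sketch with toy factor levels (Python here just for illustration; my actual data prep is in R):

```python
# Toy per-person factor levels (invented for illustration)
conditions = ["control", "treatment", "treatment", "control"]
groups     = ["A",       "A",         "B",         "B"]

# Enumerate the observed Condition x Group cells and number them 1..M_1
cell_levels = sorted(set(zip(conditions, groups)))
cell_index = {cell: k + 1 for k, cell in enumerate(cell_levels)}  # 1-based for Stan

# j_1[i] = cell index of person i, as passed to the Stan model
j_1 = [cell_index[(c, g)] for c, g in zip(conditions, groups)]
```

The extension to Condition × Group × site would just enumerate 20 cells instead of 4.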
I generated a base model identical to model 1 using brms, and then adjusted it slightly to get the model I wanted. I know the usual advice is to start small and build up, but this seemed like a small enough tweak that I could get away with cheating a bit.
What am I missing? Why am I not getting sensible results?