# Varying Variances

Hi all,

I’m attempting to fit a model that is similar to the one described here: Separate variances for different levels of a fixed effect

The data comes from a series of studies conducted at 5 different school sites. Individuals (n=675) at each site contribute multiple grade observations. The main experimental variables are treatment condition (2 levels) and ethnicity (2 levels). In rstanarm, the model looks like this:

```r
y ~ Condition*Group + cov + (1|ID) + (Condition*Group + cov | site)
```


Where cov is a pre-study covariate measurement.

The relatively simple version of what I would like is for the varying intercepts to be drawn from distributions with different variances:

 \alpha_i \sim N(0, \sigma_j)


(as an aside, is it possible to have math rendered here?)

where there’s one sigma_j for each combination of Condition and Group. If I can get this working and it yields sensible answers, I’d ideally like to extend it to each combination of condition, group, and site.
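To make the target concrete, here is a minimal Stan sketch of that structure (illustrative names only, assuming a `cell` index that maps each person to one of the four Condition-by-Group combinations; this is not taken from the models below):

```stan
data {
  int<lower=1> N_person;                       // number of individuals
  int<lower=1> N_cell;                         // 4 = 2 conditions x 2 groups
  int<lower=1, upper=N_cell> cell[N_person];   // cell membership per person
}
parameters {
  vector<lower=0>[N_cell] sigma_cell;  // one intercept SD per cell
  vector[N_person] z;                  // unscaled intercepts
}
transformed parameters {
  vector[N_person] alpha;              // alpha_i ~ N(0, sigma_{cell[i]})
  for (i in 1:N_person)
    alpha[i] = sigma_cell[cell[i]] * z[i];
}
model {
  z ~ normal(0, 1);                    // non-centered parameterization
  sigma_cell ~ student_t(3, 0, 10);
}
```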

As a first pass, I fit three models:

Model 1: as specified above, in rstanarm. That is, random intercepts for person, with everything else allowed to vary by site. There are no varying variances.
Model 2: Attempt to introduce varying variances for each combination of condition and group
Model 3: Attempt to introduce varying variances for each combination of condition, group, and site.

Below is a plot of the posterior means and 95% uncertainty intervals:

As you can see, something crazy is going on with models 2 (varying_variances1) and 3 (varying_variances2).

My Stan code for the latter two models is below:

```stan
// generated with brms 1.10.2
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];  // person indicator
  int<lower=1> N_1;  // number of people = 675
  int<lower=1> M_1;  // how many reff intercept groups? 4 (for just 4 cells) or 20 (for 4 cells * 5 sites)
  int<lower=1> j_1[N_1];  // indicator for reff intercept groups
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  vector[N] Z_2_5;
  int<lower=1> NC_2;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, K - 1] Xc;  // centered version of X
  vector[K - 1] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real temp_Intercept;  // temporary intercept
  real<lower=0> sigma;  // residual SD
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1;  // unscaled group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // unscaled group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_2] L_2;
}
transformed parameters {
  vector[N_1] r_1_1;  // group-level effects
  // group-level effects
  matrix[N_2, M_2] r_2 = (diag_pre_multiply(sd_2, L_2) * z_2)';
  vector[N_2] r_2_1 = r_2[, 1];
  vector[N_2] r_2_2 = r_2[, 2];
  vector[N_2] r_2_3 = r_2[, 3];
  vector[N_2] r_2_4 = r_2[, 4];
  vector[N_2] r_2_5 = r_2[, 5];
  for (n in 1:N_1) {
    r_1_1[n] = sd_1[j_1[n]] * z_1[n];
  }
}
model {
  vector[N] mu = Xc * b + temp_Intercept;
  for (n in 1:N) {
    mu[n] = mu[n] + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n]
            + r_2_2[J_2[n]] * Z_2_2[n] + r_2_3[J_2[n]] * Z_2_3[n]
            + r_2_4[J_2[n]] * Z_2_4[n] + r_2_5[J_2[n]] * Z_2_5[n];
  }
  // priors including all constants
  target += student_t_lpdf(sigma | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += student_t_lpdf(sd_1 | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += normal_lpdf(z_1 | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 10)
            - 5 * student_t_lccdf(0 | 3, 0, 10);
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  target += normal_lpdf(to_vector(z_2) | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
```


I generated a base model identical to model 1 using brms, and then adjusted it slightly to get the model I want. I know the usual advice is to start small and build up, but this seemed like a small enough tweak that I could get away with cheating a bit.

What am I missing? Why am I not getting sensible results?


Is that the Stan code for model 2 or 1?

For model 2. The two differences between model 1 and model 2 are that (1) I added the j_1 indicator to the data block, and (2) the definition of r_1_1 looks like this:

```stan
for (n in 1:N_1) {
  r_1_1[n] = sd_1[j_1[n]] * z_1[n];
}
```


rather than this:

```stan
vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
```


below is the original brms model (equivalent to model 1)

```stan
// generated with brms 1.10.2
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  vector[N] Z_2_5;
  int<lower=1> NC_2;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, K - 1] Xc;  // centered version of X
  vector[K - 1] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real temp_Intercept;  // temporary intercept
  real<lower=0> sigma;  // residual SD
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // unscaled group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // unscaled group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_2] L_2;
}
transformed parameters {
  // group-level effects
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  // group-level effects
  matrix[N_2, M_2] r_2 = (diag_pre_multiply(sd_2, L_2) * z_2)';
  vector[N_2] r_2_1 = r_2[, 1];
  vector[N_2] r_2_2 = r_2[, 2];
  vector[N_2] r_2_3 = r_2[, 3];
  vector[N_2] r_2_4 = r_2[, 4];
  vector[N_2] r_2_5 = r_2[, 5];
}
model {
  vector[N] mu = Xc * b + temp_Intercept;
  for (n in 1:N) {
    mu[n] = mu[n] + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n]
            + r_2_2[J_2[n]] * Z_2_2[n] + r_2_3[J_2[n]] * Z_2_3[n]
            + r_2_4[J_2[n]] * Z_2_4[n] + r_2_5[J_2[n]] * Z_2_5[n];
  }
  // priors including all constants
  target += student_t_lpdf(sigma | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += student_t_lpdf(sd_1 | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += normal_lpdf(z_1[1] | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 10)
            - 5 * student_t_lccdf(0 | 3, 0, 10);
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  target += normal_lpdf(to_vector(z_2) | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = temp_Intercept - dot_product(means_X, b);
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // take only relevant parts of correlation matrix
  cor_2[1] = Cor_2[1,2];
  cor_2[2] = Cor_2[1,3];
  cor_2[3] = Cor_2[2,3];
  cor_2[4] = Cor_2[1,4];
  cor_2[5] = Cor_2[2,4];
  cor_2[6] = Cor_2[3,4];
  cor_2[7] = Cor_2[1,5];
  cor_2[8] = Cor_2[2,5];
  cor_2[9] = Cor_2[3,5];
  cor_2[10] = Cor_2[4,5];
}
```


I’ve narrowed down the problem here. I think there’s something different between the rstanarm model and the brms model, even without my adjustments. For instance:

```r
p <- c(set_prior('normal(2,2)', class = 'Intercept', coef = ''),
       set_prior('normal(0,1)', class = 'b'))

m.brm <- brm(gpa ~ Condition*Ethnicity + (1|ID) +
               (Condition*Ethnicity|site),
             data = testing_data, prior = p)
m.rstanarm <- stan_glmer(gpa ~ Condition*Ethnicity + (1|ID) +
                           (Condition*Ethnicity|site),
                         prior_intercept = normal(2,2), prior = normal(0,1),
                         data = testing_data)
```


These models look identical to me, but result in wildly different precision of the non-varying estimates:

I started looking at some of the individual parameters to see if any were totally different between the two models. It looks like the covariance parameters governing the site-level variability are a prime suspect:

What am I doing incorrectly here? My guess would be the LKJ prior, but both packages say that eta is set to 1 by default, and some mild adjustments to the brms model have not changed the patterns here.
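One way to narrow this down is to print the priors each package actually applies rather than relying on documented defaults. A sketch (`get_prior`, `prior_summary`, and `make_stancode` are real brms/rstanarm functions, though argument details may vary by version):

```r
# Inspect the priors each package actually uses for this formula
get_prior(gpa ~ Condition*Ethnicity + (1|ID) +
            (Condition*Ethnicity|site), data = testing_data)  # brms defaults
prior_summary(m.rstanarm)  # priors rstanarm actually applied

# brms's generated Stan code can also be diffed against model 1 directly
make_stancode(gpa ~ Condition*Ethnicity + (1|ID) +
                (Condition*Ethnicity|site),
              data = testing_data, prior = p)
```

One possibly relevant detail: if I understand rstanarm correctly, its default covariance prior is `decov()`, which decomposes the covariance matrix differently (LKJ on correlations plus a simplex-and-scale prior on the variances) than brms's independent half-Student-t SDs, so matching only `prior_intercept` and `prior` would not make the two models identical.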


I’m not sure how much @paul.buerkner (author of brms) monitors this list, so you may want to take it up on the brms list: