Questions about parameterization of random intercepts and slopes model with group-level predictor (rstan vs. brms)

I have some experience with rstan, but have mostly used rstanarm and brms. I am currently working on my rstan fluency, and I am having a little trouble understanding how to work with the manual’s suggested parameterization of a random intercepts and slopes model.

Specifically, I’m working with the following model:

happy ~ 1 + income_cwc * gm_GNP + (1 + income_cwc | CountryID)

Where:

  • happy is being treated as a normally distributed outcome
  • income_cwc is being treated as normally distributed and has been centered within groups
  • gm_GNP is country level grand-mean-centered gross national product
  • CountryID is a group indicator, where groups are countries (J = 38)

An rdump of the data I’m using can be found here model_1_data.txt (163.5 KB).

Relying on the manual (pp. 37-39), I understand that this model can be parameterized as:

data {
  int N; // Sample size
  int K; // Number of individual level predictors
  int J; // Number of groups
  int L; // Number of group predictors
  int<lower=1, upper=J> jj[N]; // Length-N array of group (i.e., country) IDs
  matrix[N, K] x; // N x K matrix containing individual-level predictors
  row_vector[L] u[J]; // A length J array of row vectors with length L containing group predictors
  vector[N] y; // outcome
}
parameters {
  corr_matrix[K] Omega; // prior for correlation between random int and slope
  vector<lower=0>[K] tau; // Vector of scale parameters for random ints and slopes
  matrix[L, K] gamma; // L x K matrix containing coefficients for group predictors
  vector[K] beta[J]; // individual coefficients by group
  real<lower=0> sigma; // prediction error scale
}
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  
  {
    row_vector[K] u_gamma[J];
    for (j in 1:J)
      u_gamma[j] = u[j] * gamma;
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  
  {
    vector[N] x_beta_jj;
    for (n in 1:N)
      x_beta_jj[n] = x[n]*beta[jj[n]];
    y ~ normal(x_beta_jj, sigma);
  }
  
}
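
For reference, I fit this with something like the sketch below (the .stan file name is just a placeholder; the full call is in the example.R script linked below):

library(rstan)

standata <- read_rdump("model_1_data.txt")  # the rdump linked above
random_intercepts_fit <- stan("random_intercepts_slopes.stan",  # placeholder name for the model above
                              data = standata)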

I am comparing this model’s estimates to those obtained from a brms model specified using the formula above and fit to this data: brms_data.csv (653.7 KB).

I’ve included the Stan code for the brms model at the end of this post, and an R script estimating both the Stan and brms models I discuss below is available here: example.R (1.7 KB).

Questions:

The rstan model converges fine and I am able to obtain point estimates for income_cwc and the effects of gm_GNP on the intercepts and slopes. To do this, I just take the mean of the posterior distribution for the corresponding parameters. These point estimates correspond to those obtained via brms.
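
Concretely, the point estimates come from something like this (the fit object is the one created in example.R; its name matches the extraction code at the end of this post):

# posterior means of the group-level coefficient matrix gamma (L x K);
# these are the point estimates I compare to the brms fixed effects
gamma_samp <- rstan::extract(random_intercepts_fit, pars = "gamma")$gamma  # iterations x L x K
apply(gamma_samp, c(2, 3), mean)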

However, I have the following questions:

How should I estimate the SD of the intercepts and slopes?
Initially, I thought that tau contained these SDs; however, the estimates in tau are much larger than those obtained from brms.

I thought that it also was viable to just take the SD of the posterior distribution of the betas. And indeed, when I do that, I get estimates that are comparable to the random effects estimated by the brms model. However, I am still confused by tau and not totally confident in my understanding of what’s going on.
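
The tau summaries I’m comparing against brms’s group-level SDs (sd_1 in the brms code below) come from something like:

print(random_intercepts_fit, pars = "tau")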

How should I estimate the posterior SD of the fixed effects?
If it is the case that the posterior SDs of the betas from the model represent the SDs of the random intercepts and slopes, it is not quite clear to me how to obtain the posterior SDs for the fixed effects.

Further, I assumed that the posterior distributions of the gammas would give me the posterior SDs for the fixed effects of the group-level predictor and the cross-level interaction. As noted above, the means of the gammas’ posterior distributions return point estimates comparable to those estimated by the brms model. However, the SDs of their posterior distributions are much larger than the posterior SDs reported by brms. Why?
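
For example, I’m summarizing the gammas with something along these lines:

# posterior mean and SD for each element of gamma (assumes rstan is loaded)
summary(random_intercepts_fit, pars = "gamma")$summary[, c("mean", "sd")]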

What’s going on with the estimated correlation between the intercepts and slopes?
If I understand the parameterization correctly (and I probably don’t), the off-diagonals of Omega should contain the correlation between the slopes and intercepts. However, the Stan model’s estimate for this parameter is .74, whereas the brms estimate is approximately -.09.
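
The .74 is the posterior mean of the off-diagonal element, pulled with something like:

Omega_samp <- rstan::extract(random_intercepts_fit, pars = "Omega")$Omega  # iterations x K x K
mean(Omega_samp[, 1, 2])  # intercept-slope correlation in the Stan model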

Last thoughts

I know that these models are different and, particularly given that they use different priors, I wouldn’t expect identical estimates. Also, there are a decent number of observations (~5000), so I wouldn’t have anticipated a huge degree of prior sensitivity.

Accordingly, I don’t know whether the discrepancies I am observing are caused by my not understanding how to extract the right estimates from the Stan model, or by my underestimating the influence of the specified priors.

I know this is a rather rudimentary question and I hope that I’m not wasting anyone’s time. I did look around the message board a fair amount and still found myself confused.

Thanks in advance for any insight!

brms model

data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  // data for group-level effects of ID 1
  int<lower=1> J_1[N]; // Group IDs
  int<lower=1> N_1; // Number of units in Group Var 1
  int<lower=1> M_1; // Number of varying pars for Group Var 1
  vector[N] Z_1_1; // Column vector of 1s (intercept)
  vector[N] Z_1_2; // Column vector containing income_cwc
  int<lower=1> NC_1; // Set to 1
  int prior_only; // should only the priors be sampled? (used in the model block)
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  // centered version of X 
  vector[K - 1] means_X;  // column means of X before centering 
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  // population-level effects 
  real temp_Intercept;  // temporary intercept 
  real<lower=0> sigma;  // residual SD 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // unscaled group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_1] L_1;
} 
transformed parameters { 
  // group-level effects 
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)'; // Random intercepts and slopes
  vector[N_1] r_1_1 = r_1[, 1]; // Random intercepts
  vector[N_1] r_1_2 = r_1[, 2]; // Random slopes
} 
model { 
  vector[N] mu = temp_Intercept + Xc * b; // Initial prediction from linear predictor?
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
  } // Adjust for varying intercepts and slopes?
  
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 3, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 2 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(to_vector(z_1) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_1 | 1); 
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma);
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // take only relevant parts of correlation matrix
  cor_1[1] = Cor_1[1,2]; 
}

Code used for obtaining the means and SDs of the intercepts and slopes:

library(dplyr) # for %>% and pull()
library(tidyr) # for gather()

beta_samp = rstan::extract(random_intercepts_fit, pars='beta')

# posterior mean pooled over all intercept draws (beta[, , 1] is iterations x J)
beta_samp$beta[,,1] %>%
  as.data.frame() %>%
  gather() %>%
  pull(value) %>%
  mean()

# posterior mean pooled over all slope draws (beta[, , 2] is iterations x J)
beta_samp$beta[,,2] %>%
  as.data.frame() %>%
  gather() %>%
  pull(value) %>%
  mean()
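
The SDs mentioned in the heading are obtained the same way, just swapping sd() for mean(), e.g.:

# SDs pooled over all intercept and slope draws (these are what I compare to brms's sd_1)
beta_samp$beta[,,1] %>% as.data.frame() %>% gather() %>% pull(value) %>% sd()
beta_samp$beta[,,2] %>% as.data.frame() %>% gather() %>% pull(value) %>% sd()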

Okay, I found it on page 149ff in version 2.17.1 of the manual. I can barely muddle my way through Stan code, but I don’t think the Stan model is returning the same parameters as the brms model you have. I can’t even tell what tau is estimating. If someone can explain, I would be glad to know.

I found working through this paper to be enlightening and it has helped me tremendously. Work through it, and if you still have problems, I should be able to help you code it up. For all of the examples I have worked through, the native Stan and brms results have been very similar.

Here is how I have coded up the model, based on the paper I referenced earlier. I am able to recover the parameters (betas) for the fixed effects. Is this what you are looking for?

library(rstan)
library(brms)
library(lme4)

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

model3 = 'data {
  int<lower=0> N;                 // no. trials (observations)
  int<lower=1> P;                 // no. fixefs
  int<lower=0> J;                 // no. subjects (here: countries)
  int<lower=1> n_u;               // no. subj ranefs
  int<lower=1,upper=J> subj[N];   // subject (here: country) indicator
  row_vector[P] X[N];             // fixef design matrix
  row_vector[n_u] Z_u[N];         // subj ranef design matrix
  vector[N] rt;                   // outcome (reading time in the original paper; happy here)
}

parameters {
  vector[P] beta;                 // fixef coefs
  cholesky_factor_corr[n_u] L_u;  // cholesky factor of subj ranef corr matrix
  vector<lower=0>[n_u] sigma_u;   // subj ranef std
  real<lower=0> sigma_e;          // residual std
  vector[n_u] z_u[J];             // spherical (standard normal) subj ranefs
}

transformed parameters {
  vector[n_u] u[J];               // subj ranefs

  {
    matrix[n_u,n_u] Sigma_u;      // cholesky factor of subj ranef cov matrix
    Sigma_u = diag_pre_multiply(sigma_u, L_u);
    for (j in 1:J)
      u[j] = Sigma_u * z_u[j];
  }
}

model {
  // priors
  L_u ~ lkj_corr_cholesky(2.0);
  for (j in 1:J)
    z_u[j] ~ normal(0, 1);

  // likelihood
  for (i in 1:N)
    rt[i] ~ normal(X[i] * beta + Z_u[i] * u[subj[i]], sigma_e);
}
'

dat <- read.csv('brms_data.csv')

### new data list
Xmat = model.matrix(lmer(happy ~ 1 + income_cwc*gm_GNP + (1 + income_cwc| CountryID.int),data = dat))

stan_data = list(N = nrow(Xmat), P = ncol(Xmat), J = length(unique(dat$CountryID.int)),
                 n_u = ncol(Xmat), subj = dat$CountryID.int, X = Xmat, Z_u = Xmat,
                 rt = dat$happy)

stan_fit <- stan(model_code = model3, data= stan_data,
                 chains=3, iter=2000)

qa.m.brms <- brm(happy ~ 1 + income_cwc*gm_GNP + (1 + income_cwc| CountryID), 
                 data = dat)

# print estimates
print(qa.m.brms)
print(stan_fit,pars=c("beta"))
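
If you also want the group-level SDs and the intercept-slope correlation from this fit (to compare against sd_1 and cor_1 in the brms output), something along these lines should work; the correlation is reconstructed from the Cholesky factor L_u:

# group-level SDs (analogous to brms's sd_1) and residual SD
print(stan_fit, pars = c("sigma_u", "sigma_e"))

# intercept-slope correlation, rebuilt from the Cholesky factor of the correlation matrix
L_u_draws <- rstan::extract(stan_fit, pars = "L_u")$L_u      # iterations x n_u x n_u
cor_12 <- apply(L_u_draws, 1, function(L) tcrossprod(L)[1, 2])
mean(cor_12)
sd(cor_12)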