I have worked some with rstan, but mostly with rstanarm and brms. However, I am currently working on rstan fluency and I am having a little bit of trouble understanding how to work with the manual’s suggested parameterization of a random intercepts and slopes model.
Specifically, I’m working with the following model:
happy ~ 1 + income_cwc * gm_GNP + (1 + income_cwc | CountryID)
Where:
- happy is being treated as a normally distributed
- income_cwc is being treated as normally distributed and has been centered within groups
- gm_GNP is country level grand-mean-centered gross national product
- CountryID is an group indicator, where groups are countries (J = 38)
An rdump of the data I’m using can be found here model_1_data.txt (163.5 KB).
Relying on the manual (pgs 37-39) I understand that this model can be parameterized as:
data {
int N; // Sample size
int K; // Number of individual level predictors
int J; // Number of groups
int L; // Number of group predictors
int<lower=1, upper=J> jj[N]; // Column vector of length N containing group (i.e country) IDs.
matrix[N, K] x; // N x K matrix containing individual-level predictors
row_vector[L] u[J]; // A length J array of row vectors with length L containing group predictors
vector[N] y; // outcome
}
parameters {
corr_matrix[K] Omega; // prior for correlation between random int and slope
vector<lower=0>[K] tau; // Vector of scale parameters for random ints and slopes
matrix[L, K] gamma; // L x K matrix containing coefficients for group predictors
vector[K] beta[J]; // individual coefficients by group
real<lower=0> sigma; // prediction error scale
}
model {
tau ~ cauchy(0, 2.5);
Omega ~ lkj_corr(2);
to_vector(gamma) ~ normal(0, 5);
{
row_vector[K] u_gamma[J];
for (j in 1:J)
u_gamma[j] = u[j] * gamma;
beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
}
{
vector[N] x_beta_jj;
for (n in 1:N)
x_beta_jj[n] = x[n]*beta[jj[n]];
y ~ normal(x_beta_jj, sigma);
}
}
And, I am comparing this model’s estimates to estimates obtained from a brms model specified using the formula above and estimated from this data brms_data.csv (653.7 KB)
I’ve included the stan code for the brms model at the end of this post. And an r script estimating both the stan and brms models I discuss below is available here example.R (1.7 KB).
Questions:
The rstan model converges fine and I am able to obtain point estimates for income_cwc and the effects of gm_GNP on the intercepts and slopes. To do this, I just take the mean of the posterior distribution for the corresponding parameters. These point estimates correspond to those obtained via brms.
However, I have the following questions:
How should I estimate the SD of the intercepts and slopes?
Initially, I thought that tau contained these SDs; however, the estimates in tau are much larger than those obtained from brms.
I thought that it also was viable to just take the SD of the posterior distribution of the betas. And indeed, when I do that, I get estimates that are comparable to the random effects estimated by the brms model. However, I still confused by tau and not totally confident in my understanding of what’s going on.
How should I estimate the posterior SD of the fixed effects?
If it is the case that the posterior SDs of the betas from the model represent that SD of the random intercepts and slopes, it is not quite clear to me how to obtain the posterior SD for the fixed effect.
Further, I assumed that the posterior distribution of the gammas would give me the posterior SD for the fixed effects of the group-level predictor and cross-level interaction. As noted above, the mean of the gamma’s posterior distributions returns point estimates comparable to those estimated by the brms model. However, the SD of their posterior distributions is much larger than the posterior SDs reported by brms. Why?
What’s going on with the estimated correlation between the intercepts and slopes?
If I understand the parameterization right (and I probably don’t), the off-diagonals of Omega should contain the correlation between the slopes and intercepts. However, the stan model estimate for this parameter is .74, whereas the brms estimate is approximately -.09.
Last thoughts
I know that these models are different and, particularly given that they use different priors, I wouldn’t expect identical estimates. And, there are a decent number of observations (~5000), so I wouldn’t have anticipated a huge degree of prior sensitivity.
Accordingly, I don’t know whether the discrepancies I am observing are caused by me not understanding how to extract the right estimates from the stan model or underestimating the influence of the specified.
I know this is a rather rudimentary question and I hope that I’m not wasting anyone’s time. I did look around the message board a fair amount and still found myself confused.
Thanks in advance for any insight!
brms model
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> J_1[N]; // Group IDs
int<lower=1> N_1; // Number of units in Group Var 1
int<lower=1> M_1; // Number of varying pars for Group Var 1
vector[N] Z_1_1; // Column vector of 1s (intercept)
vector[N] Z_1_2; // Column vector containg income_cwc
int<lower=1> NC_1; // Set to 1
}
transformed data {
int Kc = K - 1;
matrix[N, K - 1] Xc; // centered version of X
vector[K - 1] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real temp_Intercept; // temporary intercept
real<lower=0> sigma; // residual SD
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // unscaled group-level effects
// cholesky factor of correlation matrix
cholesky_factor_corr[M_1] L_1;
}
transformed parameters {
// group-level effects
matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)'; // Random intercepts and slopes
vector[N_1] r_1_1 = r_1[, 1]; // Random intercepts
vector[N_1] r_1_2 = r_1[, 2]; // Random slopes
}
model {
vector[N] mu = temp_Intercept + Xc * b; // Initial prediction from linear predictor?
for (n in 1:N) {
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
} // Adjust for varying intercepts and slopes?
// priors including all constants
target += student_t_lpdf(temp_Intercept | 3, 3, 10);
target += student_t_lpdf(sigma | 3, 0, 10)
- 1 * student_t_lccdf(0 | 3, 0, 10);
target += student_t_lpdf(sd_1 | 3, 0, 10)
- 2 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(to_vector(z_1) | 0, 1);
target += lkj_corr_cholesky_lpdf(L_1 | 1);
// likelihood including all constants
if (!prior_only) {
target += normal_lpdf(Y | mu, sigma);
}
}
generated quantities {
// actual population-level intercept
real b_Intercept = temp_Intercept - dot_product(means_X, b);
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// take only relevant parts of correlation matrix
cor_1[1] = Cor_1[1,2];
}
Code Used for obtaining means and SDs of the intercepts and slopes:
beta_samp = rstan::extract(random_intercepts_fit, pars='beta')
beta_samp$beta[,,1] %>%
as.data.frame() %>%
gather() %>%
pull(value) %>%
mean()
beta_samp$beta[,,2] %>%
as.data.frame() %>%
gather() %>%
pull(value) %>%
mean()