So this is something I’ve already seen a few times, but currently have no super good insights about. The main issue is IMHO a negative correlation in the posterior between the main intercept and the random intercept.
pairs(bm$fit, pars = c("b_Intercept", "r_ctryear[2020.1950,Intercept]", "r_ctryear[2020.1990,Intercept]"))
This is probably related to the fact that you have a lot of data per group, which can overwhelm the hierarchical prior on the varying intercepts and make the model behave almost as if you had a dummy-coded fixed intercept for each level of ctryear
(while with fixed effects you would usually not have a fixed intercept for the reference level)
What to do about this?
One think is just to fit ctryear
as fixed effect: bm_fixed = brm(wy ~ zigdp_pc + ctryear, data = dat)
doesn’t actually work that badly. You can even be a bit more clever about this and fit the levels with a lot of observations as fixed intercepts and keep the varying intercept only for those with few observations: you would introduce a new variable ctryear_fixed
which has value “none” for those where we want to have a random effect and a specifc level otherwise, similarly ctryear_random
then something like wy ~ zigdp_pc + ctryear_fixed + (1 | ctryear_random)
could work better.
Mike’s case study on hierarchical modelling discusses that in such cases, you may want to use centered parametrization for some/all of your levels (brms uses the non-centered parametrization). There is an issue to let you have this in brms
but it is not implemented yet: Centered and partially centered parametrization of varying effects/intercepts · Issue #1162 · paul-buerkner/brms · GitHub
One can manually edit the generated brms code and use non-centered for all parameters, but that doesn’t help. One could also add a sum-to-zero constrain on the varying intercepts to remove one degree of freedom (I think R-INLA does this by default), this also does not help (at least a simple implementation). However doing both of those things helps, here’s the manually modified Stan code:
// generated with brms 2.15.7
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
// Original code by brms
// vector[N_1] z_1[M_1]; // standardized group-level effects
// Manually added
vector[N_1] r_1_1; // actual group-level effects
}
// Original code by brms
// transformed parameters {
// vector[N_1] r_1_1; // actual group-level effects
// r_1_1 = (sd_1[1] * (z_1[1]));
// }
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = Intercept + rep_vector(0.0, N);
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += student_t_lpdf(Intercept | 3, 0.1, 2.5);
target += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
target += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
// Original code by brms
// target += std_normal_lpdf(z_1[1]);
// Manually added
target += normal_lpdf(r_1_1 | 0, sd_1[1]);
// manual soft sum to zero constraint
target += normal_lpdf(sum(r_1_1) | 0, 0.0001 *N_1);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
I then use make_standata
to build data for the model and sample with cmdstanr
directly (would also work with rstan
), i.e.:
library(cmdstanr)
m_centered <- cmdstan_model("modified.stan")
sdat <- make_standata(wy ~ zigdp_pc + (1|ctryear), data = dat)
fit_centered <- m_centered$sample(data = sdat)
bm_manual <- brm(wy ~ zigdp_pc + (1|ctryear), data = dat, empty = TRUE)
bm_manual$fit <- rstan::read_stan_csv(fit_centered$output_files())
bm_manual <- rename_pars(bm_manual)
summary(bm_manual)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: wy ~ zigdp_pc + (1 | ctryear)
Data: dat (Number of observations: 1826)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~ctryear (Number of levels: 74)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.39 0.03 0.33 0.46 1.00 6214 2909
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.21 0.00 0.20 0.21 1.00 1651 2057
zigdp_pc 0.29 0.01 0.27 0.31 1.00 530 1027
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.12 0.00 0.12 0.12 1.00 6207 3161
Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Hope that helps at least a bit!