Low bulk ESS in a simple random intercept model

So this is something I’ve already seen a few times, but currently don’t have really good insights about. The main issue is IMHO a negative correlation in the posterior between the main intercept and the varying intercepts, which you can see in a pairs plot:

pairs(bm$fit, pars = c("b_Intercept", "r_ctryear[2020.1950,Intercept]", "r_ctryear[2020.1990,Intercept]"))

This is probably related to the fact that you have a lot of data per group, which can overwhelm the hierarchical prior on the varying intercepts and make the model behave almost as if you had a dummy-coded fixed intercept for each level of ctryear (except that with fixed effects you would usually not have a separate intercept for the reference level). The overall intercept plus a constant shift in all the varying intercepts then gives almost the same likelihood, so the posterior ends up with strong negative correlations between them and sampling slows down.

What to do about this?

One thing to try is to just fit ctryear as a fixed effect: bm_fixed <- brm(wy ~ zigdp_pc + ctryear, data = dat) actually doesn’t work that badly. You can even be a bit more clever about this and fit the levels with a lot of observations as fixed intercepts while keeping the varying intercept only for those with few observations: you would introduce a new variable ctryear_fixed which has the value “none” for the levels where we want a varying effect and the specific level otherwise, and similarly a variable ctryear_random with “none” for the levels we fit as fixed. Then something like wy ~ zigdp_pc + ctryear_fixed + (1 | ctryear_random) could work better; a sketch of the data preparation follows below.
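A minimal sketch of how you could construct those two variables (assuming ctryear is a factor; the cutoff of 20 observations per level is an arbitrary choice you’d want to tune to your data):

library(dplyr)
library(brms)

# Split the levels by group size: big groups get a fixed intercept,
# small groups share the varying intercept (threshold of 20 is arbitrary)
dat <- dat %>%
  group_by(ctryear) %>%
  mutate(n_in_group = n()) %>%
  ungroup() %>%
  mutate(
    ctryear_fixed  = ifelse(n_in_group >= 20, as.character(ctryear), "none"),
    ctryear_random = ifelse(n_in_group >= 20, "none", as.character(ctryear))
  )

bm_mixed <- brm(wy ~ zigdp_pc + ctryear_fixed + (1 | ctryear_random), data = dat)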

Mike’s case study on hierarchical modelling discusses that in such cases you may want to use the centered parametrization for some/all of your levels (brms uses the non-centered parametrization). There is an open issue to allow this in brms, but it is not implemented yet: Centered and partially centered parametrization of varying effects/intercepts · Issue #1162 · paul-buerkner/brms · GitHub
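Schematically, for a single varying intercept, the difference between the two parametrizations looks like this (a sketch reusing the names from the brms-generated code below, with the array-of-vectors indexing dropped):

// Non-centered (what brms generates): sample standardized effects z_1
// and scale them to get the actual group-level effects r_1_1
parameters {
  real<lower=0> sd_1;
  vector[N_1] z_1;
}
transformed parameters {
  vector[N_1] r_1_1 = sd_1 * z_1;
}
model {
  z_1 ~ std_normal();
}

// Centered: sample the actual group-level effects directly
parameters {
  real<lower=0> sd_1;
  vector[N_1] r_1_1;
}
model {
  r_1_1 ~ normal(0, sd_1);
}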

One can manually edit the generated brms code to use the centered parametrization for all the varying intercepts, but on its own that doesn’t help. One could also add a sum-to-zero constraint on the varying intercepts to remove the redundant degree of freedom (I think R-INLA does this by default), but on its own this also does not help (at least in a simple implementation). However, doing both of those things together helps; here’s the manually modified Stan code:

// generated with brms 2.15.7
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  // Original code by brms
  // vector[N_1] z_1[M_1];  // standardized group-level effects
  
  // Manually added
  vector[N_1] r_1_1;  // actual group-level effects
}

// Original code by brms
// transformed parameters {
//  vector[N_1] r_1_1;  // actual group-level effects
//  r_1_1 = (sd_1[1] * (z_1[1]));
// }

model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + rep_vector(0.0, N);
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
    }
    target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
  }
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 0.1, 2.5);
  target += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
    
  // Original code by brms
  // target += std_normal_lpdf(z_1[1]);
  
  // Manually added
  target += normal_lpdf(r_1_1 | 0, sd_1[1]);
  
  // manual soft sum to zero constraint
  target += normal_lpdf(sum(r_1_1) | 0, 0.0001 * N_1);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

I then use make_standata from brms to build the data for the model and sample with cmdstanr directly (this would also work with rstan), i.e.:

library(brms)
library(cmdstanr)

m_centered <- cmdstan_model("modified.stan")
sdat <- make_standata(wy ~ zigdp_pc + (1 | ctryear), data = dat)
fit_centered <- m_centered$sample(data = sdat)

To get the result back into a brms object so that all the usual post-processing works, you can create an empty fit and plug the samples in:

bm_manual <- brm(wy ~ zigdp_pc + (1 | ctryear), data = dat, empty = TRUE)
bm_manual$fit <- rstan::read_stan_csv(fit_centered$output_files())
bm_manual <- rename_pars(bm_manual)
summary(bm_manual)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: wy ~ zigdp_pc + (1 | ctryear) 
   Data: dat (Number of observations: 1826) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~ctryear (Number of levels: 74) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.39      0.03     0.33     0.46 1.00     6214     2909

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.21      0.00     0.20     0.21 1.00     1651     2057
zigdp_pc      0.29      0.01     0.27     0.31 1.00      530     1027

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.12      0.00     0.12     0.12 1.00     6207     3161

Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
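
As a sanity check, you can re-run the pairs plot from the beginning on the refitted model (same example levels as before) and verify that the strong negative correlations between the intercept and the varying intercepts are gone:

pairs(bm_manual$fit, pars = c("b_Intercept", "r_ctryear[2020.1950,Intercept]", "r_ctryear[2020.1990,Intercept]"))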

Hope that helps at least a bit!
