Calculating the pointwise log-likelihood in a two-staged multilevel model

In order to use loo, I want to calculate the pointwise log-likelihood in a two-staged multilevel model; however, I am not sure about the correct way to do it.

Let’s take the example from Overcoming the Simpson's paradox in a multivariate linear model with within and between site gradients and repeated measurements. The data contains within- and between-site gradients and it was suggested to deploy the following model:

$$
\begin{aligned}
y_{i} &\sim \text{Normal}(\mu_{i}, \sigma)\\
\mu_{i} &= \alpha_{\text{site}[i]} + \beta_2 x_{2i} && \text{stage 1, within-site gradient}\\
\alpha_{\text{site}[i]} &\sim \text{Normal}(\nu_{i}, \sigma_{\alpha})\\
\nu_i &= \alpha_1 + \beta_1 x_{1i} && \text{stage 2, between-site gradient}\\
\alpha_{1}, \beta_2, \beta_1 &\sim \text{Normal}(0, 1)\\
\sigma, \sigma_{\alpha} &\sim \text{Exponential}(1)
\end{aligned}
$$

which was coded this way in Stan:

data {
  int<lower=0> N;
  int<lower=0> N_site;
  array[N] int<lower=1> site;
  vector[N] y;
  vector[N] x1;
  vector[N] x2;
}

parameters {
  real alpha1;
  real beta1;
  real<lower=0> sigma_alphaS;
  real beta2;
  vector[N_site] alphaS;
  real<lower=0> sigma;
}

model {
  alpha1 ~ std_normal();
  beta1 ~ std_normal();
  sigma_alphaS ~ exponential(1);
  beta2 ~ std_normal();
  alphaS[site] ~ normal(alpha1 + beta1 * x1, sigma_alphaS); // stage 2, between-site gradient
  sigma ~ exponential(1);
  y ~ normal(alphaS[site] + beta2 * x2, sigma); // stage 1, within-site gradient
}

generated quantities {
  // posterior predictive distribution for replications y_rep of the original data set y given model parameters
  array[N] real y_rep = normal_rng(alphaS[site] + beta2 * x2, sigma);
  // pointwise log-likelihood
  vector[N] log_lik;
  for (i in 1:N) {
    log_lik[i] = normal_lpdf(y[i] | alphaS[site[i]] + beta2 * x2[i], sigma);
  }
}

Are these calculations of y_rep and log_lik correct, or should they involve both “stages” (within- and between-site) of the model? Reading this post, I thought log_lik might instead be:

for (i in 1:N) {
  log_lik[i] = normal_lpdf(y[i] | alpha1 + beta1 * x1[i] + beta2 * x2[i], sqrt(sigma^2 + sigma_alphaS^2));
}

but I’d like to double-check!
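For reference, the second form comes from analytically integrating the site intercept out of the model (standard normal–normal marginalization; $\nu_i = \alpha_1 + \beta_1 x_{1i}$ as above):

$$
p(y_i \mid \alpha_1, \beta_1, \beta_2, \sigma, \sigma_\alpha)
= \int \text{Normal}(y_i \mid \alpha + \beta_2 x_{2i}, \sigma)\,
\text{Normal}(\alpha \mid \nu_i, \sigma_\alpha)\, d\alpha
= \text{Normal}\!\left(y_i \;\middle|\; \nu_i + \beta_2 x_{2i},\ \sqrt{\sigma^2 + \sigma_\alpha^2}\right)
$$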

Probably because there are multiple valid choices when you have a hierarchical model! In general, you can group your observations and still apply LOO: you are simply leaving out a group instead of a single observation. I would recommend @andrewgelman’s paper with Xiao-Li Meng and Hal Stern, which goes over this for posterior predictive checks (which you are also doing): “Posterior Predictive Assessment of Model Fitness via Realized Discrepancies”.
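On the grouping point, here is a minimal numpy sketch (simulated numbers, hypothetical names) of turning an observation-level log_lik matrix into a site-level one for leave-one-site-out:

```python
import numpy as np

# Hypothetical S-by-N pointwise log-likelihood matrix (one column per
# observation) plus the site index of each observation; in practice
# log_lik would be extracted from the fitted Stan model.
S, N, N_site = 1000, 12, 3
rng = np.random.default_rng(1)
log_lik = rng.normal(-1.0, 0.3, size=(S, N))
site = np.repeat(np.arange(N_site), N // N_site)

# Leave-one-group-out: the unit left out is a site, so sum the
# log-likelihood columns that belong to each site.
log_lik_site = np.stack(
    [log_lik[:, site == s].sum(axis=1) for s in range(N_site)], axis=1
)
print(log_lik_site.shape)  # (S, N_site): pass this to loo instead of log_lik
```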

Your calculations make sense. The concern in your last paragraph, that you may need to involve both levels, is handled by the fact that MCMC propagates the multilevel uncertainty down to the lower-level parameters “for free”. Intuitively, you get the variation in alphaS[site] from sampling, and that plugs into the linear predictor for y, so you are already implicitly marginalizing out alphaS.
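To make the two candidate log_lik definitions concrete, here is a numpy sketch that computes both from posterior draws. All draws and data below are simulated placeholders; in practice they would come from the fitted Stan model:

```python
import numpy as np

def normal_logpdf(y, mu, sd):
    """Elementwise log density of Normal(mu, sd) evaluated at y."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((y - mu) / sd) ** 2

rng = np.random.default_rng(0)

# Hypothetical posterior draws for a toy data set:
# S draws, N observations, N_site sites.
S, N, N_site = 1000, 12, 3
site = np.repeat(np.arange(N_site), N // N_site)
x1, x2, y = rng.normal(size=(3, N))

alpha1, beta1, beta2 = rng.normal(size=(3, S))
sigma = np.abs(rng.normal(1.0, 0.1, size=S))
sigma_alphaS = np.abs(rng.normal(0.5, 0.1, size=S))
alphaS = rng.normal(size=(S, N_site))  # sampled site intercepts

# Conditional log_lik: condition on the sampled alphaS, as in the
# generated quantities block above.
mu_cond = alphaS[:, site] + beta2[:, None] * x2
log_lik_cond = normal_logpdf(y, mu_cond, sigma[:, None])

# Marginal log_lik: alphaS integrated out analytically, as in the
# alternative formula from the question.
mu_marg = alpha1[:, None] + beta1[:, None] * x1 + beta2[:, None] * x2
sd_marg = np.sqrt(sigma[:, None] ** 2 + sigma_alphaS[:, None] ** 2)
log_lik_marg = normal_logpdf(y, mu_marg, sd_marg)

print(log_lik_cond.shape, log_lik_marg.shape)  # both (S, N)
```

The conditional version treats each observation as the unit left out; the marginal version is what you would use if you want predictions for entirely new sites.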
