Calculating the pointwise log-likelihood in a two-staged multilevel model

In order to use loo, I want to calculate the pointwise log-likelihood in a two-staged multilevel model; however, I am not sure about the correct way to do it.

Let’s take the example from Overcoming the Simpson's paradox in a multivariate linear model with within and between site gradients and repeated measurements. The data contains within- and between-site gradients and it was suggested to deploy the following model:

$$
\begin{aligned}
y_{i} &\sim \text{Normal}(\mu_{i}, \sigma)\\
\mu_{i} &= \alpha_{\text{site}[i]} + \beta_2 x_{2i} && \text{stage 1, within-site gradient}\\
\alpha_{\text{site}[i]} &\sim \text{Normal}(\nu_{i}, \sigma_{\alpha})\\
\nu_i &= \alpha_1 + \beta_1 x_{1i} && \text{stage 2, between-site gradient}\\
\alpha_{1}, \beta_2, \beta_1 &\sim \text{Normal}(0, 1)\\
\sigma, \sigma_{\alpha} &\sim \text{Exponential}(1)
\end{aligned}
$$

which was coded this way in Stan:

data {
  int<lower=0> N;
  int<lower=0> N_site;
  array[N] int<lower=1> site;
  vector[N] y;
  vector[N] x1;
  vector[N] x2;
}

parameters {
  real alpha1;
  real beta1;
  real<lower=0> sigma_alphaS;
  real beta2;
  vector[N_site] alphaS;
  real<lower=0> sigma;
}

model {
  alpha1 ~ std_normal();
  beta1 ~ std_normal();
  sigma_alphaS ~ exponential(1);
  beta2 ~ std_normal();
  alphaS[site] ~ normal(alpha1 + beta1 * x1, sigma_alphaS); // stage 2, between-site gradient
  sigma ~ exponential(1);
  y ~ normal(alphaS[site] + beta2 * x2, sigma); // stage 1, within-site gradient
}

generated quantities {
  // posterior predictive distribution for replications y_rep of the original data set y given model parameters
  array[N] real y_rep = normal_rng(alphaS[site] + beta2 * x2, sigma);
  // pointwise log-likelihood
  vector[N] log_lik;
  for (i in 1:N) {
    log_lik[i] = normal_lpdf(y[i] | alphaS[site[i]] + beta2 * x2[i], sigma);
  }
}

Are these calculations of y_rep and log_lik correct, or should they involve both “stages” (within- and between-site) of the model? Reading this post, I thought log_lik might instead be:

for (i in 1:N) {
  log_lik[i] = normal_lpdf(y[i] | alpha1 + beta1 * x1[i] + beta2 * x2[i], sqrt(sigma^2 + sigma_alphaS^2));
}

but I’d like to double-check!
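For reference, the second form comes from analytically integrating the site intercept out of the model (standard normal–normal marginalization; $\nu_i = \alpha_1 + \beta_1 x_{1i}$ as above):

$$
p(y_i \mid \alpha_1, \beta_1, \beta_2, \sigma, \sigma_\alpha)
= \int \text{Normal}(y_i \mid \alpha + \beta_2 x_{2i}, \sigma)\,
\text{Normal}(\alpha \mid \nu_i, \sigma_\alpha)\, d\alpha
= \text{Normal}\!\left(y_i \;\middle|\; \nu_i + \beta_2 x_{2i},\ \sqrt{\sigma^2 + \sigma_\alpha^2}\right)
$$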

Probably because there are multiple valid choices when you have a hierarchical model! In general, you can group your observations and still apply LOO: you are simply leaving out a group instead of a single observation. I would recommend @andrewgelman’s paper with Xiao-Li Meng and Hal Stern, which goes over this for posterior predictive checks (which you are also doing): “Posterior Predictive Assessment of Model Fitness via Realized Discrepancies”.
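On the grouping point, here is a minimal numpy sketch (simulated numbers, hypothetical names) of turning an observation-level log_lik matrix into a site-level one for leave-one-site-out:

```python
import numpy as np

# Hypothetical S-by-N pointwise log-likelihood matrix (one column per
# observation) plus the site index of each observation; in practice
# log_lik would be extracted from the fitted Stan model.
S, N, N_site = 1000, 12, 3
rng = np.random.default_rng(1)
log_lik = rng.normal(-1.0, 0.3, size=(S, N))
site = np.repeat(np.arange(N_site), N // N_site)

# Leave-one-group-out: the unit left out is a site, so sum the
# log-likelihood columns that belong to each site.
log_lik_site = np.stack(
    [log_lik[:, site == s].sum(axis=1) for s in range(N_site)], axis=1
)
print(log_lik_site.shape)  # (S, N_site): pass this to loo instead of log_lik
```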

Your calculations make sense. The concern in your last paragraph, that you may need to involve both levels, is handled by the fact that MCMC propagates the multilevel uncertainty down to the lower-level parameters “for free”. Intuitively, you get the variation in alphaS[site] from sampling, and that plugs into the linear predictor for y, so you are already implicitly marginalizing out alphaS.
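To make the two candidate log_lik definitions concrete, here is a numpy sketch that computes both from posterior draws. All draws and data below are simulated placeholders; in practice they would come from the fitted Stan model:

```python
import numpy as np

def normal_logpdf(y, mu, sd):
    """Elementwise log density of Normal(mu, sd) evaluated at y."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((y - mu) / sd) ** 2

rng = np.random.default_rng(0)

# Hypothetical posterior draws for a toy data set:
# S draws, N observations, N_site sites.
S, N, N_site = 1000, 12, 3
site = np.repeat(np.arange(N_site), N // N_site)
x1, x2, y = rng.normal(size=(3, N))

alpha1, beta1, beta2 = rng.normal(size=(3, S))
sigma = np.abs(rng.normal(1.0, 0.1, size=S))
sigma_alphaS = np.abs(rng.normal(0.5, 0.1, size=S))
alphaS = rng.normal(size=(S, N_site))  # sampled site intercepts

# Conditional log_lik: condition on the sampled alphaS, as in the
# generated quantities block above.
mu_cond = alphaS[:, site] + beta2[:, None] * x2
log_lik_cond = normal_logpdf(y, mu_cond, sigma[:, None])

# Marginal log_lik: alphaS integrated out analytically, as in the
# alternative formula from the question.
mu_marg = alpha1[:, None] + beta1[:, None] * x1 + beta2[:, None] * x2
sd_marg = np.sqrt(sigma[:, None] ** 2 + sigma_alphaS[:, None] ** 2)
log_lik_marg = normal_logpdf(y, mu_marg, sd_marg)

print(log_lik_cond.shape, log_lik_marg.shape)  # both (S, N)
```

The conditional version treats each observation as the unit left out; the marginal version is what you would use if you want predictions for entirely new sites.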
