In order to use loo
, I want to calculate the pointwise log-likelihood in a two-staged multilevel model; however, I am not sure about the correct way to do it.
Let’s take the example from Overcoming the Simpson's paradox in a multivariate linear model with within and between site gradients and repeated measurements. The data contains within- and between-site gradients and it was suggested to deploy the following model:
which was coded this way in Stan:
data {
int<lower=0> N;
int<lower=0> N_site;
array[N] int<lower=1> site;
vector[N] y;
vector[N] x1;
vector[N] x2;
}
parameters {
real alpha1;
real beta1;
real<lower=0> sigma_alphaS;
real beta2;
vector[N_site] alphaS;
real<lower=0> sigma;
}
model {
alpha1 ~ std_normal();
beta1 ~ std_normal();
sigma_alphaS ~ exponential(1);
beta2 ~ std_normal();
alphaS[site] ~ normal(alpha1 + beta1 * x1, sigma_alphaS); // stage 2, between-site gradient
sigma ~ exponential(1);
y ~ normal(alphaS[site] + beta2 * x2, sigma); // stage 1, within-site gradient
}
generated quantities {
// posterior predictive distribution for replications y_rep of the original data set y given model parameters
array[N] real y_rep = normal_rng(alphaS[site] + beta2 * x2, sigma);
// pointwise log-likelihood
vector[N] log_lik;
for (i in 1:N) {
log_lik[i] = normal_lpdf(y[i] | alphaS[site[i]] + beta2 * x2[i], sigma);
}
}
Are these calculations of y_rep
and log_lik
correct or shall they involve both “stages” (within- and between-site) of the model? Reading this post, I thought the log_lik
my be instead:
for (i in 1:N) {
log_lik[i] = normal_lpdf(y[i] | alpha1 + beta1 * x1[i] + beta2 * x2[i], sqrt(sigma^2 + sigma_alphaS^2));
}
but I’d like to double-check!