# Bayesian multi-level (partial pooling) model -- multiple likelihood statements

From a tutorial a came across, the author has two likelihood statements in the model block; I’m curious if this is (1) incorrect, (2) works but not the cleanest/best way to specify a model or (3) the right way to do things in Stan.

The Stan code:

data {
int n; // number of observations
int n_county; // number of counties
vector[n] vfloor;
int<lower = 0, upper = n_county> county[n];
}

parameters {
vector[n_county] alpha; // vector of county intercepts
real beta; // slope parameter
real<lower = 0> sigma_a; // variance of counties
real<lower = 0> sigma_y; // model residual variance
real mu_a; // mean of counties
}

model {
// conditional mean
vector[n] mu;

// linear combination
mu = alpha[county] + beta * vfloor;

// priors
beta ~ normal(0, 1);

// hyper-priors
mu_a ~ normal(0, 1);
sigma_a ~ cauchy(0, 2.5);
sigma_y ~ cauchy(0, 2.5);

// level-2 likelihood
alpha ~ normal(mu_a, sigma_a);

// level-1 likelihood
log_radon ~ normal(mu, sigma_y);
}

generated quantities {
vector[n] log_lik; // calculate log-likelihood
vector[n] y_rep; // replications from posterior predictive distribution

for (i in 1:n) {
// generate mpg predicted value
real log_radon_hat = alpha[county[i]] + beta * vfloor[i];

// calculate log-likelihood
// normal_lpdf is the log of the normal probability density function

// generate replication values
y_rep[i] = normal_rng(log_radon_hat, sigma_y);
// normal_rng generates random numbers from a normal distribution
}
}


I’m also unclear on why the author is defining log likelihood statements in the generating quantities portion. I had believed this section is primarily deterministic operations on the inferred variables and isn’t, itself, “part of the model”. Is this correct?

Hello @Jacob_Moore. I saw the tutorial you have as a basis.
I will try to explain your doubts from my perspective. ok.
If I’m not mistaken, the simplified model that is in the Stan code provided is:

y \sim N(\mu_y \; , \; \sigma_{y})
\mu_y = \alpha_i + \beta \cdot x
\alpha_i \sim N(\mu_{\alpha} \; , \; \sigma_{\alpha})

In my view, two likelihoods are not declared in the model block. Only one model is declared, which is the one shown above. Where y is the dependent variable with normal distribution, \mu is the linear model, x the predictor, \sigma_y the variance of y; \alpha_i and \beta the model parameters; i is the index for the countries in the model. A particularity is the \alpha_i parameter. It belongs to another normal probability distribution in which it has a mean \mu_{\alpha} and variance \sigma_{\alpha}. This implies that there is some between-countries variability in the individual-level intercept. This simply means that each country will provide a different intercept from a normal probability distribution.

From a tutorial a came across, the author has two likelihood statements in the model block; I’m curious if this is (1) incorrect, (2) works but not the cleanest/best way to specify a model or (3) the right way to do things in Stan.

Answering your question: (1) I believe it is not incorrect. (2) Because it is a simple model, I believe it is a reasonable way to write the stan model. (3) I believe the Stan team can answer this question if there is a better way to write this model in stan.

I’m also unclear on why the author is defining log likelihood statements in the generating quantities portion. I had believed this section is primarily deterministic operations on the inferred variables and isn’t, itself, “part of the model”. Is this correct?

Defining the log-likelihood in the generating quantities block is a common practice in the Stan language. It is very useful in many ways, mainly for model diagnostics with the loo package. You can read here that:

“The generated quantities program block is rather different than the other blocks. Nothing in the generated quantities block affects the sampled parameter values. The block is executed only after a sample has been generated.”

I really like Bruno Nicenboim, Daniel Schad, and Shravan Vasishth’s book (An Introduction to Bayesian Data Analysis for Cognitive Science). The example in chapter 5.2 will help you a lot. I highly recommend reading this chapter in your studies.

Hope this helps.

2 Likes

this is pretty standard.

the generated quantities block is being used to report the log likelihood, based on the estimated parameters for that draw.

and y_rep is giving you the posterior predictive distribution: 27.5 Posterior prediction for regressions | Stan User’s Guide

here’s another version of this tutorial - doing pretty much the same thing - Multilevel Modeling