Interpretting the generated quantities of `binomial_lpmf()`

KleinGordon · February 26, 2022, 2:50pm

Hi all.

I was playing around with a binomial model with a beta prior and am interested in calculating the evidence of an observation by marginalising out the probability parameter. I was trying this out on a more complicated model but reduced it to the very simple case below:

data {
  int<lower=0> N;
  int<lower=0> y;
  real<lower=0> alpha;
  real<lower=0> beta;
}

parameters {
  real<lower=0, upper=1> p;
}

model {
  p ~ beta(alpha, beta);
}

generated quantities{
  real<upper=0> log_lik;{
    log_lik = binomial_lpmf(y | N, p);
  }
}

which can be run using

library(cmdstanr)

likelihood_model <- cmdstan_model(stan_file='stan_model.stan')
data = list(
  N=100,
  y=40,
  alpha=10,
  beta=10
)
model_fit = likelihood_model$sample(
  data=data,
  chains=4,
  parallel_chains = 4,
  iter_warmup=500,
  iter_sampling=1000
)
model_fit

My question is about the final bracket:

generated quantities{
  real<upper=0> log_lik;{
    log_lik = binomial_lpmf(y | N, p);
  }
}

What exactly am I calculating here if I define a prior over the parameter p? My initial interpretation would be that it would marginalise over my p prior and estimate the evidence, i.e. it is calculating:

\log\left(P(Y | N, \alpha, \beta)\right) = \log\left[\int_{0}^{1} P(Y|N, p)P(p|\alpha, \beta) dp \right]

where P(Y|N,p) \sim \text{Bin}(Y|N, p) and P(p|\alpha, \beta) ~ \text{Beta}(\alpha, \beta).

Given that Y, N, \alpha and \beta are all constants, I would have assumed that log_lik would have returned a fixed value, but looking at the model output I have not generated a single-point value:

> model_fit
 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
  lp__    -14.40 -14.11 0.76 0.33 -15.90 -13.87 1.00     1894     2009
  p         0.49   0.49 0.11 0.12   0.31   0.68 1.00     1628     1676
  log_lik  -6.97  -4.74 5.90 3.06 -18.68  -2.53 1.00     1704     1676

Can anyone enlighten me as to what log_lik is the log’d distribution of?

mike-lawrence · February 28, 2022, 1:31pm

Hm, I’m not mathematically astute enough to be commenting with confidence on marginalization, but as you have written your simplified model, p is a parameter with 0-to-1 bounds (so, a probability) on which you express a beta prior. Your model block does not then include p on the right hand side of any ~ expressions with data on the left hand side, so after sampling the values of p will reflect samples from the prior. In that case, your computation of log_lik in the generated quantities block will reflect the log of the probability mass of the data given a sample of p from it’s prior and the post-sampling distribution of log_lik will reflect the distribution of probability mass of the data given the prior distribution on p, which is the likelihood part of the prior \times likelihood part of the Bayes’ formula.

KleinGordon · February 28, 2022, 3:25pm

Thanks very much for the comment Mike, I think this makes sense now. I suppose the relation between the log_lik outcome and the marginalised evidence is that the expectation of the exp(log_lik) samples is the Monte-Carlo estimate of the expectation of the likelihood with respect to the distribution of the P(p) (which simply happens to be is the marginal evidence itself!). Cheers.

Topic		Replies	Views
Replicating bernoulli_logit_lpmf in R Modeling	2	402	August 11, 2021
Mixture models: how to get P(data \| cluster) Modeling	13	2418	May 29, 2017
Generated quantities from custom probability function Modeling	3	386	September 12, 2023
Calculating log likelihood for hierarchical negative binomial model Modeling	6	1773	December 11, 2018
Simple code for prediction using generated quantities Modeling	4	550	September 8, 2018

Interpretting the generated quantities of `binomial_lpmf()`

Related topics