Hi,
I am trying to implement the loo package with a JAGS Binomial mixture model but, do not understand how to compute the likelihoods or translate STAN statements for JAGS.
Could someone translate these into what I need for JAGS or describe the quantities I need to derive?
Thanks
It is the same concept for JAGS or any other MCMC sampler. You need each “observation’s” contribution to the log-likelihood evaluated over all the draws of the parameters. So if your BUGS model had some likelihood such as
y[i] ~ dnorm(mu,tau)
in R, you would need to loop over a statement like this
log_Lik[ , i] <- dnorm(y[i], mu, 1 / tau^2, log = TRUE)
where mu
is a vector that contains all the MCMC draws of the conditional mean and tau
is a vector of the same size that contains all the MCMC draws of the conditional precision.
In the end, log_Lik
should be a matrix with rows equal to the number of MCMC draws and columns equal to the number of “observations” and cells containing the log-likelihood contribution for that iteration by that “observation”.
It only gets a bit tricky when the “observations” are not conditionally independent or there is some multilevel structure or something, so what you are imagining leaving out is not an isolated individual.
Thanks. That does make sense. But, how do you deal with the situation where y[I]~dbern(psi) produces 0’s where no log is possible?
This is where the Stan and BUGS languages differ a bit. In BUGS, you read
y[i] ~ dbern(psi)
as something like “in a generative model, y[i] is drawn from a Bernoulli distribution with probability psi” and BUGS tries to work out the full conditional distributions implied by the DAG for that generative process (and falls back to some non-Gibbs sampling stuff if it doesn’t not recognize a full-conditional distribution).
But in Stan
vector[N] psi = inv_logit(eta);
y ~ bernoulli(psi);
or
y ~ bernoulli_logit(eta);
reads as “add the kernel of the log-likelihood of psi onto the accumulated target
” which is why I prefer
target += bernoulli_logit(y | eta);
because it is more explicit about what Stan is doing when it goes to draw from this posterior distribution.
So, to your question
the log-likelihood is a real number, not a 0 / 1. This is actually what R does as well, when you call
dbinom(y[i], size = 1, prob = psi[i], log = TRUE)
You can have a problem where the likelihood underflows to zero, which is why it is better to keep everything in log-units by using bernoulli_logit
instead of bernoulli
. Then it should basically never be the case that the log-likelihood overflows to negative infinity.
1 Like