How to generate posterior predictive distribution for hierarchical occupancy model in Stan?

Hi Vlad,

This is complicated stuff! Here are some resources that might help:
Marginalized occupancy models: a how-to for JAGS and Stan. (see especially the very last paragraph in the The marginalized JAGS/BUGS model section).
Occupancy model likelihoods: advanced topics. (see especially the whole thing!)

One major confusion here is that you need to decide two things:

  1. What are you trying to posterior-predict? The latent occupancy state? Or the observed detection/nondetection data?
  2. Do you want your predictions to condition only on the intercepts and covariates (if you have any)–that is, condition only on the linear predictors–or do you want your predictions to condition twice on the observations? I say “condition twice” because the linear predictors are estimated conditional on the data, but then once we know the linear predictors we can still condition again on the data. This is easiest to see in the event that there is at least one detection at a site. Then regardless of the linear predictor for occupancy {logit}(\psi), we will know with certainty that the true occupancy state Z is one. So do we want to condition our posterior predictions on Z = 1 or on the value of {logit}(\psi)? It turns out that for posterior predictive checking, and likelihood-based model comparison including LOO, it is very important to condition on \psi and NOT on Z = 1. To see why intuitively, notice that at every site with a detection, we know with certainty that Z = 1. If we condition on Z = 1 at all of these sites, then we will universally get the best predictive performance by estimating the lowest possible intercept for logit(\psi). Because at every site without a detection, low values of logit(\psi) maximize the likelihood. And at every site with a detection, logit(\psi) doesn’t matter if we condition on Z = 1. On the other hand, if we want to get the best possible posterior predictions for the latent occupancy state Z, we absolutely should condition in this way. This is all dealt with a bit more neatly at the two resources linked above.

Edit: If we can pin down what exactly you want to predict, I’m happy to help you with the Stan code!

3 Likes