Brief background
Occupancy models simultaneously estimate the probabilities that a species occupies each of a collection of sites, based on detection/non-detection data from biological surveys. Conceptually, the model is formulated as a pair of logistic regressions. Across sites, the site occupancy Z (a latent binary state) is regressed on covariates whose linear combination gives the logit of the predicted occupancy probability \psi. Conditional on Z = 1, the observed detection/non-detection data are also regressed on covariates whose linear combination gives the logit of the predicted detection probability \theta. \psi and \theta are identifiable because the sites are visited repeatedly during a period over which Z is assumed not to change. The outcomes of sampling on the visits are assumed to be conditionally independent given Z. We implement the model by marginalizing over the latent state Z. See here for more details about the model and the marginalization.
My question
I am doing some posterior predictive checking on an occupancy model, and it seems to me that the marginalized and non-marginalized specifications actually lead to substantially different posterior predictive distributions (see below)! I'm personally surprised to realize this, if only because I had kinda conceptualized marginalization as nothing but a tool to make models easier to sample, not as a decision about my posterior predictive distribution. I guess my question is whether anybody has some interesting or useful comments about when it might be preferable to use one distribution or the other. I imagine that the first distribution (the one that arises from the marginalized model) should be more sensitive for diagnosing model misspecification, since the posterior predictive distribution is less tightly tied to the observed data.
Posterior predictive distribution in the marginalized model
Consider the posterior predictive distribution for obtaining at least one detection at site i. This is given by Bernoulli(\psi_i*(1-\prod_{j=1}^{n}(1-\theta_{ij}))), where \psi is the modeled occupancy probability, \theta is the modeled conditional detection probability, and j indexes the visit.
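To make the formula above concrete, here is a minimal NumPy sketch. The function name and the toy values are made up for illustration, and psi and theta stand in for a single posterior draw; in an actual check this would be repeated over all draws.

```python
import numpy as np

def p_detect_marginal(psi, theta):
    """P(at least one detection) per site under the marginalized model.

    psi:   occupancy probabilities, shape (n_sites,)
    theta: conditional detection probabilities, shape (n_sites, n_visits)
    """
    psi = np.asarray(psi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Probability that every visit misses the species, given occupancy.
    p_all_missed = np.prod(1.0 - theta, axis=-1)
    return psi * (1.0 - p_all_missed)

# Toy values (hypothetical): two sites, two visits each.
psi = np.array([0.5, 0.8])
theta = np.array([[0.5, 0.5],
                  [0.2, 0.5]])
p = p_detect_marginal(psi, theta)

# One replicated "any detection" indicator per site.
rng = np.random.default_rng(0)
y_rep = rng.binomial(1, p)
```

Note that nothing in this calculation looks at the observed detection histories; the replicate depends on the data only through the posterior for psi and theta.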
Posterior predictive distribution in the non-marginalized model
In the non-marginalized model, the posterior predictive distribution is different for points with at least one detection in the data versus points with no detections. With at least one detection at point i, Z_i is equal to one with posterior probability 1, and the posterior predictive distribution of interest is just Bernoulli(1-\prod_{j=1}^{n}(1-\theta_{ij})) (note the lack of dependence on \psi).
For points with no detections, Z_i is equal to one with probability \frac{\psi_i*\prod_{j=1}^{n}(1-\theta_{ij})}{(1-\psi_i) + \psi_i*\prod_{j=1}^{n}(1-\theta_{ij})}. So the posterior predictive distribution for obtaining at least one detection at a point is then Bernoulli(\frac{\psi_i*\prod_{j=1}^{n}(1-\theta_{ij})}{(1-\psi_i) + \psi_i*\prod_{j=1}^{n}(1-\theta_{ij})}*(1-\prod_{j=1}^{n}(1-\theta_{ij}))). Hopefully I got that right, anyway.
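Here is a small NumPy sketch of the two cases for the non-marginalized model, mostly as a sanity check on the algebra. The function name, the `detected` argument, and the toy values are assumptions for illustration, and psi and theta again stand in for a single posterior draw. It assumes the denominator (1-\psi_i) + \psi_i*\prod(1-\theta_{ij}) is strictly positive.

```python
import numpy as np

def p_detect_conditional(psi, theta, detected):
    """P(at least one detection in a replicate) per site, conditioning on Z.

    psi:      occupancy probabilities, shape (n_sites,)
    theta:    conditional detection probabilities, shape (n_sites, n_visits)
    detected: True where the observed data contain at least one detection
    """
    psi = np.asarray(psi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    # Probability that every visit misses the species, given occupancy.
    q = np.prod(1.0 - theta, axis=-1)
    # P(Z = 1 | data): 1 with a detection, Bayes' rule otherwise.
    # Assumes (1 - psi) + psi * q > 0 (hypothetical toy values satisfy this).
    p_z = np.where(detected, 1.0, psi * q / ((1.0 - psi) + psi * q))
    return p_z * (1.0 - q)

# Same psi and theta at both sites; only the observed history differs.
psi = np.array([0.5, 0.5])
theta = np.array([[0.5, 0.5],
                  [0.5, 0.5]])
detected = np.array([True, False])

p_cond = p_detect_conditional(psi, theta, detected)

# Marginalized-model probability for the same psi and theta, for comparison.
p_marg = psi * (1.0 - np.prod(1.0 - theta, axis=-1))
```

With these toy values the conditional probabilities are 0.75 for the site with a detection and 0.15 for the site without, against 0.375 at both sites under the marginalized formula, which shows how much more tightly the non-marginalized replicate is tied to the observed data.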