Marginalizing occupancy models for Stan implementation

Hi all–my first post here. I’ve written up a document for non-experts like myself to better understand how marginalization works in the context of the site-occupancy model for biological species surveys. Ecologists are more familiar with a parameterization that explicitly models a latent binary occupancy state; this state needs to be marginalized out in order to fit the model in Stan.

The Stan Case Studies include a Stan translation of a more complicated occupancy model. This case study relies on a lack of visit-specific covariates to do the marginalization, and I think many researchers in my community would find it difficult to make the jump from the case study to other classes of occupancy model.

Feel free to contact me with corrections or comments. Likewise, if there is anywhere I should post this to make it more accessible to potential Stan users, I’m happy to do so.

Jacob Socolar


As an ecologist who has mostly avoided Stan because of a lack of time to learn to marginalize and efficient Stan scripting. Thanks. One suggestion would be to report effective samples per minute, including for the Stan model. This will be even more impressive than just the general speedup. It will also be neat to use the new ESS calculations and for the 95% credible intervals of the slowest converging parameter. This is likely where marginalization and especially Stan will shine.

I also find the latent binary parameter formulation natural statistically. We’re not trying to hide that or get rid of it. We just want to compute it robustly and efficiently.

The only robust way to compute posteriors that are at all complicated is to marginalize out any latent binary parameters and then do inference with them in expectation. I tried to explain how to do this and why it’s so much more efficient for a very simple model in the very first example in the latent discrete parameters chapter of the user’s guide. That’s true even if you’re working in BUGS or JAGS—they’re terrible at sampling discrete parameters because nobody knows how to sample discrete parameters effectively (you can tell by looking at poor mixing in simulated data; the problem is lack of gradient information to guide sampling). There’s a neat paper explaining this that also considers the marginalized JAGS/BUGS models:

Luckily, before MCMC, ecologists used EM to fit maximum marginal likelihood estimates. EM requires exactly the same marginalizations of discrete parameters. So all you need to do is go back to the 1980s literature and you can find all the marginalizations. Alas, this doesn’t make them any easier to understand.

If you want to see this in practice, Hiroki Îto has translated the examples from the Kery and Schaub Bayesian Population Analysis book.