Good day,
I’m excited to announce the release of my first R package, occARU, which fits (multispecies) occupancy models for automated recording unit (ARU) data, such as camera traps and acoustic monitors, in Stan. After appropriately thinning species detections and aggregating to surveys of arbitrary lengths, occARU fits occupancy models with count observation models to go beyond binary detection probabilities. The package is somewhat opinionated, with most of the focus on the detection rates instead of occupancy.
A full description of the model can be found in the model vignette, but some of the main features are highlighted below:
- Hierarchical multispecies Gaussian processes (GPs) using matrix normals for site (spatial) and survey (temporal) random effects. If length scales are shared across species, only a single Cholesky decomposition is required for each GP.
- Orthogonal projection of the random effects ensures recovery of fixed effects.
- Fixed effects can include continuous, categorical (with zero-sum vectors), and ordinal (with simplex decomposition) predictors. The detection submodel accommodates both site and site-by-survey varying predictors.
- Interspecific correlation matrices to explore species interactions in different model components (responses to predictors, and spatial and temporal effects).
- Global-local shrinkage priors are used for the occupancy and detection submodels. Inspired by R2D, priors are set on the explained variance which are simplex decomposed using either Dirichlet or zero-sum logistic-normal distributions.
- Monte Carlo integration (thanks @avehtari) is used to produce a second
log_lik object to be used for PSIS-LOO-CV. Log likelihoods are stored for each site and species combination, and random effects at this “observation-level” is known to be problematic for loo. The Monte Carlo approach isn’t perfect, but it does reduce a lot of the high Pareto k observations.
I’d like to give special thanks to the Stan developers, with the underlying Stan programs showcasing some recent additions to the Stan langauge:
sum_to_zero_vector and sum_to_zero_matrix for all random effects for identifiability;
- Ragged arrays of zero-sum vectors and simplexes using the
sum_to_zero_jacobian and simplex_jacobian functions;
- Pathfinder to set initial values;
- tuples to return multiple outputs in log likelihood functions;
- Within-chain parallelisation, chunking over sites to increase speed;
bayesplot::ppc_rootogram_grouped() will be useful for posterior predictive checking across species.
I hope some of you will find this useful.
Cheers,
Matt
Thanks for sharing and calling out specific tools. I’m pinging @spinkney and @aseyboldt as they did the sum-to-zero work. I’m also glad to see Pathfinder’s useful here—I’d say our success with it has been mixed overall.
I have a soft spot for this model class as I’m the one who first coded occupancy models in Stan way back in 2014 after hanging out with folks in Melbourne’s stats ecology community during a month long visit. The Cormack-Jolly-Seber models were also an early success coded with @syclick and Fränzi Koerner-Nievergelt of the Swiss Ornithological Institute. The ecology modeling community is so much fun that I went to ISEC in Seattle many years ago without a paper just to soak up the community. Movement HMMs are also super cool and can be fit in Stan.
Did you see my recent post about capture-recapture models? I went deep on these models and provided a bunch of Stan implementations. Actually, that reminds me—the BPA translation on the Stan Github doesn’t recover latent states correctly for Jolly-Seber models. It does posterior prediction of latent states, instead of doing forward-backward sampling.
Thanks for the pointer—no, I hadn’t seen that.
I don’t know what “BPA” is, but if you send me a link or open an issue, I can take a look. Or even better, just submit a patch yourself if it’s straightforward.
I know the forward/backward algorithm in EM for calculating marginal distributions over states, but that’s not what we’d want to do in a Bayesian posterior. Instead, we want the joint posterior over states for any downstream inference, not independent marginals. Marginalizing the joint posterior should give you the same result as using the backward algorithm, but with more Monte Carlo noise.
Hey Bob, sorry, it’s Bayesian Population Analysis, but it looks like the repo has been updated to show that it’s the posterior predictive distribution, not the posterior distribution of the underlying latent states.
I know the forward/backward algorithm in EM for calculating marginal distributions over states, but that’s not what we’d want to do in a Bayesian posterior.
Indeed, I’ve called his “forward-backward sampling” following @jsocolar, but I’m not sure where that term originates. This file does it for all of the Jolly-Seber models covered in the manuscript, and the SBC results suggest it correctly recovers the latent states.
Edit: ignore the below. It appears the term “forward filtering backward sampling” was in widespread use and is not a GPT invention. However, the fact that I first heard this term from ChatGPT and shortened in code to forward_backward_sampling() for a less verbose function name is still correct.
THE BELOW IS NOT ACCURATE:
I don’t know whether the origin story is amusing or terrifying. A couple of years ago I was using an early version of ChatGPT to check my work in creating a function for recovering (the posterior distribution of) latent state trajectories from hidden markov models. The idea behind these functions is to go though the forward pass just like in the forward-backward algorithm for marginalizing over the latent state sequence, but then in the backward pass we sample the latent states as we go. ChatGPT decided to call this a “forward filtering backward sampling” algorithm, which it contrasted with a “forward filtering backward smoothing” algorithm that is exactly what is commonly known as the “forward-backward algorithm”.
This terminology made sense to me, but was too verbose to be the name of a function, so in code I shortened it to forward_backward_sampling(), and of course I kept the usual forward_backward() for the smoothing version. To this day I haven’t heard another “more official” name for the “forward filtering backward sampling” algorithm, though I presume that one exists. To the best of my knowledge “forward filtering backward sampling” was a GPT original.
In the mean time, it seems like “forward backward sampling” is gaining some traction among practitioners. I like this name because it clarifies that this is really little more than a variant of the forward-backward algorithm where we collapse the posterior uncertainty to a single sampled state sequence during the backward pass.
Nice! I like the terminology as well and the fact that we need to use _rng suffix for any functions using this further makes it clear. I think ideally I wish it was just called the “forward-backward” algorithm, though. Just like the forward algorithm takes in the full Bayesian uncertainty by default, analogously the only “correct” way to do “forward-backward” algorithm is to sample the states for each step. The classic smoothing approach is really the anomaly that should have a different name, IMO.