Multi-species occupancy - no detection history

If a species has multiple records at a site, spread across multiple rows, these ‘duplicates’ still contribute information to the parameter estimates for each environmental predictor, whether or not site identity is included in the model.

Regarding site identity, I think the most important consideration (before even weighing up fancy latent variable structures against, say, a simple species-by-site intercept) is what you plan to use the model for. If you’re only interested in the sites that you already have data for, then including additional parameters that capture site-specific variation could improve the precision of your model. However, the trade-off is generalisability - it’s harder to extend predictions beyond the dataset, because predicting for a new site requires marginalising out the site-specific variables, which is not necessarily easy, especially for non-Gaussian likelihoods. Thus, if your aim is to predict occupancy across, say, a landscape, you’re probably better off without site-specific terms, and the additional records will still inform your model.
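
To make the marginalisation point a bit more concrete, here is a minimal Python sketch. It assumes a logit link and a normally distributed site intercept, which may not match your model, and the values of `eta` and `sigma` are made up for illustration. With a non-Gaussian likelihood you can’t just set the site effect to zero when predicting for an unobserved site; you have to average the occupancy probability over the site-effect distribution, and the two answers differ.

```python
import math
from statistics import NormalDist


def expit(x: float) -> float:
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_occupancy(eta: float, sigma: float, n: int = 2000) -> float:
    """Approximate E[expit(eta + u)] for u ~ Normal(0, sigma^2).

    Uses n evenly spaced quantiles of the standard normal as a simple
    deterministic stand-in for Monte Carlo draws of the site effect.
    """
    std_normal = NormalDist()
    total = 0.0
    for k in range(n):
        u = sigma * std_normal.inv_cdf((k + 0.5) / n)
        total += expit(eta + u)
    return total / n


# Hypothetical values: linear predictor from the environmental covariates
# and the standard deviation of the site-level intercepts.
eta = 1.0
sigma = 1.5

plug_in = expit(eta)                       # site effect fixed at zero
marginal = marginal_occupancy(eta, sigma)  # site effect averaged out

print(f"plug-in:  {plug_in:.3f}")
print(f"marginal: {marginal:.3f}")
```

The marginal probability is pulled towards 0.5 relative to the plug-in value, and the gap grows with `sigma`. In a real multi-species model you would need to do this averaging over posterior draws as well, for every species and prediction location, which is where it gets cumbersome.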
