Hi @TBL,
For learning about how to code these models in Stan, I strongly recommend beginning with a version of the MSOM that does not involve data-augmentation with never-observed species, and thus does not involve the \Omega parameter. Without data-augmentation, the multispecies likelihood is identical to the single-species likelihood; the models differ only in their random effect structure. I wrote up a tutorial on how to code this model up here, and I think it will be clarifying for you:
https://jsocolar.github.io/marginalizedOccupancy/
Once you have this working, you might think about how to re-incorporate the data-augmented structure. However, a word of warning here: data augmentation often behaves weirdly in the presence of covariates on occupancy. One facet of the problem is this: if the random effect distributions for the species-specific slopes have even moderately large standard deviations, then the model can pack in arbitrary numbers of species in the meta-community by estimating covariate relationships such that the species are unlikely to occur on any point in your dataset. This is particularly true if you allow quadratic relationships between occupancy and covariates. (Edit: to expand on this point because it’s important, the data-augmented metacommunity model is trying to estimate the number of species that occur over some population of sites that is larger than just the sites that were actually sampled. Thus, unless you’re willing to treat the site covariates as random, the model lacks sufficient information about the characteristics of sites in the population to make an informed inference.)
Another facet of the problem is that it becomes impossible to incorporate species traits into the occupancy or detection sub-models (because we don’t have trait values for the never-observed species), but traits and trait-by-environment interactions are often highly predictive in MSOMs.
With that said, I did just push code to estimate exactly your model to the more-models branch of R package flocker. GitHub - jsocolar/flocker at more-models
This code is not well tested, but I think it should work to enable estimation of the model you seek using brms formula syntax (again, I don’t necessarily recommend fitting this model with covariates on occupancy; I wrote this code primarily to enable fitting the model with covariates on detection. If you do fit the model with covariates on occupancy, it is possible that the point-specific species richness estimates will be meaningful, but it is certain that the metacommunity-level richness estimate will not be meaningful). The new function flocker_stancode will output the generated Stan code so you can see what’s going on. The documentation of the functions flock and make_flocker_data should get you started on the right track. You may well encounter errors or bugs, and if so I’ll be happy to troubleshoot them and push fixes. This branch is experimental and is not ready for release.
Here’s some example code that should fit the data-augmented model to toy data with site- and visit-specific covariates. Note that the more-models branch requires the use of the cmdstanr backend; the default backend = "rstan" will error. (Edit: unless you install rstan from the experimental branch on GitHub)
library(flocker) # flocker must be installed from the more-models branch
# Generate example data
d <- example_flocker_data()
# Reformat example data for the data-augmented model input format
obs <- d$obs[d$unit_covs$species == "sp_1", ]
for(i in unique(d$unit_covs$species[d$unit_covs$species != "sp_1"])){
obs <- abind::abind(obs, d$obs[d$unit_covs$species == i, ], along = 3)
}
unit_covs <- d$unit_covs[d$unit_covs$species == "sp_1", c("uc1", "uc2")]
event_covs <- list()
for(i in seq_along(d$event_covs)){
event_covs[[i]] <- d$event_covs[[i]][d$unit_covs$species == "sp_1", ]
}
names(event_covs) <- names(d$event_covs)
# Run the flocker pipeline
fd <- make_flocker_data(
obs, unit_covs, event_covs,
type = "augmented", n_aug = 100)
mod <- flock(~ uc1, ~ uc1 + ec1,
fd, augmented = T,
backend = "cmdstanr"
)