Accounting for sampling effort bias in brms - use of 'weights' in brm

I’m using a mixed model in brms to assess changes in species richness associated with deforestation. My data are 3-min point counts at different locations. To estimate species richness, I aggregate all the point counts from the same location. Because locations are clustered within regions, I model species richness as a function of deforestation, with region included as a random effect. The problem is that sampling effort (i.e. the number of point counts) varies across locations, and species richness should increase with sampling effort, so I need a model that accounts for this bias. A solution I found was to fit the model as follows:

library(brms)
# Each location's likelihood contribution is weighted by its sampling effort
brms1 <- brm(
  Species.richness | weights(Sampling.effort) ~ Forest.cover + (1 | Region),
  data = df, family = gaussian, cores = 4,
  warmup = 1000, iter = 2000, thin = 2, chains = 2,
  control = list(adapt_delta = 0.999, max_treedepth = 12)
)

Reproducible example

Region <- c("Belo Horizonte", "São Paulo", "Rio de Janeiro", "Vitoria", "São Paulo", "Rio de Janeiro", "Belo Horizonte", "Rio de Janeiro", "Vitoria", "São Paulo")
Species.richness <- c(40, 32, 24, 34, 58, 18, 28, 10, 49, 22) # Number of species recorded in the location
Sampling.effort <- c(90, 71, 48, 82, 107, 42, 60, 20, 105, 50) # Number of point counts used to estimate species richness in the location
Forest.cover <- c(0.515, 0.491, 0.142, 0.374, 0.160, 0.142, 0.923, 0.693, 0.625, 0.068) # Average proportion of forest within 500 m of the point counts of the location
df <- data.frame(Region, Species.richness, Sampling.effort, Forest.cover)

Is the implementation of the model fine? Should I transform Sampling.effort in any way? Is there a better way to do it?

  • Operating System: macOS Monterey 12.4
  • brms Version: 2.17.0


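I don’t think that using weights will give you the inference that you want. In brms, weights only scale each observation’s contribution to the log-likelihood, so they change how much each location counts in the fit but do not adjust the expected richness for sampling effort. There are two techniques that potentially will: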

  1. Come up with a defensible parametric form for the accumulation of sampled species as a function of effort, and model species richness that way. If the parametric form can be parameterized in terms of its asymptote, then you can treat that asymptote as a distributional parameter in a brms model. This will require implementing a custom family in brms, or implementing the model directly in Stan. In either case, you will need to write some Stan code.
  2. Given that you are working with repeat visits, use the framework of the multi-species occupancy model to explicitly account for non-detections and estimate site-specific species richness that way. The R package flocker (jsocolar/flocker on GitHub) implements the necessary custom families so that these models can be fit as brms models.
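
To make the first option more concrete, here is a minimal base-R sketch of what "a parametric form parameterized in terms of its asymptote" could look like. The Michaelis-Menten curve, the function name `accumulation`, the parameter names, and the simulated values are all assumptions chosen for illustration; nothing here comes from the data above, and it is a plain least-squares fit rather than the custom brms family or Stan model described in option 1.

```r
# Hypothetical accumulation curve (assumed Michaelis-Menten form):
# expected observed richness rises with effort and levels off at `asymptote`;
# `half_effort` is the effort at which half of the asymptote is reached.
accumulation <- function(effort, asymptote, half_effort) {
  asymptote * effort / (half_effort + effort)
}

# Simulated data with made-up parameter values, for illustration only.
set.seed(1)
effort <- seq(10, 200, by = 10)
observed <- accumulation(effort, asymptote = 60, half_effort = 50) +
  rnorm(length(effort), sd = 1)

# Non-linear least squares estimates the asymptote (the effort-corrected
# richness) from counts collected at unequal effort.
fit <- nls(observed ~ accumulation(effort, asymptote, half_effort),
           start = list(asymptote = 50, half_effort = 40))
coef(fit)
```

In a full model, the asymptote would be the quantity regressed on Forest.cover and Region, while the observed richness at each location is linked to it through that location's Sampling.effort.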