Very interesting to learn that
brms::fitted(...) lets you obtain estimates for specific non-linear terms! I wasn’t aware of that feature, but I shouldn’t be surprised given the flexibility of the package.
It’s worth noting that the model fails only once the random intercepts are added, since that pinpoints them as the likely issue. Two possible reasons come to mind.
First, there’s a very general rule of thumb that you need at least 5 or 6 levels of a grouping factor to obtain a reasonable estimate of a random-effect variance. I seem to recall that you have just 3 sites, but I might be confusing that with the number of species. So, it’s possible that the random effect is simply too unstable and is introducing some unusual results in other parameters to compensate. If you inspect the model output, you may notice some odd standard errors/credible intervals for some of the other non-linear terms if that’s the case.
Second, there may be an issue with the prior on the standard deviation for
Tprev. On the logit scale, which is the scale the prior is specified on, you’re giving a 0.34 probability of the random intercept at each site falling somewhere between 0 and 1, which corresponds to site-specific prevalence contributions between 0.50 and 0.73 (i.e.,
inv_logit(0) to inv_logit(1) using your function). Considering that you think the overall prevalence is somewhere on the scale of 0.40 or so, you’re letting the model explore possible site-specific adjustments to true prevalence that can easily far exceed that scale. Since this prior is on a standard deviation term,
brms automatically truncates the distribution so that there is no prior probability of a negative standard deviation, since that cannot exist. You are thus specifying a half-normal distribution centered at 0 with a standard deviation of 1, describing how much, on average, the sites vary in their true prevalence. Say that the resulting standard deviation for the random intercepts is 0.50. This means that, on the logit scale, the site-specific intercepts differ from one another by an average of 0.50 units, and if we convert from the logit scale to the probability scale, then this is roughly equivalent to saying that the average difference in site-specific contributions to true prevalence is about 0.62 (i.e., inv_logit(0.5)), which is a large amount of variation relative to your expected overall prevalence of 0.40-ish.
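If it helps to see the arithmetic, here it is in base R (plogis() is base R’s inverse-logit, the same function as your inv_logit()):

```r
# Probability mass a normal(0, 1) prior puts on the logit-scale interval (0, 1):
pnorm(1) - pnorm(0)  # ~0.34

# What that interval corresponds to on the probability scale:
plogis(0)            # 0.50
plogis(1)            # ~0.73

# A random-intercept SD of 0.50 on the logit scale, naively pushed through
# the inverse-logit:
plogis(0.5)          # ~0.62
```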
I think that it would be really helpful to view the prior predictive checks. You can always pull out some paper and pencil and work through possible outcomes implied by different priors, but it’s also sometimes just important to develop an intuition for what “tighter” or “looser” priors mean for a model. Using the argument
brm(..., sample_prior = "only") will give you a model where the parameters are just samples from your priors. You can then use this model with
pp_check() just like you would any other model. Since these models are computationally cheap, I’d also encourage increasing the iterations to something more than 1000 just to get more samples to visualize. You can probably play around with a whole range of priors and various values for those distributions in a relatively short period of time, and seeing how the “estimates” from the model are altered by these different choices can help you get a better idea of why wide priors aren’t the same thing as non-informative priors.
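As a rough sketch of that workflow (here `bf_model`, `my_priors`, and `dat` are placeholders for your existing formula, prior set, and data):

```r
library(brms)

# Refit the same model, but draw parameters from the priors alone
# (no conditioning on the data).
prior_fit <- brm(
  formula = bf_model,         # placeholder: your non-linear formula
  data = dat,                 # placeholder: your data
  prior = my_priors,          # placeholder: your prior specification
  sample_prior = "only",      # parameters are samples from the priors
  iter = 4000                 # cheap to run, so take extra draws
)

# Visualize the kinds of data the priors alone would generate
pp_check(prior_fit, ndraws = 100)
```

Tightening or loosening individual priors and re-running this loop is usually fast enough to iterate through several candidate specifications in one sitting.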
Ultimately, it is important to also recognize that difficulties in getting a model to work can often be an indication that the model is just wrong. In this case, it seems like things are fine until that random effect is added, which may be an indication that the random intercepts constitute a misspecification of the model (or at least for the model + data in the case where there are too few levels to get good estimates of random effects). There are a few things you can do to confirm that the model issues arising from the random intercepts are a signal of misfit rather than user error.
Probably the most robust method would be simulation-based calibration: Simulation Based Calibration for rstan/cmdstanr models • SBC. Since the data are simulated according to a known generative process, this lets you check whether the model you’re specifying would work as you want it to if it were the true model of the data-generating process. At the same time, you could examine what happens when you fit the model to data from a generative process with no site-level random effects, and you’d (hopefully) observe the same kinds of estimation errors that you’re finding here.
Another thing to consider is whether the extra variance that the random effects are trying to account for is needed. This is commonly assessed with the intraclass correlation in multilevel models. I believe there is a vague rule of thumb that an ICC < 0.07-ish suggests that random effects are not useful, but I wouldn’t swear to that, nor would I be overly confident making a modeling decision purely on a rule of thumb. The
performance package has an easy-to-use function to calculate the ICC from a fitted model.
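If I remember the API correctly, it is a one-liner (with `fit_random` standing in for your fitted model containing the site intercepts):

```r
library(performance)

# ICC from a fitted model; for brmsfit objects, performance computes a
# variance-decomposition-based analogue of the classical ICC.
icc(fit_random)
```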
Related to the previous point, you could also see how your model does with respect to predictive accuracy without the random effects. You can use something like
fit <- add_criterion(fit, criterion = "bayes_R2") to get a sense of how well the model explains variance without the random effects. If this number is pretty large, then the addition of random effects may not be needed.
You can also do a formal model comparison with and without random effects to see whether the extra model complexity is worth it. This is pretty straightforward with brms:
fit_fixed <- add_criterion(fit_fixed, criterion = "waic") #can also use "loo" if needed
fit_random <- add_criterion(fit_random, criterion = "waic") #again, can also use "loo"
loo_compare(fit_fixed, fit_random, criterion = "waic") #or criterion = "loo"
There are lots of different kinds of things you can look at to help figure out whether a model fit should be trusted or not, of course. Feel free to take a look at the documentation for
tidybayes for a range of posterior predictive checks, but also don’t overlook the value of a good pairs() plot or inspection of the loo results.
I’m not familiar with this package, but looking through its documentation, I take it that the following reflects your call:
out <- truePrev(x = Count,
                n = total,
                SE = ~dbeta(10, 1),
                SP = ~dbeta(10, 1),
                prior = c(1, 1)) # corresponds to beta(1, 1) on true prevalence
This model doesn’t seem entirely consistent with what you’re using here, as it doesn’t take multiple species into account at once. Was the motivation for the switch to
brms to benefit from partial pooling by analyzing all the species in one model rather than separately?