Are you sure that the random effect prior is specified correctly? Since this is the piece of the model that causes the convergence issues when added, it makes sense to start by checking whether there is an issue with its specification.
Currently, you have a uniform prior over the interval [0, 1] on the standard deviation of the random effects. That is to say, you are saying that the average variability in the random intercepts between sites can be anything from 0 to 1, which seems very unlikely to me. In such situations, I find it helpful to think about what the prior implies. You can do this computationally within brms by setting brm(..., sample_prior = "only") so that the model does not evaluate the data but instead generates the parameter estimates from the priors alone. You can then visualize what kinds of predictions are consistent with your priors by using the pp_check() function just as you would for a posterior predictive check (only in this case you're getting a prior predictive check). Comparing your observed data to the kinds of data implied by your priors can give you an idea of whether your overall model specification is reasonable or whether certain priors are under- or over-informative in ways you were not anticipating. These models are relatively quick to run, so you can try a variety of priors to build an intuition for how changes to one part of the model affect it overall.
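To make that concrete, here is a minimal sketch of the prior predictive workflow. The formula, family, data name, and the normal(0, 1) prior on the coefficients are all placeholders for whatever your actual model uses; note that brms generally requires proper priors on every parameter when sampling from the prior only.

```r
library(brms)

# Minimal sketch: formula, family, and data are placeholders for your model.
# sample_prior = "only" tells brms to skip the likelihood and generate all
# parameter draws (and hence predictions) from the priors alone.
prior_fit <- brm(
  y ~ x + (1 | site),
  family = bernoulli(),
  data = dat,
  prior = c(
    set_prior("normal(0, 1)", class = "b"),                   # placeholder
    set_prior("uniform(0, 1)", class = "sd", group = "site")  # your current prior
  ),
  sample_prior = "only"
)

# Prior predictive check: the same function as a posterior predictive check,
# but the simulated datasets now come from the priors.
pp_check(prior_fit, ndraws = 100)
```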
Analytically, you can always take the extremes implied by your prior and figure out what your data would have to look like in order to produce those extremes. If the standard deviation of the random intercepts were 0, then the average prevalence would be the same across all sites. Such a finding would indicate that a random effects model provides no additional benefit, as all of the variance in the outcome can be explained by the fixed effects components of the model. In contrast, for the standard deviation of the site intercepts to be 1, the random intercepts would effectively all have to be either -1 or 1 (and even then, you can get standard deviation estimates that are actually larger than 1, as the quick check below shows). This outcome is impossible. To flip the previous example, it would effectively arise in the case where the fixed components give no information at all and result in a starting guess of 0 for the prevalence. The site location would then determine whether the prevalence is 1 or… -1, which is obviously not a valid estimate of prevalence.
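Here is that check in base R, using ten hypothetical intercepts split evenly between -1 and 1:

```r
# Ten hypothetical site intercepts, all forced to be either -1 or 1
intercepts <- rep(c(-1, 1), times = 5)

mean(intercepts)  # 0
sd(intercepts)    # ~1.05: the sample SD exceeds 1 even in this extreme case
```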
Thinking about the random intercept as a site-specific deviation from the prevalences predicted by the fixed effects, I'd say you need a prior reflecting the fact that you can be reasonably sure beforehand that values of 0 and 1 are not very likely. In fact, I think you could go further and specify a prior reflecting the fact that you would be surprised if the standard deviation of the site-specific intercepts were greater than around 0.50. Although I admit that I've never seen a beta prior on the variation of random effects (since it imposes a hard constraint that the standard deviation cannot be larger than 1), I think it could be sensible here. While flat, the beta(1, 1) distribution is not always uninformative, particularly when limited data are available (in this case, it depends on the number of sites you have). A weakly informative alternative is the beta(2, 2) distribution, which places the greatest plausibility around 0.50 while reflecting the greatest skepticism toward 0 and 1. You might try fitting the model with this weakly informative prior, or with a more informative one (e.g., beta(2, 11); note that I'd avoid a beta prior with an alpha shape parameter of 1 here, since that places its greatest density at 0, whereas alpha > 1 places effectively zero density at that extreme). I'd hold off on the more informative options until it's clear that the model requires them, which it may, depending on the amount of data that you have.
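In brms, swapping in the beta prior might look something like this. The group label site, formula, and family are placeholders; run get_prior() on your own model to see the exact classes and groups available:

```r
library(brms)

# Hypothetical sketch: a weakly informative beta(2, 2) prior on the
# between-site standard deviation in place of uniform(0, 1).
fit <- brm(
  y ~ x + (1 | site),
  family = bernoulli(),
  data = dat,
  prior = set_prior("beta(2, 2)", class = "sd", group = "site")
)

# Note: brms declares sd parameters with only a lower bound of 0. The beta
# density is zero above 1, so draws effectively stay in [0, 1], but depending
# on your brms version you may also be able to register ub = 1 in set_prior().

# The more informative option mentioned above would be:
# set_prior("beta(2, 11)", class = "sd", group = "site")
```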
Also, in case it helps, you can think about beta priors as counts of successes and failures. For your sensitivity and specificity priors, the beta(10, 1) distribution reflects the equivalent belief of having observed 11 total events in which 10 were successes and just 1 was a failure. By extension, beta(1, 1) would be the equivalent of having observed two events where one result was a success and the other a failure. This is the most skeptical position we can be in: we expect the probability of success to be 0.50 (which is indeed the mean of the beta(1, 1) distribution), but from just two events we have no way of determining whether what we observed was a fluke or a genuine reflection of a 0.50 probability of success, which is why the distribution treats every possible probability as equally likely even though we might expect something around 0.50. Adding one additional observation, regardless of the outcome, shifts this belief and makes the distribution reflect the change.
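This counts reading is literal, because the beta distribution is conjugate for binomial data: a beta(a, b) prior updated with s successes and f failures gives a beta(a + s, b + f) posterior. A tiny base-R illustration with made-up counts:

```r
# Conjugate updating: beta(a, b) prior plus s successes and f failures
# yields a beta(a + s, b + f) posterior.
a <- 1; b <- 1   # beta(1, 1): "one success, one failure" so far
s <- 9; f <- 0   # observe nine more successes and no failures

# The result is beta(10, 1), i.e., 10 successes and 1 failure in total,
# the same shape as the sensitivity/specificity priors above.
curve(dbeta(x, a + s, b + f), from = 0, to = 1,
      xlab = "probability of success", ylab = "density")
```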
One way to think of beta(2, 2), then, is as the smallest number of observations (n = 4) required for us to be skeptical of certainty in either direction (i.e., p = 0 or p = 1) while maintaining an expectation of an outcome determined entirely by chance. This skepticism toward extreme values, combined with the expectation of something closer to 50-50, is an intuitive way (at least for me) to think about why a flat prior like beta(1, 1) is not the same thing as an uninformative prior: saying that any probability from 0 to 1 is equally likely is actually quite informative compared to beta(2, 2), which says that probabilities very close to certainty are unlikely and that values close to chance are more likely. From a modeling perspective, beta(2, 2) just makes sense: why would we model something if we believed there was a reasonable chance that the outcome is guaranteed and does not vary? In other words, we don't often find ourselves asking research questions that require us to predict the number of times the sun will rise over a two-week period, or the number of people in a drug trial whose symptoms improved after experiencing a fatal side effect. Instead, we're likely to ask questions where there is a fairly large amount of uncertainty, which makes a weakly informative prior a better starting point.
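If a picture helps, here is a quick base-R comparison of the three beta shapes discussed in this answer:

```r
# Visual comparison of the beta densities discussed above (base R only)
p <- seq(0, 1, length.out = 500)
dens <- cbind(dbeta(p, 1, 1), dbeta(p, 2, 2), dbeta(p, 10, 1))

matplot(p, dens, type = "l", lty = 1:3, lwd = 2, col = 1,
        xlab = "probability", ylab = "density")
legend("topleft",
       legend = c("beta(1, 1): flat", "beta(2, 2): weakly informative",
                  "beta(10, 1): 10 successes, 1 failure"),
       lty = 1:3, lwd = 2)
```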