# Logit transformation of parameters in non-linear binomial model

I constructed a non-linear binomial model using brms. It estimates true prevalence based on apparent prevalence, the diagnostic test sensitivity and specificity. I use beta priors and identity link, because that way the prior specification is straight-forward. The model works fine, but I want to add a random factor ‘site’ because of pseudoreplication. I looked up this topic: Random-effect specifications for non-linear model,
which seems to be exactly what I need.
A beta prior on the random factor gives me >900 divergent transitions, so I want to try the logit transformation. Unfortunately, I’m a Stan/brms newbie and don’t understand, how to implement it in the model and how to specify the priors after the transformation.

The model with beta prior for site:

mod <- brms(
bf(Count | trials(total) ~ Tprev * Se + (1 - Tprev ) * (1 - Sp),
Tprev ~ species + (1|site),
Se + Sp ~ 1,
nl = T), family = binomial(link = 'identity'), data = data, chains = 1, prior = pr, control = list(

# priors:
pr <- c( set_prior('beta(1,1)', nlpar = 'Tprev ', ub = 1, lb = 0),
set_prior('beta(1,1)', class = 'sd', nlpar = 'Tprev ', ub = 1, lb = 0),
set_prior('beta(10,1)', class = 'b', nlpar = 'Se', ub = 1, lb = 0),
set_prior('beta(10,1)', class = 'b',  nlpar = 'Sp', ub = 1, lb = 0))


Cheers!

Are you sure that the random effect prior is specified correctly? Since this is the piece of the model that causes the convergence issues when added, it makes sense to start checking whether there is an issue with its specification.

Currently, you have a uniform prior over the interval [0, 1] on the standard deviation of the random effects. That is to say, you are saying that the average variability in the random intercepts between sites can be anything from 0 to 1. This seems very unlikely to me. In such situations, I find it helpful to think about what the prior implies. You can do this computationally within brms by setting the argument brm(..., sample_prior = "only") so that the model does not evaluate the data but instead just generates the parameter estimates from the priors. You can then visualize what kinds of predictions are consistent with your priors by using the pp_check() function as you would for a posterior predictive check (just in this case you’re getting the prior predictive check). Comparing your observed data to the possible forms of data implied by your priors can help to give you an idea of whether your overall specification of the model is reasonable or whether certain priors may be under-/over-informative in ways you were not anticipating. These are relatively quick models to run, so you can try a variety of different priors to build an intuition for how changes to one part of the model affect it overall.

Analytically, you can always take the extremes implied by your prior and figure out what your data would have to be like in order to obtain those extremes. If the standard deviation of the random intercepts were 0, then this means that average prevalence of things is the same across all sites. Such a finding would indicate that a random effects model provides no additional benefit as all of the variance in the outcome can be explained from the fixed effects components of the model. In contrast, to get a standard deviation in the intercepts of sites to be 1, then you effectively have to have the random intercepts all be either -1 or 1 (and even then, you will can standard deviation estimates that are actually larger than 1). This outcome is impossible. To flip the previous example, this would effectively arise in the case where the fixed components give no information at all and result in a starting guess of 0 for the prevalence. The site location would then determine whether the prevalence is 1 or… -1, which is obviously not a valid estimate of prevalence.

Thinking about the random intercept as a site-specific deviation from the predicted prevalences of the fixed effects, I’d think you need a prior that reflects the fact that you can be reasonably sure beforehand that values of 0 and 1 are not very likely. In fact, I’d think you could go even further and specify a prior that reflects the fact that you would be surprised if the standard deviation of the site-specific intercepts was greater than around 0.50. I admit that I’ve never seen a prior on the variation of random effects that was beta (since this imposes a hard constraint that the standard deviation cannot be any larger than 1), I think it could be sensible here. While a flat prior, the beta(1, 1) distribution is not always uninformative, particularly if there is limited data available (in this case, it depends on the number of sites you have). The maximum entropy and therefore least informative beta distribution is beta(2, 2), which places the greatest uncertainty around 0.50 but also reflects the greatest skepticism for 0 and 1. You may try fitting the model using this weakly informative prior, or you could also try a more informative one (e.g., beta(2, 11), note that I’d avoid specifying a beta prior with the alpha shape parameter of 1 in this case as that gives increasing probability to 0 where \alpha > 1 places effectively zero probability to this extreme). I’d hold off on the more informative options until it’s clear that the model requires it, which it may depending on the amount of data that you have.

Also, in case it helps at all, you can also think about the beta priors as counts of successes and failures. So, for your sensitivity and specificity priors, the beta(10, 1) distribution reflects the equivalent belief of observing 11 total events in which 10 were successes and just 1 was a failure. By extension, the beta(1, 1) would be the equivalent of having observed two events where one result was a success and the other a failure. This is the most skeptical position that we can be in as we expect the probability of success to be 0.50 (which is indeed the mean of the beta(1, 1) distribution) but we have no way of determining from just two events whether what we observed was a fluke or an actual reflection of the fact that there’s a probability of 0.50 for a success, which is why the distribution reflects any possible probability as equally likely even though we might be expecting for it to be somewhere around 0.50. Adding one additional observation, regardless of the outcome, shifts this belief and makes the distribution reflect this change. One way to think of beta(2, 2) then is as the least number of observations (n = 4) required for us to be skeptical of certainty in either direction (i.e., p = 0 or p = 1) while maintaining an expectation of the outcome being totally determined by chance. This skepticism for extreme values in conjunction with the expectation of something closer to 50-50 is an intuitive way, at least for me, to think about why a flat prior like beta(1, 1) is not the same thing as an uninformative prior: saying that any probability from 0 to 1 is equally likely is actually quite informative compared to the beta(2, 2) which reflects that probabilities very close to certainty are unlikely and that values close to chance are more likely. From a modeling perspective, the beta(2, 2) just makes sense: why would we model something if we believed that there was a reasonable chance that the outcome is guaranteed and does not vary? In other words, we don’t find ourselves often asking research questions that require us to predict the number of times the sun will rise over a two-week period or the number of people whose symptoms improved in a drug trial after experiencing a fatal side-effect. Instead, we’re likely to ask questions where there is a pretty large amount of uncertainty, which makes a weakly informative prior a better starting point.

2 Likes

Thank you for the answer wgoette, it really helped me understand what is beta prior exactly! beta(2,2) seems more resonable than beta(1,1) in my case. I played with different priors for but i cannot use
pp_check() because of errors (NAs in predictions). Every set of beta priors that I used resulted in >900 divergent transitions, so I opted for the logit transformation of the Tp parameter.

I have found a post on stack exchange which uses inv_logit fuction in the model syntax, hopefully applied correctly.

I have one more thing which confuses me. In:

Tprev ~ species + (1|site)


the intercept is by default the first species (out of three species in total). Since the true prevalence can vary depending on the site for each species, shouldn’t the correct syntax be:

Tprev ~ species + (1 + species|site)


or

Tprev ~ species + 0 + (0 + species|site)


since I assume there is no other variability to the True prevalence than what is in the species factor?

It’s generally a reasonable position to include as many variables as is feasible/theory-consistent as random effects, so I don’t see any reason why giving species a random slope by site would be a bad thing (assuming that you have the data for it). The model would now give you the average prevalence of each species via the fixed effects in addition an overall mean of prevalences associated with each site (implying that some sites are just more inhabited/active or were more easily sampled) and then a further adjustment for the overall prevalence of the species in each site (allowing that perhaps more of one species at one site could produce more or less of another at the same site).

The random intercept and slope will be correlated as is currently written. If that assumption is inappropriate, then you can force the two values to be uncorrelated with the following syntax:

Tprev ~ species + (1 | site) + (0 + species | site)


Now, to your other point regarding the inclusion of an intercept or not, this is mostly irrelevant to the inferences you make from the model. The default behavior in R is for unordered factor variables to be dummy coded. So, since species has 3 levels, the reference level of the factor (by default this is whatever is the first level of the variable is) will be the intercept while two coefficients are estimated to correspond to the estimated difference in the remaining two levels and the intercept. This means that you can always obtain the prevalence estimate for a species by just summing the intercept and then that species’ coefficient. In a simple equation, you can think of it like this:

\hat{y} = \beta_0 + \beta_{s2}X_{s2} + \beta_{s3}X_{s3}

where X_{s2} = 1 when the observation being evaluated belongs to species 2 and is otherwise zero. The same is true for X_{s3} except for it indexing belonging to species 3. The dummy variable coding means that each level of the factor species can be recoded as a numeric value for one of these variables as follows:

Species Xs1 Xs2 Xs3
A 1 0 0
B 0 1 0
C 0 0 1

Thus, if you’re trying to predict the value associated with the third row of the above table (an observation belonging to whatever the third level of the species factor is), then the simple equation showed above becomes the following:

\hat{y} = \beta_0 + \beta_{s2}(0) + \beta_{s3}(1)

or, more simply,

\hat{y} = \beta_0 + \beta_{x3}

, which is just the intercept plus the coefficient estimated for the third level of the factor. The implication of the first level of the factor being the model’s intercept can be seen by plugging in the values of X_{s2} and X_{s3} associated with the first row of the table.

All of this is to say that the only difference between Tprev ~ species and Tprev ~ 0 + species is that the latter requires no addition of the coefficients. Alternatively, you can think of the coefficients as being transformations of each other:

Estimated Prevalence Tprev ~ species Tprev ~ 0 + species
Species 1 Intercept b_species1
Species 2 Intercept + b_species2 b_species2
Species 3 Intercept + b_species3 b_species2

Depending on what your focus is on, the specification may make your conclusions easier to get to. For example, if you were interested in knowing how much more prevalent one species is to the others, then keeping the intercept in the model is the best option since this lets you interpret the coefficients as the differences between the reference level and the other two levels of the factor.

It also may not be a direct swap between the two model notations because of how brms estimates the intercept. The intercept is not, by default, given its own b coefficient, meaning that using a prior like prior("normal(0, 1)", class = "b", nlpar = "Tprev") does not give the intercept a standard normal prior; instead, you’d need to use the specification prior("normal(0, 1)", class = "Intercept", nlpar = "Tprev") to change the intercept’s prior. I believe that by default, brms gives the intercept a student_t(3, 0, 2.5) prior, but I may be misremembering the details. You could specify Tprev ~ 0 + Intercept + Species to indicate that you want to keep the intercept with the other coefficients in your prior specification.

Another important thing to keep in mind is that brms mean centers your coefficients by default and specifies the prior on the mean-centered intercept. This can throw off priors quite a bit sometimes if you’re not thinking about the intercept prior as the model expectation when all other predictors are at their mean values. The intercept that is returned by brms, however, is transformed back to your standard intercept (i.e., the model expectation when all other predictors are at 0). This means that, while it doesn’t affect your interpretation of the final result, it can throw off your prior specification if you’re not careful.

1 Like

Thank you again for the theory and the practical tips! I will dig deeper into setting the random intercept and random effect as uncorrelated, because I do not know much about it.

The primary source of confusion with the random intercept was: if I set a random intercept that varies with site, and the first level of my factor variable becomes an intercept, do I allow the first-level species to vary by site, but not the others? The description of a random intercept model:

is straight forward to me in case of non-factor predictors, where the intercept is the base outcome, when all predictors are 0 (mean), but when the first level of a factor becomes the intercept, it is harder to grasp the meaning.

I think that might be true only for linear models, because get_prior() of my non-linear model specifies the class of the intercept as “b” with a flat prior, that’s why I did not specify the intercept separately.

I’ve encountered another problem. I am not sure how to interpret the output of brm in my model. At first I assumed that the brm summary will give me the Tprev that just needs to be transformed with inverse logistic transformation to get my true prevalence between 0 and 1 for the intercept (first species) and then sum it with respective estimates of other species to get their prevalence. But bothconditional_effects() and predict ( if I kick out species and do a simple random intercept model) give me different values of Tprev than inverse logit of the model estimate. My model syntax currently looks like this:

mod <- brm(
bf(Count | trials(total) ~ inv_logit(Tprev ) * Se + (1 - inv_logit(Tprev )) * (1 - Sp),
Tprev  ~ species + (1|site),
Se + Sp ~ 1,
nl = T),
family = binomial(link = 'identity'), data = data, chains = 1, prior = priors, control = list(

priors <- c(set_prior('normal(0,5)', class = 'b', nlpar = 'Tp'),
set_prior('normal(0,1)', class = 'sd', nlpar = 'Tp'),
set_prior('beta(10,1)', class = 'b', nlpar = 'Se', ub = 1, lb = 0),
set_prior('beta(10,1)', class = 'b',  nlpar = 'Sp', ub = 1, lb = 0))


Hope I didn’t exhaust your patience yet, @wgoette

Hopefully I can help out some with sorting through these lingering questions!

Just as with the fixed effects, you can compute the random intercepts for each species by just summing the intercept (the estimate for the reference level) and then coefficient for whatever species of interest. This means that you could also specify the random effects as ... (0 + Species | Site) to get the random intercepts of each species by site directly. It shouldn’t affect the results, just how you get to them.

When you’re estimating a non-linear model, the equation that is being estimated is the top line as this is the only portion with an observable outcome to fit to. So, the equation that you’re fitting and thus the one that will be used to calculate things like posterior predictive checks or conditional effects is Count | trials(total) ~ inv_logit( Tprev ) * Se + (1 - inv_logit( Tprev )) * (1 - Sp). Everything below this line is telling brms how to estimate the non-linear terms defined in this equation: Tprev, Se, and Sp.

In order to calculate Tprev, you’ll need to use the coefficients reported in summary() for this non-linear term. I believe that in order for you to get the posterior distribution of Tprev then you’ll need to extract the posterior draws for the coefficients from the fitted object and calculate the linear estimate from the formula: Tprev ~ species + (1 | site). That is then the value that you’ll apply the inverse logit to in order to get the true prevalence.

It seems like your primary interest is in calculating the prevalence of the species, but as currently specified, this is a parameter that you estimate en route to modeling the counts of a species. If you don’t care about the count itself but are rather using the count to obtain an estimate of prevalence, then the prevalence should be modeled as your outcome and your count data should be used to inform its estimation.

Additionally, it looks like your model has had some trouble returning valid estimates given that you’re running only a single chain and have adapt_delta up to 0.99, which is usually a sign that the model is misspecified in some way. As far as I know, there’s not really a reason to run a model on fewer than 4 chains as it’s important to verify that the chains mix well and for efficiency reasons. One issue I see is that I don’t think that your priors are working like you mean them to. To begin with, your code says that the normal(0, 5) and normal(0, 1) priors on set on the nlpar = 'Tp' but your model specifies the non-linear parameter as being named Tprev. Another issue is that you have a very wide prior on the prevalence non-linear term. On the logit scale, you’re saying that you’re saying that more than half (68%) of the overall probability is that the prevalence of any one species is between -5 and 5. If you convert this back to the probability scale (i.e., apply the inverse logit link function), then you’re saying that the prevalences are between 0.007 and 0.993. If you extend this prior to two standard deviations (i.e., ~98% of distribution), then you get estimates of prevalence from 0.00005 and 0.99995. This prior means that the model is encouraged to explore some extreme values, which is hard for posterior estimation especially when there is little data to encourage more realistic values. As previously mentioned, the beta priors on sensitivity and specificity also strongly favor high values without respect to the typical negative relationship between the two.

1 Like

Again, big thanks for sharing your knowledge. And thanks for pointing out the typos in code. I renamed some of the variables while writing the post to add clarity but added only more confusion! In my script all the names are correct.

I’ve never extracted draws from a non-linear model, but I followed this post. Let’s consider only one species for simplicity, so that Tp ~ 1 + (1|site).

I assume getting my Tp results would look like this:

Tp.fit <- fitted(mod, nlpar = 'Tprev')
inv_logit <- function(x) exp(x)/(1+exp(x))
Tp.fit.inv <- inv_logit(Tp.fit)
mean(Tp.fit.inv)


I tried also prepare_predictions(), with the same result, Tprev = 0.53 - way too high given my apparent prevalence is ~ 0.28

The estimate for Se was also very low, ~ 0.6 (much lower than the prior specifies). I compared the results with truePrev() from prevalence package. It uses basically the same model but without random intercept (and it’s in JAGS). It gives me Tprev = 0.37, using the same Se and Sp priors. That sounds more reasonable.
Moreover, when I omit the inv_logit function in my model, the returned Tprev estimate gives me 0.37 prevalence. The ‘Se’ estimate is also much closer to what I specified in priors. From that I conclude, that the inv_logit is messing up my results OR changing them in a way I just cannot understand. Unfortunately, without this transformation I get a lot of divergent transitions because of the random intercept. So i circle back to the very beginning. I’ve learned a lot, but still not enough to really comprehend what I am actually doing. I tried tweaking the priors of Se and Sp and tightening the priors on Tprev as you suggested but I does not change the outcome. The only model that works well and gives me straight-forward estimates is the no-logit-transformation-no-random-intercept model, and I am afraid I am stuck with it.

If I understand you correctly, you mean something like:

prev <- data %>% group_by(site, species) %>% summarise(Aprev = mean(Count), n = n())
mod <-  brm(
bf(Aprev | weights(n) ~ Tprev * Se + (1 - Tprev ) * (1 - Sp),
Tprev ~ species + (1|site),
Se + Sp ~ 1,
nl = T), family = gaussian(link = 'identity'), data = prev, chains =4)

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Tprev_Intercept       0.64      0.15     0.34     0.93 1.00      937     1132
Tprev_speciesB        0.44      0.03     0.37     0.51 1.00     2578     2248
Tprev_speciesC        0.96      0.03     0.88     1.00 1.00     1854     1431
Se_Intercept          0.26      0.04     0.18     0.34 1.00     1338     1799
Sp_Intercept          0.40      0.04     0.31     0.48 1.00     1440     1862


I used various priors with the same effect. The Se and Sp (and the prevalences) don’t look right.

I’m following an advice from ‘Statistical Rethinking’ of running the model with one chain and if everything seems alright, rerunning with four chains. This also speeds up the process of debugging, because nothing is alright in the first try! Adapt delta is an artifact from the previous version of the model without logistic transformation. I rerun it with default controls and it works all the same.

I will keep banging my head against a wall. But I definitely learned a lot!

Very interesting to learn that brms::fitted(...) lets you obtain estimates for specific non-linear terms! I wasn’t aware of that feature, but I shouldn’t be surprised given the flexibility of the package.

It’s worth noting that the model fails once the random intercepts are added since that pinpoints it as the issue. There are two things that come to mind as possible reasons for this.

First, there’s a very general rule of thumb that you need at least 5 or 6 levels to obtain a reasonable estimate of random effects. I seem to recall that you have just 3 sites, but I might be confusing that with the number of species. So, it’s possible that the random effect is simply too unstable and introducing some unusual results in other parameters to compensate. If you inspect the output of the model, then you may notice some weird standard errors/credible bounds for some of the other non-linear terms if that’s the case.

Second, there may be an issue with the prior on standard deviation for Tprev. On the logit scale, which is what the prior is specified to, you’re giving a 0.34 probability of the random intercept at each site being somewhere between 0 and 1, which corresponds to sites varying in prevalence between 0 and 0.73 (i.e., inv_logit(1) using your function). Considering that you are thinking that the overall prevalence is somewhere on the scale of 0.40 or so, you’re letting the model explore possible site-specific adjustments to true prevalence that can easily far exceed that scale. Since this prior is on the standard deviation term, brms automatically truncates the distribution so that you do not have any prior probability of a negative standard deviation since that cannot exist. You are thus specifying a half-normal distribution at 0 with standard deviation of 1, describing how much on average the sites vary in their true prevalence. Say that the resulting standard deviation for the random intercepts is 0.50. This means that, on the logit scale, the site-specific intercepts from one another by an average of 0.50 standard units, and if we convert from this logit scale to the probability scale, then this is equivalent to saying that the average difference in site-specific contributions to true prevalence is about 0.62, which is a pretty large amount of variation relative to your expected true overall prevalence of 0.40-ish.

I think that it would be really helpful to view the prior predictive checks. You can always pull out some paper and pencil and work through possible outcomes implied by different priors, but it’s also sometimes just important to develop an intuition for what “tighter” or “looser” priors mean for a model. Using the argument brm(..., sample_prior = "only") will give you a model where the parameters are just samples from your priors. You can then use this model with fitted() and pp_check() just like you would any other model. Since these models are computationally cheap, I’d also encourage increasing the iterations to something more than 1000 just to get more samples to visualize. You can probably play around with a whole range of priors and various values for those distributions in a relatively short period of time, and by seeing how the “estimates” from the model are altered by these different choices can help to get a better idea of why wide priors aren’t the same thing as non-informative priors.

Ultimately, it is important to also recognize that difficulties in getting a model to work can often be an indication that the model is just wrong. In this case, it seems like things are fine until that random effect is added, which may be an indication that the random intercepts constitute a misspecification to the model (or at least for the model + data in the case where there are too few levels to get good estimates of random effects). There are a few things that you can do to confirm to yourself that the model issues from including the random intercepts is a signal of misfit rather than user-error in a couple of different ways.

Probably the most robust method would be through simulation based calibration: Simulation Based Calibration for rstan/cmdstanr models • SBC. Since the data are simulated according to a known generative process, this lets you check whether the model you’re specifying would work as you want it to in the case that it is the true model summarizing the data generation process. At the same time, you could examine what happens when you fit the model to a generative process where there are no random effects associated with the sites, and you’d (hopefully) observe the same kinds of estimation errors that you’re finding here.

Another thing to consider is whether the extra variance that the random effects are trying to account for is needed. This is commonly done with the intraclass correlation in multilevel models. I believe that there is a vague rule of thumb that an ICC < 0.07-ish supports that random effects are not useful, but I wouldn’t swear to that nor would I be overly confident in making a modeling decision purely on a rule of thumb. The performance package has an easy-to-use function to calculate the ICC from a fitted brms object.

Related to the previous point, you could also see how your model does with respect to predictive accuracy without the random effects. You can use something like fit <- add_criterion(fit, criterion = "bayes_R2") to get a sense for how the model does without random effects in terms of explaining variance. If this number is pretty large, then the addition of random effects may not be needed.

You can also do a formal model comparison with and without random effects to see whether the extra model complexity is worth it. This is pretty straightforward with brms:

fit_fixed <- add_criterion(fit_fixed, criterion = "waic") #can also use "loo" if needed
fit_random <- add_criterion(fit_random, criterion = "waic") #again, can also use "loo"
loo_compare(fit_fixed, fit_random, criterion = "waic") #or criterion = "loo"


There are lots of different kinds of things you can look at to help figure out whether a model fit should be trusted or not, of course. Feel free to take a look at the documentation for tidybayes for a range of posterior predictive checks, but also don’t overlook the value of a good pairs() plot or inspection of the loo results.

I’m not familiar with this package, but looking through its documentation, I take it that the following reflects your call:

out <- truePrev(x = Count,
n = total,
SE = ~dbeta(10, 1),
SP = ~dbeta(10, 1),
prior = c(1, 1), #corresponds to beta(1, 1) on true prevalence
...)


This model doesn’t seem entirely consistent with what you’re using here as it’s not taking into account multiple species at once. Was the motivation for the switch to brms in order to benefit from partial pooling by analyzing the species all in one model rather than separately?

2 Likes

Hi again!
I spent some time with simulated data and different complexity of the model, starting from simple linear bf(counts | trials(n) ~ species and trying to figure out where the problem lays. I have 3 species and 15 sites. The issues with convergence and divergence actually start as soon as I use link = identity which is probably because of my poorly specified priors and I will work on that. I’ve noticed that the model performs much better if the standard deviation of the random intercept is very small beta(2,50) or beta(2,80). Makes me doubt in the usefulness of the random intercept, but the mixed model fits much better than no-random effect model when comparing with ‘loo’ and having it from a scientific standpoint just makes sense.

What I find interesting is that, when specifying a simple model:

brm(counts | trials(n) ~ species,
family = binomial(link = 'identity'), data,  priors =
prior(beta(2,2), class = 'Intercept', ub=1, lb=0) +
prior(beta(2,2), class = 'b', ub = 1, lb = 0))


the estimates far off, the pp_check() suggests poor fit and there are some divergent transitions but when I use

brm(counts | trials(n) ~ species + 0,
family = binomial(link = 'identity'), data, priors =
prior(beta(2,2), class = 'b', ub = 1, lb = 0))


the model runs smoothly and the estimates are correct, pp_check() looks alright.

It looks like even though I specify the same prior and bounds on the Intercept as on other species, it acts as if I did not and the other priors are thrown off too.
This is the summary output when using sample_prior = 'only':

 Family: binomial
Formula: counts | trials(n) ~ species
Data: data.sim.ag (Number of observations: 45)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.17      0.25    -0.29     0.63 1.00     3821     2414
speciesB      0.49      0.22     0.09     0.90 1.00     4072     2534
speciesC      0.49      0.23     0.09     0.91 1.00     4622     3057


even though priors from the model fit are:

      prior     class     coef group resp dpar nlpar lb ub       source
beta(2, 2)         b                                 0  1         user
beta(2, 2)         b speciesB                        0  1 (vectorized)
beta(2, 2)         b speciesC                        0  1 (vectorized)
beta(2, 2) Intercept                                 0  1         user


In the species + 0 model when sampling only prior all estimates are 0.5, as expected.

I also checked what happens if I use the default binomial(link = 'logit'). Both models, with and without the intercept give the same priors across the three species.

And about the truePrev function:

Yes. My primary goal was to build a model that includes partial pooling between sites. Then, I would like the model to estimate prevalence for multiple species, possibly including partial pooling between species. I planned to expand it later on, modelling the relation of the true prevalence and environmental variables but getting the base model working is hard enough.

See the details of the set_prior section of the brms manual (page 209 of the current manual) - " Note that technically, this prior is set on an intercept that results when internally centering all population-level predictors around zero to improve sampling efficiency. On this centered intercept, specifying a prior is actually much easier and intuitive than on the original intercept, since the former represents the expected response value when all predictors are at their means. To treat the intercept as an ordinary population-level effect and avoid the centering parameterization, use 0 + Intercept on the right-hand side of the model formula."

2 Likes

As @jd_c points out, the intercept in brms is not what you generally think of. This is what I meant earlier:

Hopefully the additional details provided @jd_c gives this additional context as to why the prior specification is misleading in this case.

Can you clarify what you mean by “performs much better”? Is this from a comparison of the LOOICs, effective parameters (p_loo), comparison of effective sample sizes, or visual PPCs? A model can perform better than another one on a lot of different things, so it’s just helpful to know what is being used as the criterion. I see later on that you reference the loo comparisons, but there can be additional information from the loo package than just brms::compare_loo() function returns. See this excellent thread on how loo can be used to make decisions about a model: A quick note what I infer from p_loo and Pareto k values

1 Like

Hi,

If I got it right, there is no intercept in the model below thanks to +0, and the parameters for species are bounded on [0,1] thanks to the lb and ub in the prior() function. Therefore, there is no issue with the identity link.

brm(counts | trials(n) ~ species + 0,
family = binomial(link = 'identity'), data, priors =
prior(beta(2,2), class = 'b', ub = 1, lb = 0))


Meanwhile, in the following model, there is an intercept, and you may end up with Intercept + b_{species s} > 1 which might trigger the issues you are pinpointing in your post because of the identity link?

brm(counts | trials(n) ~ species,
family = binomial(link = 'identity'), data,  priors =
prior(beta(2,2), class = 'Intercept', ub=1, lb=0) +
prior(beta(2,2), class = 'b', ub = 1, lb = 0))


Likewise, in the model below, you may end up with a linear predictor > 1 (especially when including the RE ?) because you don’t set Se + Sp > 1 as a constraint. The need for such constraint has been noticed in this post for example Logistic Regression without a Gold Standard/Error in Observed Classification - #7 by wgoette.

mod <- brm(
bf(Count | trials(total) ~ inv_logit(Tprev ) * Se + (1 - inv_logit(Tprev )) * (1 - Sp),
Tprev  ~ species + (1|site),
Se + Sp ~ 1,
nl = T),
family = binomial(link = 'identity'), data = data, chains = 1, prior = priors, control = list(


If you don’t want to edit the underlying stan code produced by the brms call, a fallback solution could be to do something like:

bf(Count | trials(total) ~ inv_logit(Tprev ) * Se + (1 - inv_logit(Tprev )) * (1 - Sp),
Tprev  ~ species + (1|site),
Se ~ inv_logit(exp(gamma_se)),
Sp ~ inv_logit(exp(gamma_sp)),
gamma_se + gamma_sp ~ 1,
nl = T),


Because Se and Sp will be both bounded in ]0.5,1[ you will end up with Se + Sp > 1. You can set gamma_sp ~ N(.) and gamma_se ~ N(.), with parameters such that you get your desired ETI95% for Sp and Se. Note that Se > 0.5 and Sp > 0.5 are not huge assumptions, otherwise better off using coin toss.

I don’t know if this will solve your issue, but that might be worth a try.

Ps: Sorry for the multiple edits, I accidentally hit enter before everything was fine…

1 Like

Thank you all for your replies!
I am a bit embarrassed that it took three people to hammer down the point of the centred intercept and how intercept works in general, but at last I might have successfully attempted to specify priors for my model.
I finally understood (what @wgoette nicely showed in their posts), that my beta prior bounded between 0 and 1 on the species variable is impossible, because it was not relative to the intercept. My prior also did not allow for negative estimates, which were necessary in my case, because the intercept species has actually the highest prevalence. A quick fix was to set beta(2,2) on the intercept and normal(0,0.5) on the species. I also specified init = 0 because I’ve got many rejections of initial values in the chains. The prior check was finally how I expected it to be, the estimation on simulated and real data of the simple model counts | trials(n) ~ species looked sensible without errors in chains or divergent transitions. Thank you all for your patience, really!

All that being said, when I add the random intercept or if I try to run the non-linear model, I still get lots of divergent transitions and rejections of initial values (of the random variable and the probability parameter). The init=0 pops an error about log probability being negative infinity, and init = 0.1 still leads to rejections. I tried following Solomon Kurz’s post, to specify inits for each parameter, I’m confused though what names I should use for the parameters in the non-linear model.

I will look into the solution proposed by @Osupplisson and the linked post, thanks! I wanted to avoid transformations of parameters to make the prior specification easier, but I will test it out.

What about specyfying lb = 0.5 in priors for Se and Sp? I’ve read in the manual that setting boudaries on priors is problematic, so I assume the inv_logit(exp()) solution is better?

Good point, I measured the performance in the amount of divergent transitions and effective sample sizes I got. The smaller the ‘sd’, the less divergences I got. Because by using high alpha shape parameter values like beta(2,80)` I effectively remove almost all room for variance between sites, the use of the random intercept becomes questionable to me. Still, from a scientific standpoint it seems obvious to average over the sites. Also, I know there is ~0.15 variance between my simulated sites.

I haven’t kept up with this thread, but I would just caution about using priors like a tuning knob to remove divergent transitions. You really want to develop priors using your scientific knowledge of the field and experiment/study at hand and assessing them via prior predictive checks. It’s quite possible to use very narrow or restrictive priors that have no logical framework to reduce divergent transitions, but that doesn’t mean they are good priors for your application.

2 Likes