# Scale of prior for non-linear beta regression

I have a pretty basic question regarding how to specify priors in a model with a logit link function. I stumbled upon this trying to fit the unimodal Sharpe-Schoolfield equation to food consumption rate of fishes as a function of temperature using brms:

b(T) = \frac{b_0(T_c)e^{E(\frac{1}{kT_c} - \frac{1}{kT})}} {1+e^{E_h(\frac{1}{kT_h} - \frac{1}{kT})}}

where b(T) is the rate at temperature, T, in Kelvin (K), b_0(T_c) is the rate at a common temperature, E (eV) describes the thermal sensitivity of the biological rate, k is Boltzmann’s constant (8.62 × 10−5 eV K−1), Eh (eV) characterises the decline in the rate past the optimum temperature and Th (K) is the temperature at which half the rate is reduced due to high temperatures. This is the parameterization in Padfield et al., (2020).

The data I have compiled consist of many species, and their feeding rate is expressed in as fraction of max. Hence I want to model this with species-varying parameter b_0. I further rescaled the feeding rates to not have 1’s in order to fit a Beta model (can also consider using a zero_one_inflated_beta once I figure out my little issue). Temperature has also been centred so that 0 is the peak temperature within species. The data can be found here:

d <- readr::read_csv("https://raw.githubusercontent.com/maxlindmark/stan_data/main/dat.csv")

Given how the data have been centred, I think these priors make sense (I’m using uniform priors here to make sure they don’t wander off… sometimes I get biomodal posteriors in non-linear models unless at least one parameter has a uniform prior):

prior <-
prior("uniform(0, 1)", nlpar = "b0", lb = 0, ub = 1) + # consumption at the reference temperature, 0 (note all y are bounded in [0,1])
prior("uniform(0, 3)", nlpar = "E", lb = 0, ub = 3) + # activation energy
prior("uniform(0, 4)", nlpar = "Eh", lb = 0, ub = 4) + # de-activation energy
prior("uniform(0, 10)", nlpar = "Th", lb = 0, ub = 10) # temperature at half rate after decline - hence it should be above 0 which is the peak temperature in the data


With that, we can fit the model using an identity link first:

library(readr)
library(brms)
library(dplyr)
library(ggplot2)
library(modelr)
library(tidybayes)

# Fit a Beta model with an identity link
beta_identity <-
brm(bf(con_percent_scaled ~ (b0 * exp(E*((1/(k*(T_c + 273.15))) - 1/(k*(temp_ct + 273.15))))) / (1 + exp(Eh*((1/(k*(Th + 273.15))) - (1/(k*(temp_ct + 273.15)))))),
b0 ~ 1 + (1|species_ab), E ~ 1, Eh ~ 1, Th ~ 1, nl = TRUE),
data = d, family = Beta(link = "identity"), prior = prior, iter = 4000, cores = 3, chains = 3)

d %>%
data_grid(temp_ct = seq_range(temp_ct, n = 50)) %>%
mutate(k = 8.62*10^-5,
T_c = 0) %>%
add_predicted_draws(beta_identity, re_formula = NA, scale = "response") %>%
ggplot(aes(x = temp_ct, y = con_percent_scaled)) +
stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 0.4, fill = "grey") +
geom_point(data = d, aes(temp_ct, con_percent_scaled, color = species_ab), alpha = 1)


This looks OK. Next I wanted to try using a logit link function, to make the mean of the Beta-distribution bounded [0,1]:

beta_logit <-
brm(bf(con_percent_scaled ~ (b0 * exp(E*((1/(k*(T_c + 273.15))) - 1/(k*(temp_ct + 273.15))))) / (1 + exp(Eh*((1/(k*(Th + 273.15))) - (1/(k*(temp_ct + 273.15)))))),
b0 ~ 1 + (1|species_ab), E ~ 1, Eh ~ 1, Th ~ 1, nl = TRUE),
data = d, family = Beta(link = "logit", link_phi = "log"), prior = prior, iter = 4000, cores = 3, chains = 3)

d %>%
data_grid(temp_ct = seq_range(temp_ct, n = 50)) %>%
mutate(k = 8.62*10^-5,
T_c = 0) %>%
add_predicted_draws(beta_logit, re_formula = NA, scale = "response") %>%
ggplot(aes(x = temp_ct, y = con_percent_scaled)) +
stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 0.4, fill = "grey") +
geom_point(data = d, aes(temp_ct, con_percent_scaled, color = species_ab), alpha = 1)


Ok, here’s where I get stuck: should the priors in the model with an logit link function be on the logit scale? Because when I extract the stancode, mu[n] is the inv_logit of the predictors: ... mu[n] = inv_logit((nlp_b0[n] .... And if that’s the case, my second problem is I don’t really have a feeling for how to “convert” the priors that make sense on the scale of the reponse to the logit scale.

• Operating System: MacOS Catalina
• brms Version: brms_2.13.5

edited: response is a fraction, not %

Anyone with some advice for me on how to deal with priors when changing from a Gaussian to a Beta model? :)

At least in brms, which goes through Stan, the priors you specify for your mean beta values are not specified a between 0 and 1, but as you suggest on the logit scale from - to + infinity. You likely want to constrain the values with a prior because even quite low numbers on the logit scale already correspond to quite extreme values on the beta scale e.g., 4.6 is about 0.99. This graph may be of some use in thinking about what values you expect on the beta scale, and the corresponding values on a logit scale. The mean and SDs on the graph correspond to a prior on the logit scale with a mean of 0, and an SD of 1.25.

Thank you for the graph! It’s very helpful.

Right, so if I would have a model of the mean (no covariates) that could take any value between 0 and 1 (call it \mu), I could write:

prior("uniform(-10, 10)", nlpar = "mu", lb = -10, ub = 10), saying all values for the mean of the Beta model are equally likely (instead of narrowing it down between 0.5 and 0.7310586 as my Gaussian model would do, i.e. the "prior(uniform(-0, 1))", ... code I had above).

But how do I find good priors when \mu is modelled as a combination of non linear parameters and variables (for example, the Sharpe-Schoolfield model)? I.e., it’s not clear to me how to set priors on specific parameters on the Sharpe-Schoolfield parameters, when the whole equation is within the inv_logit function. Sorry if I’m missing something obvious here…

Hi Max,
I certainly wouldn’t model it with the basic prior on mu as a uniform distribution from -10 to 10. I’m not sure if that is part of what you would want to do or if it is just an example, but you need to think about the conversion from that logit scale to the beta scale you care about. If you plot that uniform prior in its transformed form between 0 and 1, you will see that it places huge amount of weight well into the extremes of the distribution at either end. I presume you don’t think that most of the probability on the beta scale should be less than .01 or greater than .99? Hence, I would suggest a normal distribution or something similar, centered around 0, with not too high a standard deviation - or again, the weight will be placed in the extremes again. You can see what the implications of the priors are on this single parameter by actually simulating with runif and asking for several thousand observations, then convert them with plogis or inv_logit, and plot the numbers as a density plot. It will be really peaked at the extremes, which I imagine is not what you want? Then compare it with rnorm, mean0 and sd 1.25, transformed with plogis or inv_logit - the probability mass tapers off around the extremes.

Things indeed get more complicated and difficult to imagine when you are combining multiple parameters. You can specify that you want to really see the priors in brms by putting this into the model specification:

sample_prior = ‘only’

This will run a prior predictive simulation, and when you see or plot the results, it is showing combinations of the different priors you have put in, without updating them with any data - i.e., your results are just the priors.

The graph above should however help with thinking about what plausible values might be, and you’d have to use some of your domain knowledge for what is plausible. Let’s say you think your average or intercept value should be about .5 probability (0 on the logit scale). Then think, what size effect could one of your variables have - you can sort of eyeball effects on the different scales by looking at the graph. Perhaps the variable in question is likely to influence the average by + or - .15 on the beta scale. That effect would be easily captured (and then some) by a prior on that parameter centered around 0, with an sd of .75. These are just very rough examples, but that’s how I think of it. I can’t comment regarding the exact model you have because I don’t know anything about it, and am also not quite that good at Bayesian analysis to give you more than pointers I’m afraid!

1 Like

Yes, I only meant that as an example. I.e., if my model had been an intercept-only model, even I would have (probably) been able to convert between the scales! (sorry if that was more confusing than helpful…).

But now I have non-linear model with 4 parameters. I have a good feeling for what these parameters should be on the same scale as the response, as in the identity link example above, and here’s a figure showing the figure the code above produces with the identity-link model:

and here for the posteriors:

But for the logit link… I do like your approach to it and I think it makes sense. Especially if I had a linear model with standardized predictors variables! But I’m not sure how to apply it to this case. For instance, the parameter E describes how fast the response variable increases (exponentially) with X at X<0, where X=0 is where the function reaches its maximum, and Eh describes the rate of the fall (also exponential) for X>0. X here is temp_ct. So I struggle with how to think about it in terms of how much the average (intercept) changes with 1 unit or sd in predictor x1 as if it had been a more straightforward linear model… Thanks though! It’s much appreciated!

That is certainly a bit tricky to wrap your head around. Perhaps you might need to test out some different parameter values for these variables: plug them into your regression equation, and calculate predicted outcome scores (just using addition/multiplication etc to work out the regression outcomes, not using brms). That way you can get a better sense of what different values imply about the outcome. It is certainly difficult to think about just in your head. This could at least help you note points at which outcomes start to become totally weird and implausible.

1 Like

That sounds like a good idea! But before I dig into that, maybe I can add one more question to the thread. I read in Douma & Weedon, 2019 that it’s recommended to test alternative link functions, including even an “identity” link, “albeit with care”.

In my case it seems fine to me to use an identity link (figure in my last post), and the model for sure becomes easier to interpret and fit… I guess this is a question more fore the authors, but would this be an example of where this is done with care…? Or what are the pitfalls I’m missing here?

I would have to read the paper and properly understand your model to answer, I think. I’m not sure if the question of modeling with different link functions is the same as saying you could just treat the data as normally distributed or something, and just model it directly on the ‘probability scale’? That would be possible to do, I think it just depends on how well the model then fits the data. I work with psychological responses and if I do something like that, I typically end up with regression models that predict data points well under or over the 0-1 bounds, and the beta distribution can recreate the look of the data much better.

Yes, I understand!

I’m also not sure about that. Here’s the same plot as two posts above, but with a Gaussian model and an identity link. Here the prediction interval (not the prediction) goes below 0 (and just slightly above 1),

but with the Beta distribution and identity link the prediction interval is bound between 0 and 1 (quoted figure).

So, in my view, it seems Beta(link = "identity") does what I hoped the Beta(link = "logit")would do (not predicting outside possible data) if I had gotten that to work.

Nice, I somehow haven’t managed to make the beta thing run with “identity” as the link. Maybe I should try again.

I think the interval spreading that far over 0 in the gaussian model could be considered problematic, as with a Bayesian model we really want to pay attention to the uncertainty in the estimate, and there is no single ‘prediction’, but a range of plausible predictions. Of course, it all depends on your degree of tolerance for these things and possibly other aspects of the model that you might find more appropriate or useful. But nice that it looks like you can try to just set priors on the identity scale now, and get the benefits of the beta model without the confusion.

Thanks for your input JimBob, it’s much appreciated! I can update this thread once I land on something final, but for now I might stick to the Beta(link = "identity") approach.

Cool. It would be interesting to see, though perhaps not possible, what your regression ends up looking like and how you interpret it. As I use beta models a lot at the moment, I’m interested to see what other people do with it. Good luck!