Understanding parameters of beta family in brms

Hi there,

I’m quite new to brms and especially to handling the beta distribution, so I hope my issue is not too low-level:

I want to fit a Bayesian model with a response variable distributed between 0 and 1 (with zero inflation), so I decided for the zero inflated beta family in brms.

    fit <- brm(formula = y ~ 1,
                   data = data,
                   family = zero_inflated_beta(link = "logit", link_phi = "log", link_zi = "logit"))

But along the way, some unclarities have emerged:

  1. When learning more about the beta distribution, it turned out that there seem to be two different ways for its parameterization: (a) mu/mean and phi/precision parameter, or (b) a and b with a = mu*phi and b = (1−mu)*phi. I’m not entirely sure which parameterization is used in brms.

  2. I’m struggling with interpreting the posteriors because of the logit/log transformation in the beta model. Do I understand it correctly: The posterior estimates for “Intercept”/“phi” are in logit/log scale, hence need to be converted into probability scale for interpretation?

     estimate.Intercept <- posterior_summary(as.data.frame(fit)$b_Intercept)[1]
     a <- exp(estimate.Intercept) / (1 + exp(estimate.Intercept) )
    
     estimate.phi <- posterior_summary(as.data.frame(fit)$phi)[1] #phi
     b <- exp(estimate.phi)
    

And when transformed, those two are the parameters needed to define the shape of the beta distribution? So, basically dbeta(x, a, b) returns the curve of my predicitve posterior distribution?

Thanks a lot!

2 Likes

Hi,

in almost all cases, brms parametrizes the distributions so that the main predictor corresponds to the mean, so it uses the mu / phi parametrization.

Also, you can often gain insight into how stuff is implemented in brms by inspecting the generated Stan code via make_stancode

Yes, the parameter estimates from brms are before applying the inverse-link functions (you can inspect the default link functions at the help page for brmsfamily). However, to get accurate transformations, you generally need to work with individual samples and take summaries (e.g. mean) as the last step in the process. So a better way to compute estimate of a and b would be: for each sample, compute a value for a and b and then compute the mean of a and mean of b from those samples (this willl generally give somewhat different results than the method you’ve shown).

Also note, that you can use posterior_linpred, posterior_epred and prepare_predictions to simplify most of the common tasks.

Best of luck with your model!

4 Likes

Hi @Rigaldo, I found two perspectives helpful when interpreting beta parameters:

  1. The logit scale tells you if your effect is positive, negative or uncertain. This in itself can be interesting but often times we would like to know how big an effect is as well. Here we have the problem that you can’t interpret the parameters on their own. Due to the nonlinearity of the link function, the effect of your parameter is influenced by the rest of the model.
  2. This is where the conditional_effects() function can be very handy. It shows you the effect you are looking for for your average sample. While it again doesn’t show the complete picture, as the effect looks different for data points further away from the mean, it gives you a quick look at the outcome scale.

I have also written a small introduction for how to interpret such effects here (Page 11, Introduction of result section) maybe that is helpful as well.

If you want to see (so visuals, not numbers) your posterior, pp_check() is also a very useful tool. It shows you what your posterior looks like on the outcome scale.

5 Likes

Hi @Rigaldo
You are correct that the parameters are shown to you on the logit scale, which means that to understand them a bit more easily it is helpful to convert them back to the probability scale (e.g., with the function, plogis). The beta scale is parameterised as mean/mu and phi (phi is the spread of the data around the mean, a bit like a standard deviation for the normal distribution). To plot this type of beta distribution and understand it better, you can use the prop (dprop, rprop) etc. function from the package extradistr (PropBeta: Beta distribution of proportions in extraDistr: Additional Univariate and Multivariate Distributions). The usual beta function in R is not using the same parameterisation.

Below are some plots I previously made to understand the meaning of different mu and phi values that you may find useful (ignore the SD dots, they relate to a prior I had set on the parameter).

It looks like you also have a zero inflation think going on, so you would need to factor in whatever the effect of that is when you are trying to understand your results.

3 Likes

Thanks a lot to all of you @scholz , @martinmodrak , @JimBob ! All of your hints are very useful and make things a lot more understandable.

1 Like

Hi, just a short follow-up question: when assigning prior distributions for parameters (e.g. mu intercept), do I define those in the outcome scale or in the logit-transformed-scale?

edit: It strongly assume that this has to take place in log(it) scale too, but a short confirmation would be great.

1 Like

The priors for model coefficients (intercept, fixed/varying effects, …) are always on the scale where the model is linear, i.e. on the logit-transformed scale in this case. Priors for auxiliary parameters (e.g. phi in this model, sigma for normal-distributed models) is usually directly on the non-transformed parameters.

1 Like

Is this also the case if I want to predict phi?

The formula for my zero-inflated beta-model is:

bf(FL ~ 1 + YC,
   phi ~ 1 + YC,
   zi ~ 1 + access)

So, I have to give priors for the intercepts for phi and zi in non-transformed scale?

This was a thing I wanted left out to avoid confusion, but probably shouldn’t have: If you don’t predict the parameters, you give priors for them on non-transformed scale. When you predict them, the predictors become linear coefficients as any other and work on the transformed scale - the transformations are specified by the link_XX parameters of the families (and you can change them if you need).

So notably you would have:

# Prior on untransformed sigma
brm(y ~ x, data = dd, family = "gaussian", 
   prior = prior(normal(0,5), class = "sigma")) 

# Prior on log-sigma (as there are no other predictors for sigma)
brm(bf(y ~ x, sigma ~ 1), data = dd, family = "gaussian", 
   prior = prior(normal(0,5), class = "Intercept", dpar = "sigma"))

When in doubt you can always use make_stancode to see what’s happening. (I just did that to verify my understanding is correct)

2 Likes

Thanks!

Just to be sure: The two defined priors for sigma in your example are quite different, arent they?

1 Like

Yes, those would imply very different prior distributions of the data.

1 Like