Choosing family and link

I’m complete new to bayesian analysis and brms and am now taking my first steps to estimate a multivariate multiple membership model using brms in R.

Most of my independent and dependent variables are averaged scores from multiple items that have a likert-type rating scale (scores 0 to 4). Two variables are standardised achievement scores and can thus also take on negative values. Most variables are moderately skewed and some also have kurtosis. My guess is that I have to use the skew_normal family. Is that right? Does that only take into account the skewness or also the kurtosis?

My next question would then be what link to choose. I found this specification under Usage on

skew_normal(link = “identity”, link_sigma = “log”, link_alpha = “identity”)

Are these the default options or these the only options for the skew_normal family?



Hi Pia,

a very good first read is this paper by @paul.buerkner and Emmanuel.

1 Like

Hi torkar,

Many thanks for pointing out this paper. It seems interesting but I think it does not apply to my case because I do not have discrete values for my predictors, which seems to be a requirement for the described monotonic effects. Since my variables are constituted by average scores, I have values such as 1.725, 2.5, 3, 4, 0.25 etc.


Ah, sorry. But why not use the data as-is? :)

I would have loved to use latent variable modelling and then it wouldn’t have been necessary to combine items. But since that is not currently possible in brms, as far as I know, I had to combine items that measure the same construct.

Can anyone else advice me on whether skew_normal is the correct family for my data and what links I should choose?

Have you plotted your outcome vars?

Yes, except for one, they are all significantly skewed and many have significant kurtosis as well.

Wait, how many outcome variables do you have? How many predictors? I would try a lognormal but also a normal and see what the out of sample prediction looks like (if you can assume a normal distribution). Regarding your question what link to choose I saw this in the source code:

.family_skew_normal <- function() {
    links = c("identity", "log", "inverse", "softplus"),```

Way too many probably! I have 8 outcome variables, all being predicted by the same variable a year earlier (to control for initial levels) and another 6 variables. 5 on the student level and one on the classroom level. Thanks for the advice! What do you mean by “out of sample prediction”? Sorry, just really new to all of this…

If I were you I’d start to build a very simple model with one outcome and one predictor, and then add predictors to see what happens.

You could have a look at projpred to see how many predictors you should add (some predictors might not have any predictive power so not necessary to make the model too complex if it’s not needed). Once you’ve done this you might have learned more about the data to make changes to the design of your model.

Build small, step by step, and iterate. Please do come back if you have questions along the way.

1 Like

I will do that! Thank you :)

Hi, I have a few follow-up questions.

First, I ran my models using the skew_normal family with identity link function. When I now plot my observed outcome scores (y) and the predicted outcome scores (yrep), there is quite a nice fit for most of my outcomes. Would this mean, that the family and link I chose are appropriate?

And again, is it OK to use the skew_normal even if the variables have a scale between 0 and 4 or can you only use this if you also have negative values? I wanted to use the log_normal but that’s not possible because I have values that are 0.

First of, I think a good description should consist of more than of some summary statistics like range, skewness, etc… It is at least as important to describe how the data are generated. DO the rate result from a rating scale, measurement of physical attributes, response times, and so forth.

I assume your last sentence means that your outcome only can have values between 0 and 4. (i.e. 0, 1, 2, 3, 4; e.g. from a rating scale.)
In this case, one should rather use binomial, beta-binomial, or ordinal model, because these “families” take the integer nature of the data into account.

If your data is between 0 and 4, but any real number in this interval is possible, it could also make sense to use a truncated normal distribution.

But is is impossible to say what the appropriate family/model is, without knowing how the data were generated.

Hi Guido, thanks for your quick reply. I have five outcome variables that represent student ratings on a Likert-type scale. These variables are averages though, so that any value, not just integers, between 0 and 4 are possible. The fifth outcome variable is a standardised achievement score, so that one actually fits well to the normal distribution.

For the truncated normal distribution, do I then choose family = gaussian and define a lower and upper bound or do I have to do this in a different way?

about truncation: see 3rd paragraph on page 32 here:

However, the way you describe the data, a hierarchical ordinal model for the raw data (i.e. no averages) would be the better model. You can specify item specific intercepts in the hierarchical model, and model individual level random effects to take into account multiple measurements per person.

Here is a tutorial on ordinal models in brms from @paul.buerkner:

Thank you, I’ll try the truncation. The variables represent averages of multiple items that measure the same concept. Ideally, I would have used latent variable modelling for this but since that is not yet possible in brms (and I do have to use it for my model), forming composite scores seemed like the best option.

I think estimating an ordinal model in brms is basically latent variable modeling. Your predictors multiplied by the regression weights make up the latent ability for the individuals and by specifying the intercept appropriately, you define if cut-points are shared (only one intercept) or not (item specific intercepts) across items.

Sorry for my late reply. You mean that I would give all predictors that are supposed to measure the same concept (i.e. all manifest variables making up one latent variable) could be given the same regression coefficient or only the same intercept? I definitely need to read more about this.

Going back to the truncated distribution for now, I have estimated a simple intercept only model, predicting one variable ones with the truncation option and ones without. The pp-checks are very similar and the same default priors were applied. However, the results are quite different.

For the non-truncated model:

And for the truncated model (range restricted to between 0 and 4).

I also wonder why the effect sample size is so different. Plus, the truncated model shows a positive skew, which is very strange. Is this something that can be expected or is something weird going on?


About ordinal regression:
Yes, Items that measure the same latent variable should have the same regression weights. Regarding the intercepts, it depends. If all items (for one latent variable) have the same difficulty, it makes sense use the same intercept for all. if there is good reason to assume that all items do not have the same difficulty, item specific intercepts make sense.

(There is also edstan, which ist for irt analysis and also allows doing regressions, regressions)

I am not sure about the difference between the truncated and non truncated models. Intercepts should certainly be different. Regression weights should be similar, but don’t need to be close (I know this is not a very precise expression). Effective sample also can diverge.