Contrasts in BRMS


Hi all,

I have read for long time passively here but now I have 2 questions. I am confused about how to present group differences when I fit a model like e.g. this:

y ~ factorA * factorB + factorC*factorD + continuous_variable_s

When using lme4 or just the built in lm function in R, I needed to apply contr.sum to each factor because otherwise the comparison would be made to one reference group that might not be the comparison I need. But I always found this confusing and difficult to interpret, especially when group sizes were unequal.

Now, I have read in several texts that promote Bayesian data analyses that contrasting groups is much more straightforward here. But to me it is not clear how. I know how to extract the posterior sample and from that I saw that by default (just as with lme4 etc) contr.treatment is applied and there is a reference group. How do you usually do this kind of task to be able to compare the different groups?

Another question I have:
In linear regression, it makes sense to present the posterior distribution of \mu or the difference in \mu etc… But lets say we use the above model with the skew_normal distribution in the BRMS package. Then \mu by itself is not so meaningful (unless \alpha = 0, which means no skew). For such a model, would it make sense in a research article to use the full model to predict values for two disctinct groups and isntead of presenting single parameters such as \mu to predict mean or median of the difference in the predictions?

Thank you!


The general answer to all these questions is to compare the posterior predictions of the model at different levels of the factor. For example,

data_A1 <- dataset[dataset$factorA == levels(dataset$factorA)[1], ]
PPD_A1 <- posterior_predict(fit, newdata = data_A1)
data_A2 <- data_A1
data_A2$factorA[1:nrow(data_A2)]  <- levels(dataset$factorA)[2]
PPD_A2 <- posterior_predict(fit, newdata = data_A2)
PPD_diff <- PPD_A2 - PPD_A1

Now you have a matrix for the predicted changed in the outcome due to switching someone from the first level of factorA to the second level. You can do things like that for any pair of levels of any factor or even combinations of factors.


Hi @bgoodri,

thanks a lot for helping me out! I have follow up questions:

  1. So, if you had to present the group differences for such a model, you would present the difference in the predictive checks rather than presenting difference in any single parameters? That would be different that what is currently done in the frequentist papers, where the beta estimate is presented.

  2. I wonder: In the code above, is dataset the dataset that has been used to fit the model? And if so, does it have to be this way or should I make up a new dataset set completely by e.g. having equal amount of observation in each of the factor level groups. Because above in the code, we only compare the factorA groups in a subset of the data that had factorA == levels(factorA)[1]I might be that all participants in that subset have certain values of a covariate whereas if I used the other part of the data such as e.g. data_A1 <- dataset[dataset$factorA == levels(dataset$factorA)[2], ], then maybe the mean in posterior predictions would be quite different due to differences in covariates. Is that correct?


Yes. Focusing on a single parameter really only works in the case of a Gaussian likelihood and identity link function, in which case it is the same as the method I described, which is valid in other cases as well.

It does not have to be. But presumably the people in the original dataset who have the first level on factorA are representative of such people. So PPD_diff can be averaged over them.

What you see people do all the time, which I hate, is to set continuous covariates to the global mean when investigating how the predictions change when you go from level 1 of factorA to level 2, even though no one who has level 1 or level 2 of factorA is anywhere close to the global mean of some other variable.

1 Like

That sounds really great! Because then I can use this method even for complicated and as you said non-gaussian models!

What you say makes sense to me (that those data points are real and representative). But if I then take that subset that has data_A1 <- dataset[dataset$factorA == levels(dataset$factorA)[1], ] and then make them “artificially” to have data_A2$factorA[1:nrow(data_A2)] <- levels(dataset$factorA)[2], then this would be just as “unnatural” as setting a continuous covariate to the mean, would’t it be? E.g. lets say that factorA is sex. Then maybe other factors/covariates are only at certain values for females. Then by changing females to males to make that gender comparison, those males might have unnatural values for certain other variables.

I guess this is not avoidable completely because you just wanna find out what the model “thinks”?!


I would say that if changing from the first level of factorA to the second level implies a change in other variables, then you should change the other variables as well. But if you can’t imagine changing them levels “while holding (almost) everything else constant”, then the model and / or the question does not make sense.

1 Like


thank you! I will try this out tomorrow and get back.


Hi @bgoodri,

I tried this out and was scared by the uncertainty in the prediction. I mean, I used posterior predictive plots before to see if the model seems appropriate for the data but not to compare groups differences. In papers that use frequentist methods, they do definitively not report this prediction uncertainty so clearly.

Am I correct that they report what in this case would come out of fitted(fit, newdata) and that together with e.g. \sigma in case of a gaussian distribution (so I would also need to report \alpha, too). Should’t I also use fitted to report the estimated group differences in a paper? I understand that the prediction is what actually matters and really shows the model information but if I report only posterior predictions, then it seems like I doing something wrong to people who are not exposed to the idea of showing this uncertainty…

I wish I was not so much under time pressure. I am expected to have the result section written out asap…

Do you or someone else maybe know another paper that reports posterior prediction for the purpose of contrasting?


I don’t know of any paper specifically, but that you should do it is implied by the Bayesian approach to probability. There is not a lot you can do about other people being suspicious of the the results because you include the uncertainty, but it is the right thing to do. Using fitted is wrong, even though the function is defined for compatibility with glm. Comparing point predictions would be no better than comparing point estimates without reporting standard errors.

But if the data and code are public, then I think it is fine to only talk about the implications of the posterior distribution that are interesting to your argument.

1 Like

Of course you are right and I will definitively be transparent about everything. The code will be public but the data that depends on my supervisor. I will present posterior predictive intervals.

I want to understand something and make sure I have that right. If you know the answer would be great if you could confirm:

Remember my model above. factorA and factorB are balanced (both n = 50). However, factorC and factorD have unequal sample sizes and the latter affects only 12 of the 100 subjects. I did not change the contrast settings in R and therefore in my regression model, the intercept reflects the reference group (all factors at the first level).

I want to calculate the posterior distribution of the difference in \mu between some subgroups. For example, to calculate \mu for the subgroup of factorA == "no", factorB == "no", factorC == "no", factor_D==0 I simply use the parameter b_Intercept. Then factorA == "yes", factorB == "yes", factorC == "no", factor_D==0 would be b_Intercept + b_factorAyes + b_factorByes + b_factorAyes:b_factorByes etc.

The problem is the comparisons I can make here depend on what is the reference group. Now I wonder how can I calculate the posterior means of all the possible subgroups that I wanna compare.For example,
if I want to compare the group mean of factorA == "yes" to the group mean of factorA == "no" then I need to consider (b_Intercept + factorAyes + b_Intercept +factorAyes + factorByes + b_Intercept +factorAyes + factorByes + factorCyes + b_Intercept +factorAyes + factorByes + factorDyes + b_Intercept +factorAyes + factorByes + factorCyes + factorDyes + b_Intercept +factorAyes + factorCyes + factorDyes + b_Intercept +factorAyes + factorDyes) etc. / number of subgroups where factorA == yes.

The problem I see is that e.g. for factorCthere are only 12 participants. Does it make sense to calculate a the posterior distribution of \mu weighted such as:

((b_Intercept + factorAyes) * 25/50 + (b_Intercept +factorAyes + factorByes) * 12/50 etc.

Is that correct?


An easy way to get everything you need at once is to create a saturated data.frame like

nd <- with(original_dataset, 
           expand.grid(factorA = levels(factorA), factorB = levels(factorB),
                       factorC = levels(factorC), factorD = levels(factorD)))

But you then have to chose something for the continuous predictor. I like the median better than the mean.

nd$continuous <- median(model.frame(fit)$continuous)

Now you can posterior_predict stuff and each of the resulting columns pertains to the corresponding row in nd, which has some combination of the levels on the four factors. So, you can compare them.

And no, you do not weight the posterior distribution or the posterior predictive distribution by the proportion of people in each category in the data that you conditioned on. The posterior distribution already reflects the fact that groups with fewer observations tend to have less precise estimates of the parameter(s) that are specific to that group. You could weight the posterior predictive distribution by the proportion of people in each category in the population that you want to infer about, if you know those proportions from the Census or whatever.

1 Like

thanks again!
The problem with this method is that I do not get the posterior distribution of \mu for each subgroup but I get rnorm(mu, sigma, n = 4000) for each subgroup (replace rnorm with the skew_normal). While this is what matters in the end, I need to present the posterior distribution of \mu with 95% HDI to stick to conventions of my supervisor.

Is there an easy way to get that?

The reason I wanted to weight is:

The intercept is all factors at level 1. And then each coefficient is the mean difference compared to that reference group. However, I am interested in comparing the group means that I cannot directly get from adding the parameters. And here my thinking was:

Lets say that b_factorCyes is extremely high but there are only 10 people of 100 who are in this subgroup, then ((b_Intercept + b_factorCyes) + b_Intercept)/2would overestimate the mean of those two groups whereas ((b_Intercept + b_factorCyes) * 10/50 + b_Intercept) *40/50 would be more accurate.

The number here are not the actual ones but I guess the idea is clear.

EDIT: Is there a way so that the posterior_predict function ignores the random intercept?
EDIT2: I see it should be posterior_predict(fit_t, newdata = nd, allow_new_levels = T) if there are random intercepts/slopes…


You can replace posterior_predict with posterior_linpred(fit, transform = TRUE) to get the posterior distribution of the conditional mean function rather than the posterior predictive distribution, but that is pretty much always the wrong thing to be looking at unless the model is Bernoulli or binomial. Ignoring the group-specific intercept is also generally not a good idea, but you can do it by specifying re.form = ~ 0. I don’t like the HDI much either, but the hdi function in the HDInterval package will estimate it from a matrix under some assumptions. Reweighting the posterior distribution undoes the partial pooling that makes the posterior distribution optimal to begin with. If predicting the data that you conditioned on were that important, then no one would do partial pooling. You do partial pooling to better predict future data from the same data-generating process. I understand that your supervisor may be pushing you to go in these directions, but that doesn’t make them good.

1 Like

@bgoodri , that worked, thanks. I can promise you that I will be pushing towards the right way to present (soon I will be responsible for that on my own and then it will be easier).

Can I also extract other parameters than \mu using posterior_linpred? For example, if I would use a model where I also model \sigma as a linear function?


I think you need the dpar argument that is documented in help(posterior_linpred.brmsfit, package = "brms").

1 Like

Hi Ben running into a similar territory with this question.
Since the same level(s) of the factor is at different levels of other covariates, do we then average over the covariates?
For instance, if my data_A1 is a df of 24 observations with FactorA ocurring at several different covariates, this would give a 4000x24 matrix of PPD_A1 - is it then appropriate to average over the 24 and get a 4000x1 matrix of the ‘averaged over other levels’ FactorA?


I think so. rowMeans(PPD_A1 - PPD_A0) or something like that.