Use "hypothesis" function for cumulative (probit) model in brms

Hi everyone,

I have a (multivariate) cumulative model (with a probit link) with the following variables:

IV1 badge: categorical predictor with 3 levels: nobadge , twitter and irma.
IV2 context: categorical predictor with 2 levels: earth and cancer
Interaction badge:context with 6 levels accordingly
3 DVs with 5-point Likert scale data (sharing, sourcred, messcred)


d <- read.csv("example_data.csv")

m1 <- brm(
  mvbind(sharing, sourcred, messcred) ~ 1 + badge + context + badge:context,
  data = d,
  family = cumulative("probit"),
  prior = c(
    prior(normal(0,4), class = Intercept),
    prior(normal(0,4), class = b)

To test my 2 hypotheses I need to compare:

H1 twitter vs. nobadge (main effect)
H2 irma_cancer vs. nobadge_cancer (interaction effect)

To test H1 I would use sum contrasts for context and dummy contrasts for badge (nobadge as the reference level) so that the the estimate sharing_badgetwitter is the difference in sharing scores
between the levels twitter and nobadge while averaging over context.

For the second hypothesis I wanted to use the hypothesis() function from brms but I can’t figure out how to create the right contrasts so that I compare only the relevant part of the interaction. So from the six levels of the interaction badge:context I’m interested in the comparison of 'irma_cancervs.nobadge_cancer`.

Question: How can I create the desired contrasts for H2?

  • Operating System: macOS 11.5.2
  • brms Version: 2.13.5

Thank you for your suggestions!

Here is an example screenshot from the summary output & example data:

Screenshot 2021-08-18 at 15.36.02

example_data.csv (27.5 KB)

1 Like

A general way to answer such questions is to think about what the dummy coding (or any other coding you have) actually does. And we can even get R to do this for us!

So, let’s build the model matrix ourselves (starting from your example data):

distinct_d <- d %>% select(badge, context) %>% distinct()
#     badge context
# 1 nobadge   earth
# 2 twitter  cancer
# 3    irma   earth
# 4    irma  cancer
# 5 twitter   earth
# 6 nobadge  cancer

so we know we are interested in the difference between predictors represented in row 4 and row 6 of distinct_d.

Now we take advantage of the fact that brms calls model.matrix under the hood and see what the model matrix thinks (my results are for dummy coding and “earth” and “irma” as reference levels as I didn’t modify any contrasts/reference level properties).

m_matrix <- model.matrix( ~ 1 + badge + context + badge:context, data = distinct_d)
m_matrix[4, ]
#              (Intercept)              badgenobadge              badgetwitter              contextearth 
#                        1                         0                         0                         0 
#badgenobadge:contextearth badgetwitter:contextearth 
#                        0                         0 
m_matrix[6, ]
#              (Intercept)              badgenobadge              badgetwitter              contextearth 
#                        1                         1                         0                         0 
#badgenobadge:contextearth badgetwitter:contextearth 
#                        0                         0 
m_matrix[4, ] - m_matrix[6, ]
#              (Intercept)              badgenobadge              badgetwitter              contextearth 
#                        0                        -1                         0                         0 
#badgenobadge:contextearth badgetwitter:contextearth 
#                        0                         0 

so we see that with this coding we just need to look at the badgenobadge coefficient.

Does that make sense?

1 Like

Thank you @martinmodrak for this input, that’s already very helpful!

However, I’m still puzzled about some details:

  1. As I understand what you wrote, the badgenobadge coefficient represents the difference between the badge levels “irma” and “nobadge” at the context level “cancer” (so exactly what I want for hypothesis 2). I’m surprised because I thought this would be an interaction effect and thus I would look at the effects with the colon (:) … but that’s not the case I assume?

  2. Given that we leave the reference levels at “cancer” (you wrote the reference level is “earth” but wouldn’t it be “cancer” since c comes first alphabetically?) and “irma”, as it is without modification of the contrasts, how would I then come to my first hypothesis that wants to compare the badge levels “twitter” vs “nobadge” independently of “context” (main effect)?
    Would I have to look at the model matrix again and then somehow compare row 1 vs. row 5, but also row 2 vs. 6, and add those together to arrive at the main effect of badge “twitter” vs “nobadge”?

Thank you for your suggestions!

I think the best way to remove the confusion would be for you to review some intro to dummy coding of main effects and interactions (I don’t have a good link I can completely recommend, but Google is your friend). Possible complemented by playing with model.matrix for various inputs and trying to understand why it looks the way it does.

When an interaction is present in dummy coding, the coefficients “with colons” are created only for non-reference levels (that’s just how dummy coding works), so if you comparison involves the reference level, you may find yourself examining a coefficient “without colon”.

You are right, that was a mistake in my post. I’ll correct it, thanks.

if your model includes an interaction there is no single number that corresponds to this “independent” effect - each combination of badge and context is allowed to have different mean, so in some contexts, “twitter” can have larger expected value then “nobadge” and in other contexts it could be reversed. Or the magnitudes can be completely different. To make an overall measure of effect of just “twitter” vs. “nobadge” you need to combine those multiple numbers into a single number and there are many ways to do this.

A good way to think about this is in terms of prediction (which let’s you use posterior_predict, posterior_epred or posterior_linpred and avoid thinking about coefficients). You could for example make predictions for the dataset you have and look at the predicted differences between “badge” and “twitter” - this will mean that your overall effect is built from the context-specific effects by weighting each context proportionally to how frequently it appears in your data.

Or you could create an artificial dataset that contains a single copy of all context-badge combinations and predict for that. This will mean your overall difference is built from the context-specific effects by weighting each context equally. And there are many other options! Which is best completely depends on the ultimate question you are trying to answer.

1 Like

Thank you again for the detailed answer, I really appreciate it!!

This is exactly what I’m struggling with. Do you have any advice where I find more info about this?
In my case I preregistered a hypothesis that states that I don’t expect a difference between “nobadge” and “twitter” (using ROPE).
So while prediction is definitely something I should look into, I feel I need to come up with a single coefficient (and its credible intervals) to test my hypothesis as preregistered. I simply don’t know how to combine the multiple numbers (which all stand for different contrast settings) into one single number (that represents “nobadge” vs “twitter” across contexts).

As I said, there is no single way to do this and there is no single “best” answer. Which way of combining the coefficients is most sensible would depend on the exact scientific question of interest. It is possible that different sensible ways of combining the values will give roughly the same results and this would let you of the hook, but in full generality, different ways to combine could easily lead to completely opposite conclusions. It is also possible that your question is too vague to let you make the call, in which case it might make sense to spend some time clarifying the question.

One way out of this would be to test the hypothesis that the difference for all contexts is simultaneously within your ROPE - that is a harder bar to pass, but if you can pass it, then you are IMHO good.

I would also note that even inspecting a single coefficient can be framed as a prediction task (e.g. if your model was just y ~ 1 + badge than the badge coefficients correspond to your predictions about between-group differences in mean response on the latent probit scale in a future experiment where you have so many participants that noise becomes negligible). So thinking about predictions contains inspecting model coefficients as a special case. And it is possible that thinking about which predictions are relevant for you instead of confining yourself to the (somewhat contrived) predictions tasks inherent in your model coefficients. But no pressure :-)

1 Like

Thank you again @martinmodrak for your input.

I think I found now another way to get to my desired comparisons, namely via emmeans. To get comparisons for the different levels of badge within each context I would use:

emmeans(m1, specs = pairwise ~ badge | context, resp = "sharing")

This is what you meant with:


For the main effect of “twitter” there would also be the option:

emmeans(m1, specs = pairwise ~ badge, resp = "sharing")

However, I remember that testing main effects in the presence of interaction effects is often criticised (is this what you pointed out in the first paragraph of your last message?). So potentially checking within each context is safer than main effects.

Output with emmeans (within each context):
Screenshot 2021-09-13 at 17.28.18