# Interpreting beta estimates from an ordinal brms model

Dear community,

I am completely new to Bayesian statistics. I have read through several tutorials and a few posts here on the forum, but I cannot find a conclusive answer on how to get standardized effect sizes from an ordinal brms regression. I followed the tutorial by Bürkner & Vuorre, but they do not describe how to get a standardized effect size either…

Model variables:
- Rating = ordered 7-point Likert scale
- Condition = visual or imagery
- stimulus_type = art or face
- ID = a random effect for each participant
- image_id = a random effect for each stimulus

(I arrived at this model by fitting several candidate models beforehand and comparing them with LOO; this model performed best.)

```r
library(brms)
bay_moving_cond_stim_id_id <- brm(Rating ~ Condition + stimulus_type + (1 | ID) + (1 | image_id),
  data = moving_long, family = cumulative("probit", threshold = "flexible"),
  chains = 5, iter = 3000, warmup = 1000, cores = 10)
```


The result:

```
 Family: cumulative
  Links: mu = probit; disc = identity
Formula: Rating ~ Condition + stimulus_type + (1 | ID) + (1 | image_id)
   Data: moving_long (Number of observations: 2553)
  Draws: 5 chains, each with iter = 3000; warmup = 1000; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects:
~ID (Number of levels: 34)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.60      0.08     0.47     0.79 1.00     1937     3233

~image_id (Number of levels: 40)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.31      0.04     0.23     0.40 1.00     3297     4063

Population-Level Effects:
                               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]                      -2.26      0.14    -2.53    -1.99 1.00     1598     3043
Intercept[2]                      -1.33      0.13    -1.59    -1.06 1.00     1504     2881
Intercept[3]                      -0.60      0.13    -0.85    -0.34 1.00     1509     2642
Intercept[4]                      -0.14      0.13    -0.39     0.12 1.00     1500     2833
Intercept[5]                       0.75      0.13     0.49     1.01 1.00     1510     2677
Intercept[6]                       1.79      0.14     1.52     2.07 1.00     1640     3412
Conditionimagery_moving_rating    -0.29      0.04    -0.37    -0.20 1.00    15886     6802
stimulus_typeface                 -0.56      0.11    -0.77    -0.35 1.00     2844     4346
```


So I understand that a change from visual to imagery (Condition) results in lower ratings. I also understand that since the 95% CI of the beta does not cross zero, ‘it should be a real effect’, right?
Intuitively, from a frequentist perspective (if the coefficient were standardized), I would call this a small effect… Does the same hold in a Bayesian analysis?
Also, do I need to compute a Bayes factor for this model to ‘convince’ a reviewer that there is an effect in the data (compared to H0)?

I really appreciate any help!

Thanks for reading this far and have a great day!

Cheers,
Max


The CI not crossing zero indicates strong evidence that the effect is negative. You can compute the exact posterior probability that the effect is negative with brms's hypothesis() function:

```r
hypothesis(bay_moving_cond_stim_id_id, "Conditionimagery_moving_rating < 0")
```


Judging from the CI, the posterior probability that Conditionimagery_moving_rating is < 0 will be very close to 1, which should be sufficiently convincing to a reasonable reviewer. In my opinion, Bayes factors are problematic for many reasons: they are hard to interpret and depend heavily on the priors used.
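For intuition about what hypothesis() reports: for a one-sided hypothesis such as "Conditionimagery_moving_rating < 0", its Post.Prob is simply the share of posterior draws that satisfy the inequality. A minimal base-R sketch follows. With the fitted model you would use the real draws (for example via as_draws_df()); the demo instead uses stand-in draws from a normal approximation to the reported posterior (mean -0.29, SD 0.04), which is an assumption and not the model's actual draws.

```r
# Posterior probability that a coefficient is negative:
# the proportion of posterior draws below zero.
prob_negative <- function(draws) {
  mean(draws < 0)
}

# With the fitted model you would use the actual draws, e.g.:
# draws <- as_draws_df(bay_moving_cond_stim_id_id)$b_Conditionimagery_moving_rating
# Stand-in for illustration only: a normal approximation to the reported
# posterior (Estimate -0.29, Est.Error 0.04). These are NOT the real draws.
set.seed(1)
draws <- rnorm(10000, mean = -0.29, sd = 0.04)

p_neg <- prob_negative(draws)
p_neg
```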

You can get an intuitive sense of the effect size by comparing it with the Intercept values. The intercepts are cutpoints on the underlying (latent) continuous standard-normal scale. Intercept[1] is the threshold between ratings 1 and 2, Intercept[2] is the threshold between ratings 2 and 3, and so on. So the width of rating 4 is -0.14 - (-0.60) = 0.46, and the width of rating 5 is 0.75 - (-0.14) = 0.89. This implies that Conditionimagery_moving_rating (-0.29) will move roughly 2/3 (≈ 0.29/0.46) of ratings of 4 down to 3, and roughly 1/3 (≈ 0.29/0.89) of ratings of 5 down to 4. (This back-of-the-envelope reading treats latent values as spread roughly evenly within each category.) Whether or not that is a "small" effect depends on one's perspective.
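The cutpoint arithmetic above can be checked, and made exact, in base R. A sketch, assuming brms's cumulative parameterization P(Y ≤ k) = Φ(τ_k − η) and plugging in posterior means only (so it ignores posterior uncertainty). It computes the category probabilities for a typical participant and image (random effects set to 0) at the reference levels (visual, art; η = 0) and under imagery (η = -0.29).

```r
# Cutpoints (posterior means) from the first model's output
tau <- c(-2.26, -1.33, -0.60, -0.14, 0.75, 1.79)
beta_imagery <- -0.29

# Widths of the inner categories (ratings 2 to 6) on the latent scale
widths <- diff(tau)
names(widths) <- paste0("rating_", 2:6)
round(widths, 2)

# Cumulative probit: P(Y <= k) = pnorm(tau_k - eta),
# so P(Y = k) = pnorm(tau_k - eta) - pnorm(tau_{k-1} - eta)
cat_probs <- function(tau, eta) {
  cum <- c(pnorm(tau - eta), 1)   # P(Y <= 1), ..., P(Y <= 7)
  diff(c(0, cum))                 # P(Y = 1), ..., P(Y = 7)
}

# Expected rating on the 1..7 scale
mean_rating <- function(p) sum(seq_along(p) * p)

p_visual  <- cat_probs(tau, eta = 0)
p_imagery <- cat_probs(tau, eta = beta_imagery)
round(rbind(visual = p_visual, imagery = p_imagery), 3)
c(visual = mean_rating(p_visual), imagery = mean_rating(p_imagery))
```

Unlike the width ratios, this uses the normal density within each category, so the shifted proportions will differ somewhat from the 2/3 and 1/3 rules of thumb.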

In addition to Andy's responses, bear in mind that you have used the probit link. As a consequence, the $\beta$ coefficients in the model are on the latent standard normal scale, just as they would be in a frequentist model. Thus in my discipline (clinical psychology), your coefficient for Conditionimagery_moving_rating (-0.29) would be considered small, and your coefficient for stimulus_typeface (-0.56) would be considered medium. YMMV.
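As a rough illustration of that kind of labelling, here is a hypothetical helper that maps an absolute coefficient onto Cohen's conventional d benchmarks (0.2 small, 0.5 medium, 0.8 large). Using these particular cut-offs is an assumption; as noted above, what counts as small or medium varies by field.

```r
# Hypothetical helper: label |beta| (latent SD units) with Cohen's
# conventional d benchmarks. Intervals are closed on the left, e.g. [0.2, 0.5).
cohen_label <- function(b) {
  cut(abs(b),
      breaks = c(0, 0.2, 0.5, 0.8, Inf),
      labels = c("negligible", "small", "medium", "large"),
      right  = FALSE)
}

as.character(cohen_label(c(Conditionimagery_moving_rating = -0.29,
                           stimulus_typeface = -0.56)))
```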

Thank you so much for your quick responses!

@andymilne thank you for explaining a bit more in detail how the intercepts interact with the beta coefficient!

@Solomon I am also in the field of psychology, so I share your interpretation, thanks!

My problem was that I did not understand that the underlying latent scale is standardized, which yields standardized beta coefficients (if I understood you both correctly).


Yes, with a probit link, the latent scale of the predicted values is standard normal by definition. So, for predictors that are dummy-coded factors or continuous variables that have been standardized, the effect sizes are standardized.


Andy makes a good clarification. The probit link puts the DV on a latent z-score scale, but you still have to account for the metrics of the predictor variables. For example, if you have a single standardized predictor, its $\beta$ coefficient will be in a correlation metric with the DV on the latent z-score scale. But if the predictor is not standardized, its $\beta$ coefficient is on some other, possibly unhelpful, metric.
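A small sketch of what accounting for the predictor's metric means in practice. The predictor x and its raw coefficient are hypothetical (neither model above has a continuous predictor). Because the latent DV already has SD 1, only the predictor needs rescaling: the effect of a 1-SD change in x is beta_raw × sd(x). The code also checks that using the raw x with beta_raw and the standardized x with the rescaled beta gives linear predictors that differ only by a constant, which the thresholds absorb.

```r
# Hypothetical continuous predictor and raw-unit coefficient (illustration only)
x <- c(2, 4, 4, 5, 7, 9, 10, 12)
beta_raw <- 0.15   # latent-SD change per 1 raw unit of x

# Effect of a 1-SD change in x, on the latent z-score scale
beta_per_sd <- beta_raw * sd(x)

# Equivalent route: standardize x before fitting, e.g. by adding a
# column with as.numeric(scale(x)) to the data passed to brm()
x_std <- as.numeric(scale(x))

# The two linear predictors differ only by a constant (beta_raw * mean(x)),
# which the cutpoints absorb, so both models predict the same ratings
eta_diff <- beta_raw * x - beta_per_sd * x_std
range(eta_diff)
```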

@Solomon
I am also in the field of psychology, and after reading through several manuals I want to use a ROPE to establish whether an effect found in my ordinal analysis is negligible or not (see Effect size guidelines for individual differences researchers - ScienceDirect).

However, I have difficulty understanding whether I can use the Pearson's r cut-offs for the standardized beta coefficients that result from the hierarchical ordinal regression. As I understand it, r is a total (zero-order) correlation, whereas standardized beta coefficients are partial correlations, right?
Would I need to convert the measure or not?


```
 Family: cumulative
  Links: mu = probit; disc = identity
Formula: Rating ~ Condition + stimulus_type + (1 | ID) + (1 | image_id)
   Data: pleasure_long (Number of observations: 783)
  Draws: 5 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 5000

Multilevel Hyperparameters:
~ID (Number of levels: 34)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.56      0.10     0.40     0.78 1.00     1558     2416

~image_id (Number of levels: 40)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.39      0.07     0.27     0.54 1.00     1574     2266

Regression Coefficients:
                                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]                        -2.46      0.18    -2.80    -2.12 1.00     1535     2306
Intercept[2]                        -1.78      0.16    -2.10    -1.46 1.00     1442     2174
Intercept[3]                        -1.19      0.16    -1.51    -0.89 1.00     1367     2058
Intercept[4]                        -0.74      0.16    -1.04    -0.43 1.00     1369     2162
Intercept[5]                         0.20      0.15    -0.10     0.50 1.00     1392     2229
Intercept[6]                         1.05      0.16     0.75     1.35 1.00     1481     2584
Conditionimagery_pleasure_rating    -0.07      0.07    -0.22     0.07 1.00     5810     3712
stimulus_typeface                   -0.92      0.15    -1.21    -0.62 1.00     1857     2770
```
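For the ROPE part of the question, the core computation is the share of posterior draws that fall inside the chosen region. A base-R sketch, with two loudly labelled assumptions: a ROPE of ±0.1 on the latent SD scale (a commonly suggested default for standardized effects, roughly half of a "small" d; pick bounds you can justify for your field), and stand-in draws from a normal approximation to the reported posterior for Conditionimagery_pleasure_rating (mean -0.07, SD 0.07). With the fitted model you would use its real draws; packages such as bayestestR also provide a rope() function for this.

```r
# Share of posterior draws inside a region of practical equivalence (ROPE)
rope_share <- function(draws, lower, upper) {
  mean(draws >= lower & draws <= upper)
}

# ASSUMED ROPE on the latent SD scale
rope <- c(-0.1, 0.1)

# Stand-in draws: normal approximation to the reported posterior
# (Estimate -0.07, Est.Error 0.07). NOT the model's actual draws.
set.seed(2)
draws <- rnorm(5000, mean = -0.07, sd = 0.07)

in_rope <- rope_share(draws, rope[1], rope[2])

# Analytic value under the same normal approximation, for comparison
in_rope_normal <- pnorm(rope[2], -0.07, 0.07) - pnorm(rope[1], -0.07, 0.07)
c(simulated = in_rope, normal_approx = in_rope_normal)
```

With a 95% interval of [-0.22, 0.07], the posterior overlaps both the ROPE and values outside it, so under these assumptions the result would be read as inconclusive rather than as evidence of a negligible effect.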


If the Condition predictor has been standardized, then to my mind the $\beta$ coefficient is roughly analogous to a Pearson's correlation coefficient. But as you rightly point out, it's not exactly the same thing, and I would be surprised if there were a well-established way to compare the two. In the absence of more precise guidance, you could argue for interpreting it using conventional correlation benchmarks, but it'd be a good idea to make clear to your audience that they're not exactly the same and that you're just doing your best.
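If you do want r-style benchmarks for a dummy-coded predictor such as Condition, one option (a swapped-in technique, not something proposed above) is to treat the probit $\beta$ as a d-like mean difference in latent SD units and apply the standard d-to-point-biserial-r conversion, r = d / sqrt(d^2 + (n1 + n2)^2 / (n1 n2)), which reduces to d / sqrt(d^2 + 4) for equal group sizes. This is a sketch under that assumption: it ignores the random-effect variance and the other predictor, so the result is at best a rough guide.

```r
# Convert a d-like standardized mean difference to a point-biserial r.
# n1, n2 are the group sizes; with equal groups the term reduces to 4.
d_to_r <- function(d, n1 = 1, n2 = 1) {
  d / sqrt(d^2 + (n1 + n2)^2 / (n1 * n2))
}

# Posterior means from the second model, treated as d (assumption)
round(d_to_r(c(Conditionimagery_pleasure_rating = -0.07,
               stimulus_typeface = -0.92)), 3)
```

Applying the function to each posterior draw, rather than to the posterior mean, would carry the uncertainty through to the r scale.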