Ordinal model coeffcients sign in opposite direction than expected

I am a bit confused on how to best interpret my model coefficients as the estimates seem to be all in the opposite direction to what the data suggests, or what I see when I plot conditional_effects() in brms.

My experiment split users into two conditions, then all users saw profiles belonging to three groups (so 2bw x 3ws); they responded on a 10-point Likert-type scale (1-10).

Below is the code I ran:

#use all cores and make STAN run faster
options(mc.cores = parallel::detectCores())
#set seed for reproducible results
#set appropriate contrasts for factors with more than 2 levels.
options(contrasts = c('contr.orthonorm', 'contr.poly'))

#Make sure model understands which are factors
SEacc$Participant <-factor(SEacc$Participant)
SEacc$Original = factor(SEacc$Original)
SEacc$Condition = factor(SEacc$Condition)
SEacc$Trial = factor(SEacc$Trial)
#Likert ordinal data
SEacc$Confidence = factor(SEacc$Confidence, ordered = TRUE)

#priors set to extract BF per parameter [weakly informative][revised using Prior check]
my_conf_priors <- prior(normal(0, 5), class = "b") + prior(normal(0, 5), class = "Intercept")

m1.conf <- brm(Confidence ~ Condition*Original + (1| Participant), 
               data = SEacc,
               family =cumulative("probit"),
               warmup = 2000, iter = 40000, 
               save_pars = save_pars(all = TRUE),
               inits = 0,
               prior = my_conf_priors)


 print(m1.conf) #for model details
 Family: cumulative 
  Links: mu = probit; disc = identity 
Formula: Confidence ~ Condition * Original + (1 | Participant) 
   Data: SEacc (Number of observations: 3480) 
Samples: 4 chains, each with iter = 40000; warmup = 2000; thin = 1;
         total post-warmup samples = 152000

Group-Level Effects: 
~Participant (Number of levels: 116) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.27      0.09     1.10     1.46 1.00     9410    19817

Population-Level Effects: 
                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]            -3.93      0.18    -4.28    -3.59 1.00    10152    26559
Intercept[2]            -3.49      0.15    -3.78    -3.20 1.00     7336    18463
Intercept[3]            -2.85      0.13    -3.11    -2.60 1.00     5872    13803
Intercept[4]            -2.21      0.12    -2.46    -1.97 1.00     5468    12586
Intercept[5]            -1.55      0.12    -1.79    -1.31 1.00     5207    11713
Intercept[6]            -1.01      0.12    -1.25    -0.78 1.00     5046    11427
Intercept[7]            -0.29      0.12    -0.53    -0.06 1.00     4972    11139
Intercept[8]             0.56      0.12     0.32     0.80 1.00     5003    10871
Intercept[9]             1.36      0.12     1.11     1.60 1.00     5113    11144
Condition1              -0.18      0.17    -0.51     0.15 1.00     4419    10190
Original1               -0.29      0.03    -0.35    -0.22 1.00   127445   110978
Original2                0.04      0.03    -0.03     0.10 1.00   122714   110192
Condition1:Original1    -0.02      0.04    -0.11     0.06 1.00   130721   111146
Condition1:Original2    -0.00      0.04    -0.09     0.09 1.00   132923   111827

Family Specific Parameters: 
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00 1.00   152000   152000

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).


Based on the raw data, and the marginal means, I interpret that Original1 should be positive (i.e., the SD in the Good profile is higher on the latent variable than in the Bad profile [reference category]) as the figure shows the probability of users selecting ratings of 8-10 is higher. Yet, based on 10.1177/2515245918823199 I understand I should read it as a decrease in overall confidence ratings.

What am I interpreting incorrectly here?
Thank you in advance.

1 Like

I think the problem is that with orthonormal coding, the model coefficients do not directly correspond to specific levels of the factor.

So running with a fake dataset gives me:

options(contrasts = c('contr.orthonorm', 'contr.poly'))
model.matrix(y ~ Original, data.frame(y = rnorm(3), Original = c("Bad", "Good", "Mixed")))
  (Intercept)  Original1  Original2
1           1  0.0000000  0.8164966
2           1 -0.7071068 -0.4082483
3           1  0.7071068 -0.4082483
[1] 0 1 1
[1] "contr.orthonorm"

so e.g. the difference between “Good” (second row) and “Bad” (first row) needs to be computed as
(-0.7071068 - 0) * b_Original1 + (-0.4082483 - 0.8164966) * b_Original2. Which reveals why a negative coefficient for Original1 can correspond to increase in the response (but we need to also take into account the coefficient for Original2).

I think using posterior_linpred to make predictions for the various cases and the computing the differences of interest is a more reliable way of making inferences for such more complex models. (feel free to ask for clarifications if it is unclear from this short description how to actually do that).

Does that make sense?

1 Like

Thank you for the response @martinmodrak I was thinking that may be the case. But does this also mean that pairwise contrasts using emmeans will be unreliable (i.e., not directly reflect the direction of change)?

Currently, to investigate the difference between the three Original levels and report inferential statistics, I use:

pw.conf.profile <- emmeans(m1.conf, specs = pairwise ~Original, point.est = median, level = .89)

(although, I always plot the marginal effects as above, to make sure I am making the correct inference; I just want to be sure going forward).

I know on most of the tutorials for brms people recommend we use the hypothesis() function, but I find that less intuitive for complex models, as most times I am unsure what parameters and scaling corrections I need to add; so emmeans seems more user-friendly. (but, for categorical models, not even that seems to be straight forward).

I honestly have no idea. The emmeans support brms only for a relatively short time and I don’t know which features are and which aren’t supported (or really anything about the internal working of emmeans.

Tagging @rvlenth as the author of the package who may know how the package would deal with a different contrast coding.


The results from emmeans() do not depend at all on the contrast coding. Linear models (and GLMs) yield the same predictions regardless of how they are parameterized, and emmeans() just produces predictions and averages thereof.

Obviously, different contrast codings result in different interpretations of the coefficients, and hence the relationship between them and emmeans() results.

All that said, of course an error in the computer code could cause these statements to be false – for example, if the contrasts attributes used in fitting the model are somehow not recovered. To make sure, you could try a small example with different contrast codings and verify that emmeans() works the same way (modulus simulation errors).

1 Like