Ordinal regression model in brms

Dear community,

This is a partial crosspost from Crossvalidated, but I moved from ordinal package to brms. I’m not a statistician, nor a full-time-researcher, just a resident physician working on his doctor’s thesis and I’m trying to analyse data from a survey for my doctorate and I’m having troubles finding the right literature and methods for my problem.

I’m trying to specify a model for the following data:

participant_id		character	unique number
answer			ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
attractivity		ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
competence		ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
honesty			ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
intelligence		ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
kindness		ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
reliability		ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
weight_attractivity	ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
weight_competence	ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
weight_honesty		ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
weight_intelligence	ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
weight_kindness		ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
weight_reliability	ordered factor	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
situation		factor		Internist, Surgeon
face			factor		Man_Average, Woman_Average, Man_Attractive, Woman_Attractive
disfigurement		factor		None, Strabismus, Acne, Piercing, Tattoo

I have 4009 unique participants with four answer blocks each, so 16036 rows in wide format in total.

Participants of the survey have been shown four pictures of people (average man, average woman, attractive man, attractive woman) with four disfigurements (strabism, acne, piercing, tattoo), crossed in a latin square OR the control group (every face without disfigurement), randomized equally (20% each group). Participants were told they are gonna have a surgical treatment or a medical checkup (randomized) by the physician shown and they had to answer on a Likert-Scale from 0-10 (0 very improbable, 10 very probable) how much they would like to get the medical treatment from this physician. They also had to rate the physicians regarding to attractivity, competence, honesty, intelligence, kindness and reliability again on a Likert-Scale from 0-10 and also how important these attributes for the participants are.

I specified the model as follows:

Model <- brm(answer ~ disfigurement + situation + situation:disfigurement + 
                      (1 + disfigurement | face) + (1 | participant_id),
             family = cumulative(link = "logit", threshold = "flexible"),
             prior = c(set_prior(prior = "normal(0,10)", class = "Intercept"),
                       set_prior(prior = "normal(0,10)", class = "b"),
                       set_prior(prior = "cauchy(0,5)", class = "sd")),
             control = list(max_treedepth = 15, adapt_delta = 0.99))

This is the summary() output:

 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: answer ~ disfigurement + situation + situation:disfigurement + (1 + disfigurement | face) + (1 | participant_id) 
   Data: strabism_answers_situation (Number of observations: 16036) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~face (Number of levels: 4) 
                                                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)                                        1.51      0.98     0.57     4.19       1275 1.00
sd(disfigurementStrabism)                            1.04      0.71     0.38     2.94       1589 1.00
sd(disfigurementAcne)                                1.35      0.97     0.47     3.87       1519 1.00
sd(disfigurementPiercing)                            0.18      0.33     0.00     0.87        502 1.00
sd(disfigurementTattoo)                              0.54      0.51     0.12     1.92       1267 1.00
cor(Intercept,disfigurementStrabism)                -0.28      0.37    -0.87     0.51       3038 1.00
cor(Intercept,disfigurementAcne)                    -0.26      0.37    -0.84     0.51       2853 1.00
cor(disfigurementStrabism,disfigurementAcne)         0.17      0.38    -0.60     0.81       2575 1.00
cor(Intercept,disfigurementPiercing)                -0.08      0.41    -0.80     0.70       4138 1.00
cor(disfigurementStrabism,disfigurementPiercing)     0.01      0.41    -0.74     0.76       4332 1.00
cor(disfigurementAcne,disfigurementPiercing)         0.11      0.41    -0.69     0.82       3206 1.00
cor(Intercept,disfigurementTattoo)                  -0.01      0.38    -0.72     0.69       3712 1.00
cor(disfigurementStrabism,disfigurementTattoo)       0.21      0.39    -0.57     0.82       3144 1.00
cor(disfigurementAcne,disfigurementTattoo)          -0.02      0.38    -0.73     0.70       2712 1.00
cor(disfigurementPiercing,disfigurementTattoo)      -0.06      0.41    -0.78     0.72       2313 1.00

~participant_id (Number of levels: 4009) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     1.39      0.03     1.34     1.44       1378 1.00

Population-Level Effects: 
                                         Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept[1]                                -3.51      0.81    -5.20    -1.77        819 1.00
Intercept[2]                                -2.71      0.81    -4.39    -0.96        818 1.00
Intercept[3]                                -1.95      0.81    -3.62    -0.19        818 1.00
Intercept[4]                                -1.22      0.81    -2.87     0.54        822 1.00
Intercept[5]                                -0.60      0.81    -2.26     1.16        822 1.00
Intercept[6]                                 0.12      0.81    -1.56     1.87        827 1.00
Intercept[7]                                 0.77      0.81    -0.91     2.53        827 1.00
Intercept[8]                                 1.53      0.81    -0.13     3.29        828 1.00
Intercept[9]                                 2.57      0.81     0.90     4.33        834 1.00
Intercept[10]                                3.42      0.81     1.76     5.19        843 1.00
disfigurementStrabism                       -1.74      0.62    -3.04    -0.57       1182 1.00
disfigurementAcne                           -0.86      0.77    -2.37     0.76       1419 1.00
disfigurementPiercing                       -0.16      0.21    -0.51     0.14        354 1.01
disfigurementTattoo                         -0.69      0.35    -1.42    -0.01        899 1.00
situationInternist                           0.46      0.12     0.24     0.69        686 1.00
disfigurementStrabism:situationInternist     0.75      0.14     0.47     1.03        741 1.00
disfigurementAcne:situationInternist        -0.71      0.14    -0.99    -0.43        719 1.00
disfigurementPiercing:situationInternist    -0.01      0.14    -0.29     0.27        698 1.00
disfigurementTattoo:situationInternist      -0.02      0.14    -0.29     0.26        720 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

LOO gives the following output:

Computed from 4000 by 16036 log-likelihood matrix

Estimate    SE
elpd_loo -34808.9  78.9
p_loo      2951.7  23.1
looic     69617.9 157.8
------
Monte Carlo SE of elpd_loo is 1.1.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     16020 99.9%   942       
 (0.5, 0.7]   (ok)          16  0.1%   2018      
   (0.7, 1]   (bad)          0  0.0%   <NA>      
   (1, Inf)   (very bad)     0  0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).

check_hmc_diagnostics() on the stanfit object yields the following result:

Divergences:
0 of 4000 iterations ended with a divergence.

Tree depth:
0 of 4000 iterations saturated the maximum tree depth of 15.

Energy:
E-BFMI indicated no pathological behavior.

The wall-time on the HPC cluster I have access to is about 9h for this model with four cores and 16Gb RAM.


Now my questions:

  1. Do I need to do additional diagnostics? If yes: What do I have to do?
  2. How can I check the goodness of fit and prevent overfitting?
  3. Do you have ideas for other model specifications?
  4. My primary hypothesis is, that strabismus leads to a lower score than no disfigurement. How can I check this?
  5. Does the interaction between disfigurement and situation presents difficulties in answering question three?
  6. How can I plot and communicate the face effect?

Hi, did you already take a look at this ordinal tutorial paper for brms: https://psyarxiv.com/x8swp/
Perhaps that could be helpful.

Some other quick thoughts:

  1. Convergence looks good I think
  2. Model comparison for instance via loo and loo_compare
  3. face has just 4 levels. Modeling so many varying effects for it is probably not going to work out reasonably well.
  4. Take a look at the hypothesis method
  5. Depends on the coding of your categorical variables.

Hi Paul,
Thank you for your answer!

I read it already many times, that’s how I managed to specify the model until now, but now I’m stuck at interpreting the results…

I wanted to check for the proportional odds assumption, so I tried to fit a acat() model with category-specific effects but the model failed to even start sampling with hundreds of Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity errors, so I gave up.

I gradually built up the final model I mentioned by adding parameters and comparing them via loo_compare, but where can I see precisely that a model is overfitted?

I’m sorry, but I do not understand. You mean there are too many levels of disfigurement which vary for face?

I already did, but as I already said, I don’t understand too much of the statistical and mathematical finesse to derive the correct methods… In concrete terms, I don’t know how to formulate the method call for this particular task.

I apologise for my naïveté, but I don’t understand, what you mean…

Kindest regards,
Pascal

I see. Apologies for being a little bit high level. I will try to provide more detailed explanation later on today or tomorrow.

I wanted to check for the proportional odds assumption, so I tried to fit a acat() model with category-specific effects but the model failed to even start sampling with hundreds of Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity errors, so I gave up.

You could trying using inits = 0 or init_r = 0.1 (or smaller for the latter) to initialize the model.

I gradually built up the final model I mentioned by adding parameters and comparing them via loo_compare , but where can I see precisely that a model is overfitted?

There is no threshold or something. You could start with a model that you know does not overfit (because it is too simple) and then see what actually improves predictions.

I’m sorry, but I do not understand. You mean there are too many levels of disfigurement which vary for face ?

Quite that. What I mean is that you can model it that way, but the hierachical standard deviations and correlations won’t be estimated very precisely as face has just 4 levels.

I apologise for my naïveté, but I don’t understand, what you mean…

Coding categorical variables is not brms specific but works the same way in many model fitting R packages. You could starting by googeling “dummy coding” and “effect coding” to see what that implies with respect to the resulting coefficients. This will also help you understanding what to specify in the hypothesis method.