Setting priors and interpreting an ordered logistic model

Hi,

I am trying to fit a model to my ordered categorical data. I have been using the thread Bayesian ordinal regression with random effects using brms, as well as https://journals.sagepub.com/doi/full/10.1177/2515245918823199, to help. I am very new to Bayesian models, so I am aware that I have probably made mistakes; gentle explanations would be much appreciated!

I am trying to determine whether three continents (Africa, Asia, Europe) differ in how many observations fall into each level of life expectancy (low, med, high).
Essentially: does continent predict the proportion of low/med/high life expectancies in that group?

My question is about priors. Here I have used freely available data to replicate the structure of my data. I am a little stuck on priors: I run my model without specifying any. Do I need to? If so, how do I choose them? Pretend this is the first data of its kind: how could I know what to set? The only thing I know is that the data will be positive. I would be so grateful for some help in working out which priors to use.
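For what it's worth, I gather that brms falls back on its own defaults when you don't set priors, and you can list what those would be before fitting. A minimal check, using the same formula, data, and family as in the code below:

#show the default priors brms would fall back on for this model
get_prior(formula = lifeExp ~ cs(continent),
          data = gapminder,
          family = acat("cloglog"))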

Then, once my model has run, I check the conditional effects, and I'm not sure what that is showing either. How does one interpret the results if the Family is acat and the Links line says mu = cloglog? What does this translate to? How can I talk about it in real terms? What does it actually mean to the average person? Finally, is the graph the posterior effects? From my understanding, Bayesian models should have priors and posteriors, and I'm not sure I've satisfied either!

My data/code:

#load libraries
library(brms)
library("gapminder")
library("dplyr")

#Edit data
data(gapminder)
#select columns of interest
gapminder <- gapminder %>% 
  select(continent, lifeExp, pop)
#keep continents of interest (%in% matches each row against the whole set; == would recycle the comparison vector and silently drop rows)
gapminder <- gapminder %>% 
  filter(continent %in% c("Africa", "Asia", "Europe"))
#bin life expectancy into three ordered levels
gapminder$lifeExp <- cut(gapminder$lifeExp, 3, labels = c("low", "med", "high"), ordered_result = TRUE)
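#quick sanity check: counts of each ordered level by continent
table(gapminder$continent, gapminder$lifeExp)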

#Run ordinal model 
## Model 1 Acat with category specific effects 
mod1selection_fit <- brm(formula = lifeExp ~ cs(continent),
                         data = gapminder,
                         family = acat("cloglog"))

> summary(mod1selection_fit)
 Family: acat 
  Links: mu = cloglog; disc = identity 
Formula: lifeExp ~ cs(continent) 
   Data: gapminder (Number of observations: 460) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                             Estimate          Est.Error          l-95% CI           u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]                     0.53               0.11              0.32               0.74 1.00     3159     2637
Intercept[2]                     2.35               0.33              1.75               3.06 1.00     2511     2183
continentAsia[1]                 0.71               0.18              0.36               1.06 1.00     3113     2705
continentAsia[2]                 1.99               0.36              1.31               2.74 1.00     2523     2154
continentEurope[1] 258987332313292.19 231077452855566.66 10570797797140.76 968995983082967.00 1.32       10       39
continentEurope[2]               3.45               0.35              2.79               4.20 1.00     2546     2197

Family Specific Parameters: 
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).


plotm1 <- conditional_effects(mod1selection_fit, "continent", categorical = TRUE)
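#the stored object can be drawn with plot(); the panels show posterior
#predicted category probabilities with their credible intervals
plot(plotm1)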

  • Operating System: macOS (M1 chip)
  • brms Version: 2.16.3

Thank you for your patience in reading this; a lot of questions, but I am trying hard to get my head around this!

There are multiple ways to approach this, but my response will be centered around one of your opening statements: “I am very new to Bayesian models.”

Given that you have a background in the social sciences, my guess is you're familiar with the normal distribution and with z-scores (i.e., values from the standardized normal distribution). Even though it might not be as good a fit to your data as the adjacent-category model, you might be better off using the cumulative model with the probit link. Why? Because that likelihood function puts your model on the z-score scale. At first, this still won't be the easiest scale to think in when setting your threshold priors, but it will probably make it much easier to think about your priors for your predictor variable, continent. Those parameters will be in the metric of a standardized mean difference from the reference category, which is essentially an ordinal Cohen's d effect size. So if it were me, and I was new to both these data and to this kind of model framework, I'd start with a model like this:

mod2selection_fit <- brm(
  formula = lifeExp ~ 1 + continent,
  data = gapminder,
  family = cumulative(probit),
  prior = c(
    prior(normal(0, 2), class = Intercept),
    prior(normal(0, 1), class = b)))

summary(mod2selection_fit)
 Family: cumulative 
  Links: mu = probit; disc = identity 
Formula: lifeExp ~ 1 + continent 
   Data: gapminder (Number of observations: 460) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]        0.10      0.08    -0.06     0.26 1.00     3951     2911
Intercept[2]        1.47      0.11     1.26     1.69 1.00     2764     3112
continentAsia       1.17      0.13     0.91     1.42 1.00     2899     2919
continentEurope     3.03      0.21     2.64     3.44 1.00     2482     2857

Family Specific Parameters: 
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

See the coefficient summaries for continentAsia and continentEurope? Those are in the metric of standardized mean differences, relative to the reference category, Africa. In this case, the effect sizes are rather large, and probably much larger than you'll find in social-science data that isn't about life expectancy differences.
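If you want to translate those coefficients into something more tangible, you can push the thresholds and the continent effects through pnorm() to get each continent's implied category probabilities, since the cumulative probit model says P(Y <= k) = pnorm(threshold_k - eta). A minimal sketch, assuming brms's default parameter names:

library(posterior)
draws <- as_draws_df(mod2selection_fit)

#Africa is the reference category, so its linear predictor eta is 0
p_low_africa  <- pnorm(draws$`b_Intercept[1]`)
p_high_africa <- 1 - pnorm(draws$`b_Intercept[2]`)

#for Asia, shift both thresholds by its coefficient
p_low_asia  <- pnorm(draws$`b_Intercept[1]` - draws$b_continentAsia)
p_high_asia <- 1 - pnorm(draws$`b_Intercept[2]` - draws$b_continentAsia)

mean(p_high_asia)  #posterior mean of P(high) for an observation in Asia

(posterior_epred() or conditional_effects() will do the same bookkeeping for you.)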

For more info on how to fit and interpret cumulative probit models like this, check out Chapter 23 in Kruschke's Doing Bayesian Data Analysis and my brms translation of the same (23 Ordinal Predicted Variable | Doing Bayesian Data Analysis in brms and the tidyverse).

Also, I should add that it is possible to add category-specific effects in a model like this with the cs() helper. But if you’re new to a model like this, that might be a lot to juggle and I’d recommend starting simple and only jumping up to category-specific effects once you’re well grounded in the simpler models.

Thank you so much for your answer; working through it slowly! Just to check: I get an error when I run your code on my actual data, which is

Error in cumulative(probit) : object 'probit' not found

The variables are the same, a factor and an ordered factor (lifeExp); I'm not sure where I'm going wrong.

You probably just need to put probit into quotes :-)

family = cumulative("probit")

No, tried that! It gives:

Error in UseMethod("cumulative") : 
  no applicable method for 'cumulative' applied to an object of class "character"

When I use Solomon's answer on the data I provided, it works. But when I use it on my own data of a similar structure, I get the error.

Are your categorical variables coded as factors in the data frame? – I’m clutching at straws here.

Yes they are, but I've solved it:
It works if I put

  brmsfamily("cumulative", "probit"),

So there must be a package loaded in the project I'm using that isn't in the project I set up this example in. :)
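If the clash is another package masking cumulative(), I'd guess namespacing the call directly would also work:

  family = brms::cumulative("probit"),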

Thank you for clutching at straws with me :)

A follow-up question: would it make sense to change the reference category and run it again?

Sure. Putting the whole ordinal issue aside, code your data so that you are using a meaningful reference category.
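For example, assuming continent is stored as an unordered factor, relevel() will make Europe the reference:

gapminder$continent <- relevel(gapminder$continent, ref = "Europe")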

Hi! I've added a new follow-up question on interpreting: Question about interpreting ordinal cumulative model with probit link