How to get predictions on original scale of data from add_epred_draws() from an ordinal regression in brms

I am conducting an ordinal regression examining the effect of number of days of amphetamine-type-stimulant use in the 28 days prior to starting treatment (variable ats_baseline) on psychological health, measured on a 0-10 integer scale (variable psychHealth)during the first years of treatment (variable yearsFromStart, a continuous variable between 0 and 1). Data are grouped by encounter (variable encID) which defines a block of treatment. For our purposes this can be considered equivalent to a participant ID number. The minimum number of measurements per encID is two. All encounters have psychHealth measured at start of treatment (i.e. yearsFromStart= 0) and at least one other time. When these non start of treatment measurements occur is irregular. Here are the data.

amphUseInOTP_psych.RData (16.7 KB)

Now for the model in brm. NOTE: we add 1 to all values of the psychHealth variable to make it a non-zero positive integer, as required for ordinal models, so the scale goes from 0-10 to 1-11.

# add 1 all values to make the non-zero integers required for cumulative link models 
amphUseInOTP_psych$psychHealth <- amphUseInOTP_psych$psychHealth + 1

# now the model itself

library(rstan)
library(brms)

fit_ordinal_psych <- brm(formula = psychHealth ~ ats_baseline*yearsFromStart + (1 | encID),
                         family = cumulative("probit"),
                         data = amphUseInOTP_psych,
                         seed = 1234,
                         refresh = 0)

Now I want to get predicted levels of psychological health at five timepoints - 0 years from start of treatment, 0.25 years, 0.5 years, 0.75 years, 1 year - for five candidate levels of ats_baseline: 0 days used prior to start of treatment, 7 days, 14 days, 21 days and 28 days.

I can do this very easily with data_grid() from modelr and add_predicted_draws() from tidybayes

# set data_grid then draw from full posterior, complete with variability from parameters AND measurement error. 

library(modelr)
library(tidybayes)

tibble(encID=688) %>%
  data_grid(yearsFromStart = seq(0,1,0.25),
            ats_baseline=c(0,7,14,21,28),
            encID) %>%
    add_predicted_draws(object = fit_ordinal_psych,
                        allow_new_levels = TRUE) -> predOrdDraws_psych

head(predOrdDraws_psych)

# output
#   yearsFromStart ats_baseline encID  .row .chain .iteration .draw .prediction
# <dbl>        <dbl> <dbl> <int>  <int>      <int> <int> <fct>      
# 1              0            0   688     1     NA         NA     1 4          
# 2              0            0   688     1     NA         NA     2 7          
# 3              0            0   688     1     NA         NA     3 9          
# 4              0            0   688     1     NA         NA     4 6          
# 5              0            0   688     1     NA         NA     5 5          
# 6              0            0   688     1     NA         NA     6 7

So far so good, the predicted values are contained in the .predictioncolumn.

But I would like to get predicted values without the aleatoric uncertainty, i.e. the fitted or expected values. The add_epred_draws() function does this in a very similar way to add_predicted_draws()

# get fitted values from the posterior
tibble(encID=688) %>%
  data_grid(yearsFromStart = seq(0,1,0.25),
            ats_baseline=c(0,7,14,21,28),
            encID) %>%
    add_epred_draws(object = fit_ordinal_psych,
                    allow_new_levels = TRUE) -> epredOrdDraws_psych

# view dataset with fitted draws, note remove the '.chain' and '.iteration` so they can fit on the screen without breaking up the dataset (these are all NA anyway)
library(tidyverse)
head(epredOrdDraws_psych %>% select(-one_of(c(".chain", ".iteration"))))

# output
#   yearsFromStart ats_baseline encID  .row .draw .category    .epred
# <dbl>        <dbl> <dbl> <int> <int> <fct>         <dbl>
# 1              0            0   688     1     1 1         0.0127   
# 2              0            0   688     1     2 1         0.000144 
# 3              0            0   688     1     3 1         0.0000793
# 4              0            0   688     1     4 1         0.000523 
# 5              0            0   688     1     5 1         0.00141  
# 6              0            0   688     1     6 1         0.0000354

Hopefully you can see my problem: The variable with the fitted values drawn from the posterior - .epred- is no longer on the original 1-11 scale.

Can anyone tell me how to get these back on the original scale?

That is because your ordinal levels are now given as categories and the estimates are the probability/propensity of that category for each predictor combination. You could use those to get an average:

tibble(encID=688) %>%
  data_grid(yearsFromStart = seq(0,1,0.25),
            ats_baseline=c(0,7,14,21,28),
            encID) %>%
    add_epred_draws(object = fit_ordinal_psych,
                    allow_new_levels = TRUE) %>%
    group_by(yearsFromStart, ats_baseline, encID, .draw) %>%
    summarise(avg_lvl = weighted.mean(as.numeric(.category), .epred))
2 Likes

Thank you @amynang. It’s lucky I don’t aspire to being a detective: many criminals would go free. I didn’t even notice the addition of a ‘category’ column not present in the output of the add_predicted_draws()function, nor that the number of rows had increased by a factor of 11!

Thank you this is great. By getting a weighted average across the eleven categories you end up with the same number of rows as the output from the add_predicted_draws() function. All I need to do is subtract 1 from your avg_lvl column to bring the estimate back from the 1-11 scale to the original 0-10 scale. And I can get median and CI for across the draws for each of the 25 pairwise yearsFromStartx ats_baselinecombinations. Totally brilliant. Thank you so much.