I am conducting an ordinal regression examining the effect of number of days of amphetamine-type-stimulant use in the 28 days prior to starting treatment (variable ats_baseline
) on psychological health, measured on a 0-10 integer scale (variable psychHealth)
during the first years of treatment (variable yearsFromStart
, a continuous variable between 0 and 1). Data are grouped by encounter (variable encID
) which defines a block of treatment. For our purposes this can be considered equivalent to a participant ID number. The minimum number of measurements per encID
is two. All encounters have psychHealth
measured at start of treatment (i.e. yearsFromStart
= 0) and at least one other time. When these non start of treatment measurements occur is irregular. Here are the data.
amphUseInOTP_psych.RData (16.7 KB)
Now for the model in brm.
NOTE: we add 1 to all values of the psychHealth
variable to make it a non-zero positive integer, as required for ordinal models, so the scale goes from 0-10 to 1-11.
# add 1 all values to make the non-zero integers required for cumulative link models
amphUseInOTP_psych$psychHealth <- amphUseInOTP_psych$psychHealth + 1
# now the model itself
library(rstan)
library(brms)
fit_ordinal_psych <- brm(formula = psychHealth ~ ats_baseline*yearsFromStart + (1 | encID),
family = cumulative("probit"),
data = amphUseInOTP_psych,
seed = 1234,
refresh = 0)
Now I want to get predicted levels of psychological health at five timepoints - 0 years from start of treatment, 0.25 years, 0.5 years, 0.75 years, 1 year - for five candidate levels of ats_baseline
: 0 days used prior to start of treatment, 7 days, 14 days, 21 days and 28 days.
I can do this very easily with data_grid()
from modelr
and add_predicted_draws()
from tidybayes
# set data_grid then draw from full posterior, complete with variability from parameters AND measurement error.
library(modelr)
library(tidybayes)
tibble(encID=688) %>%
data_grid(yearsFromStart = seq(0,1,0.25),
ats_baseline=c(0,7,14,21,28),
encID) %>%
add_predicted_draws(object = fit_ordinal_psych,
allow_new_levels = TRUE) -> predOrdDraws_psych
head(predOrdDraws_psych)
# output
# yearsFromStart ats_baseline encID .row .chain .iteration .draw .prediction
# <dbl> <dbl> <dbl> <int> <int> <int> <int> <fct>
# 1 0 0 688 1 NA NA 1 4
# 2 0 0 688 1 NA NA 2 7
# 3 0 0 688 1 NA NA 3 9
# 4 0 0 688 1 NA NA 4 6
# 5 0 0 688 1 NA NA 5 5
# 6 0 0 688 1 NA NA 6 7
So far so good, the predicted values are contained in the .prediction
column.
But I would like to get predicted values without the aleatoric uncertainty, i.e. the fitted or expected values. The add_epred_draws()
function does this in a very similar way to add_predicted_draws()
# get fitted values from the posterior
tibble(encID=688) %>%
data_grid(yearsFromStart = seq(0,1,0.25),
ats_baseline=c(0,7,14,21,28),
encID) %>%
add_epred_draws(object = fit_ordinal_psych,
allow_new_levels = TRUE) -> epredOrdDraws_psych
# view dataset with fitted draws, note remove the '.chain' and '.iteration` so they can fit on the screen without breaking up the dataset (these are all NA anyway)
library(tidyverse)
head(epredOrdDraws_psych %>% select(-one_of(c(".chain", ".iteration"))))
# output
# yearsFromStart ats_baseline encID .row .draw .category .epred
# <dbl> <dbl> <dbl> <int> <int> <fct> <dbl>
# 1 0 0 688 1 1 1 0.0127
# 2 0 0 688 1 2 1 0.000144
# 3 0 0 688 1 3 1 0.0000793
# 4 0 0 688 1 4 1 0.000523
# 5 0 0 688 1 5 1 0.00141
# 6 0 0 688 1 6 1 0.0000354
Hopefully you can see my problem: The variable with the fitted values drawn from the posterior - .epred
- is no longer on the original 1-11 scale.
Can anyone tell me how to get these back on the original scale?