I am applying the workflow laid out in Aki Vehtari’s case-study here. https://users.aalto.fi/~ave/casestudies/Nabiximols/nabiximols.html
The form of the outcome variable is the same as in the case-study - bounded count indicating days of substance use in the previous 28 days - but the sample is different, people receiving opioid agonist treatment (methadone or buprenorphine) at a public clinic, and the substance is different, amphetamines instead of cannabis. I am estimating the effect of frequency of amphetamine use at start of treatment - a time-invariant bounded count of days of amphetamine use in the previous 28 days - on frequency of amphetamine use during treatment, using a hierarchical model. Clients were monitored for a year, measured irregularly (this is routinely collected data, i.e. not from a clinical trial) so they could have anywhere between 1 and 11 follow-up measurements of past-28-day substance use in addition to the measurement at start of treatment/baseline.
The data are quite different from that in the case-study. The participants in the Sativex trial were all using cannabis daily or near daily, where the clients I am examining, being mostly people with opioid dependence issues, mostly use very little amephtamines. Also the time measure in this study is continuous but was discrete in the case-study data. The scatter plot looks like this. Dots are coloured based on days of use at baseline and, for intuition only, I include loess lines based on groupings of frequency of use at baseline.
Following the model-comparison approach in the case-study I ended up with a beta-binomial model. I include the brms model syntax below. outcome
is the outcome variable, amephtamine use in previous 28 days, set
is the possible number of days (always 28). ats_baseline
is days of use at baseline (time-invariant predictor) and yearsFromStart
is how many years from start of treatment each measurement was made (continuous predictor between 0 and 1).
fit_betabinomial2 <- brm(formula = outcome | trials(set) ~ ats_baseline*yearsFromStart + (1 | encID),
data = workingDF,
beta_binomial(),
prior = c(prior(normal(0, 3),
class = Intercept),
prior(normal(0, 3),
class = b),
prior(cauchy(0, 3),
class = sd)),
save_pars = save_pars(all=TRUE),
seed = 1234,
refresh = 0,
cores = 4)
I cannot reproduce the the dataset here because it is protected and quite large, but I would like some advice about the posterior predictive checks. This is the histogram of posterior predictive replicates. It is pretty good. It predicts probability at the lower end of the scale around 0 days use, but it seems a bit deficient for the upper end, around 28 days.
And this is the pit-ecdf calibration plot.
Once again not bad but strays outside the envelope at the upper regions.
Finally a calibration check with a reliability diagram (once again copied from Aki Vehtari’s workflow) for 0 days use vs others
I could use some help interpreting these graphs (especially the final one) diagnosing problems, and with some suggestions for how to improve estimation (negative-binomial for instance?)