Hi All,
This question may be a bit “Stan-adjacent”, so feel free to close it if it is not appropriate to ask here. I work with categorical, multilevel data, for example counts (like self-reported or wearable-device-estimated counts of awakenings overnight) or binary outcomes (like a daily diary: did you sleep well last night, yes/no). These can occur in the context of interventions, so what we want to answer are questions like:
- What is the probability someone sleeps well in the intervention or control arms?
- If we implemented this intervention so the population received it, how much would the probability of sleeping well increase on average?
We can work with odds ratios, but those are generally harder to interpret and less easily linked to outcomes that patients, clinicians, or hospital administrators care about.
The basic approach I have been taking to this is:
- estimate, say, a mixed-effects binary logistic regression in brms with a random intercept by participant and fixed effects for condition, time, and a condition × time interaction. If we have weeks of data per person, time can be a random slope as well.
- use Monte Carlo integration (generate random samples from the assumed random-effects distribution, add them to the fixed effects, back-transform to probabilities, and then average) to integrate out the random effects and estimate the average predicted probabilities (a rough sketch of this step follows the list).
- summarise these posteriors, take differences, etc.
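For concreteness, here is a rough sketch of those first two steps in R. The model formula, variable names, and draw column names (e.g. `b_conditionintervention`, `sd_id__Intercept`) are assumptions for illustration; check `colnames(draws)` for the actual names in your model.

```r
library(brms)

## Hypothetical model (names are assumptions): daily "slept well" (0/1),
## random intercept by participant, fixed effects for condition, time,
## and their interaction.
# fit <- brm(slept_well ~ condition * time + (1 | id),
#            data = sleep_data, family = bernoulli())

draws <- as_draws_df(fit)   # posterior draws of all parameters

## Linear predictor for one covariate pattern, e.g. intervention arm at week 6.
## Column names below are assumptions; check colnames(draws) for your model.
eta_fixed <- draws$b_Intercept +
  draws$b_conditionintervention +
  6 * draws$b_time +
  6 * draws$`b_conditionintervention:time`

sd_id <- draws$sd_id__Intercept   # SD of the participant random intercept

## Monte Carlo integration over the random-effect distribution:
## for each posterior draw, simulate K intercepts, back-transform, average.
K <- 100
avg_prob <- vapply(seq_along(eta_fixed), function(s) {
  u <- rnorm(K, mean = 0, sd = sd_id[s])
  mean(plogis(eta_fixed[s] + u))
}, numeric(1))

## Posterior summary of the population-averaged probability
quantile(avg_prob, probs = c(0.025, 0.5, 0.975))
```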
That all works, but I am finding that the Monte Carlo integration is quite slow. A modestly sized problem may take about 8 s for 100 random samples and 70 s for 1,000. If I am doing this across a 6-week (42-day) intervention, the time really starts to add up.
Recently, I read some old papers about antithetic pairs in Monte Carlo simulations. Since I usually assume the random effects are multivariate normal, I think this would look basically like:
- generate 100 random samples
- duplicate those, but negate them
- merge the original and negated values with the fixed effects and then average over all of it (see the sketch after this list)
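If I understand the idea correctly, the antithetic version of that integration step would look roughly like this (reusing `eta_fixed` and `sd_id` from the sketch above; this is just my reading of the technique, not something I have tested):

```r
## Antithetic pairs: draw K/2 standard-normal values, pair each with its
## negation, scale by the random-intercept SD, and average over both halves.
K_half <- 50
avg_prob_anti <- vapply(seq_along(eta_fixed), function(s) {
  z <- rnorm(K_half)
  u <- sd_id[s] * c(z, -z)   # each z paired with -z
  mean(plogis(eta_fixed[s] + u))
}, numeric(1))

quantile(avg_prob_anti, probs = c(0.025, 0.5, 0.975))
```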
On the surface it seems like this could roughly double the effective number of samples (and hence the precision) when integrating out the random effects, with much less compute.
I wonder what people here think? I’m also open to other approaches to get to that end goal. I do keep thinking that doing it directly in Stan, rather than estimating the model with brms and then integrating afterwards, would be more efficient, but I quite like the flexibility the current workflow gives: I can estimate models, save them, and then post hoc estimate whatever quantities at whatever timepoints we are interested in.
Thank you!