I am trying to fit a repeated-measures Bernoulli regression to the following data.
any_11toPCUD.RData (10.2 KB)
Two measurement occasions (all11 followed by pCUD) of the same Yes/No outcome meetsCrit_any for each person. These are the proportions answering Yes/No at each timepoint:
  time  meetsCrit_any count   tot  perc
  <fct> <fct>         <int> <int> <dbl>
1 all11 No             1306  2054  0.64
2 all11 Yes             748  2054  0.36
3 pCUD  No             1414  2054  0.69
4 pCUD  Yes             640  2054  0.31
The odds at timepoint all11 are 0.36/(1-0.36) = 0.56 and at timepoint pCUD 0.31/(1-0.31) = 0.45, giving a manual pCUD/all11 odds ratio of 0.80. Admittedly this manual calculation assumes independence of observations, which is not true. With these hand calculations done, I proceeded to the repeated-measures logistic regression:
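For reference, the hand calculation in code (using the unrounded counts from the table, which gives essentially the same answer):

```r
# Pooled (manual) odds ratio from the raw counts, ignoring the pairing
p_all11 <- 748 / 2054                    # proportion Yes at all11, ~0.36
p_pCUD  <- 640 / 2054                    # proportion Yes at pCUD,  ~0.31
odds_all11 <- p_all11 / (1 - p_all11)    # ~0.57
odds_pCUD  <- p_pCUD  / (1 - p_pCUD)     # ~0.45
odds_pCUD / odds_all11                   # odds ratio ~0.79
```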
brm(formula = meetsCrit_any ~ time + (1|record_id),
data = any_11toPCUD,
family = bernoulli) -> mod_bayes
mod_bayes
# output
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -12.01 3.60 -21.18 -7.19 1.02 288 463
# timepCUD -5.27 1.55 -9.33 -3.24 1.01 411 526
Now clearly this log-odds coefficient is ridiculous given the relatively small difference in proportions, and the Rhat and ESS are very poor given 4000 samples. I am perplexed: I thought brms would handle such a simple model effortlessly. Perhaps it is not so simple.
Can anyone give me some advice on what I am doing wrong and why such a simple model should fail so spectacularly?
I'd really appreciate the insights of the forum here. After finding a spookily similar previous post of mine, evidently from a time when I could (perfunctorily) hand-code models in Stan, I followed one responder's advice and ran some prior predictive checks. I tried a normal(0, 0.5) prior on the b coefficients and a normal(0, 1) prior on the intercept, and the diagnostics were all much better. However, I still get a very large effect of log-odds = -1.9, which yields an odds ratio of 0.15, nowhere near the 0.8 I calculated manually. Can anyone give me some advice?
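For what it's worth, here is a quick base-R sketch of what that normal(0, 0.5) slope prior implies on the odds-ratio scale (a crude prior predictive check on the coefficient alone, not the full model):

```r
# Draw from the normal(0, 0.5) prior on the log-odds slope and
# back-transform to the odds-ratio scale
set.seed(1)
b_draws <- rnorm(1e5, mean = 0, sd = 0.5)
quantile(exp(b_draws), c(0.025, 0.5, 0.975))
# roughly 0.38 to 2.7, centred on 1, so the prior comfortably
# covers the hand-calculated odds ratio of 0.80
```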
Postscript
For reference, an equivalent NHST analysis in the GLMMadaptive package using marginal coefficients yields much more sensible results. I include the code here for completeness.
library(GLMMadaptive)
mixed_model(fixed = meetsCrit_any ~ time,
random = ~1|record_id,
data = any_11toPCUD,
family = binomial) -> mod_glmm
marginal_coefs(object = mod_glmm,
               std_errors = TRUE,
               sandwich = TRUE) -> margModTime
margModTime
#output
# Estimate Std.Err z-value p-value
# (Intercept) -0.7327 0.0189 -38.6803 < 1e-04
# timepCUD -0.2330 0.0299 -7.8025 < 1e-04
When exponentiated, the -0.2330 coefficient yields an odds ratio of 0.79, very close to what I calculated manually above.
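The back-transformation (with a Wald 95% interval built from the reported standard error) is a one-liner:

```r
# Back-transform the marginal coefficient to the odds-ratio scale
exp(-0.2330)                              # OR ~0.79
exp(-0.2330 + c(-1, 1) * 1.96 * 0.0299)   # 95% CI roughly 0.75 to 0.84
```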