Hello,
I apologize if this is the wrong forum for this question. I am happy to delete it and post it elsewhere if it is inappropriate here.
I am modeling a small dataset (n = 256) with a few predictors using brms (brms 2.17.0, R 4.2.1). The data come from an experiment in which 64 participants each took part in two conditions, with two trials per condition (for several participants, the data from one trial had to be discarded). In each trial, participants could respond either one way (= 0) or another (= 1). Thus, on a trial-by-trial basis, the data are Bernoulli distributed. On a condition-by-condition basis, however, the data are binomially distributed (i.e., each participant could score a total of 0, 1, or 2 per condition).
I have successfully modeled the data as intended. My question is about which modeling strategy is most appropriate. Modeling the data as binomially distributed requires aggregating within conditions, which cuts the number of observations in half, but it also lets the model formula account for the number of trials each participant completed (most completed 4 trials, but some completed only 3). Is one observation model “more informative” than the other, or otherwise preferable on statistical grounds, in this particular case? So far, I have fitted the data with the “bernoulli” and “binomial” families (and also “beta-binomial”, but the overdispersion parameter turned out to be unnecessary).
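To make the comparison concrete, here is how I understand the two likelihoods for a single participant-by-condition cell. This is only a sketch in my own notation: the symbols y_ijt (response of participant i in condition j on trial t), n_ij (number of retained trials), s_ij (sum of responses), and p_ij (success probability) are introduced for illustration and are not taken from the fitted models. It also assumes that the success probability is the same for every trial within a cell, i.e., that there are no trial-level predictors or trial-level random effects.

```latex
\begin{align}
s_{ij} &= \sum_{t=1}^{n_{ij}} y_{ijt},
  \qquad y_{ijt} \in \{0, 1\}, \quad n_{ij} \in \{1, 2\} \\
L_{\mathrm{Bernoulli}}(p_{ij})
  &= \prod_{t=1}^{n_{ij}} p_{ij}^{\,y_{ijt}} (1 - p_{ij})^{1 - y_{ijt}}
   = p_{ij}^{\,s_{ij}} (1 - p_{ij})^{\,n_{ij} - s_{ij}} \\
L_{\mathrm{binomial}}(p_{ij})
  &= \binom{n_{ij}}{s_{ij}}\, p_{ij}^{\,s_{ij}} (1 - p_{ij})^{\,n_{ij} - s_{ij}}
\end{align}
```

Under that assumption, the two expressions differ only by the binomial coefficient, which does not depend on p_ij.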
It is worth mentioning that all models, regardless of the assumed data distribution, give very similar posterior parameter estimates. However, the model comparison on the binomial data favored the full model (with an interaction term) over the null model slightly more strongly than the same comparison on the Bernoulli data did.
Thanks for any information or guidance offered.