I am trying to fit a mixed model with brms to a bimodal distribution, and initially kept the defaults for distribution family (= gaussian). This is my model:
prior1 <- c(set_prior("normal(0,10)", class = "b"), # sets all coefficient priors
set_prior("normal(0,10)", class = "Intercept"),
set_prior("cauchy(0,5)", class = "sd"))
bayes_conf$m1 <- brm(formula = ALLAccuracies ~ lg10JerksZ*allConditions*lg10OwnStimDiff_jerkZ +
(1 + lg10JerksZ*lg10OwnStimDiff_jerkZ*allConditions || allVidIDs) +
(1 + lg10JerksZ*lg10OwnStimDiff_jerkZ*allConditions || allSubs),
data = db, warmup = 1000, iter = 5000,
cores = options(mc.cores = parallel::detectCores()),
chains = 4, control = list(adapt_delta = .99),
prior = prior1, sample_prior = TRUE,
save_all_pars = TRUE)
And this is how the posterior predictive distribution fits onto the observed values:
I assume by specifying a different distribution family I could increase the fit of my model. However, I am new to bayesian modelling and R, and I have no idea which type of distribution family would be suitable to represent my clearly bimodal outcome variable. Any suggestions would be greatly appreciated.
here is the data of my outcome variable & predictors: stanForums_dat.csv (24.7 KB)
I have come across the website you linked, and they suggest that changing the distribution family could lead to improvement, but give no insight into what family that would be…
Does your dependent variable ALLAccuracies have a minimum possible value of 0? Does it have a maximum possible value (e.g. of 10) or can it be arbitrarily high? I ask because the range of your response data can help determine the best family to use.
My dependent variable ranges from -5.6 to 9. So contains negative values and the maximum possible value is 9.
Thanks for your suggestion. I have come across this post before, but I guess I was hoping so far there would be an easier fix to my problem. Maybe there isn’t…
Is there a theoretical minimum? If not, it may be beneficial to convert your “accuracy” score to an “inaccuracy” score by subtracting it from 9. Then use a family which supports non-negative reals such as hurdle_gamma or hurdle_lognormal. Choosing gaussian when another family is more appropriate can certainly lead to a model not capturing bimodality in the data.
My hope is that the “bimodality” arises from your predictors – some sets of predictor values produce low accuracy, some high. If that is not the case, it suggests either you are missing an important predictor or that the distribution is genuinely bimodal (by “genuinely bimodal” I mean accuracy has two modes when all experimental conditions are otherwise the same).
I see, that makes sense. And should definitely be true for me, as my categorical predictor should describe the bimodality almost perfectly. Let’s see, hopefully once I figure out how to run the model with one of your suggested families, it will work.
Just for starters, while the hurdle distribution doesn’t seem to be functioning, it may be worth using something like 9.01 - accuracy as your dependent variable (9.01 to ensure all values are slightly larger than 0), and then modelling with a Gamma or lognormal family.