How to model a bimodal distribution with brms

Hi there,

I am trying to fit a mixed model with brms to a bimodal distribution, and initially kept the defaults for distribution family (= gaussian). This is my model:

prior1 <- c(set_prior("normal(0,10)", class = "b"), # sets all coefficient priors
            set_prior("normal(0,10)", class = "Intercept"),
            set_prior("cauchy(0,5)", class = "sd")) 

bayes_conf$m1 <- brm(formula = ALLAccuracies ~ lg10JerksZ*allConditions*lg10OwnStimDiff_jerkZ +
                       (1 + lg10JerksZ*lg10OwnStimDiff_jerkZ*allConditions || allVidIDs) +
                       (1 + lg10JerksZ*lg10OwnStimDiff_jerkZ*allConditions || allSubs), 
                     data = db, warmup = 1000, iter = 5000, 
                     cores = options(mc.cores = parallel::detectCores()),
                     chains = 4, control = list(adapt_delta = .99), 
                     prior = prior1, sample_prior = TRUE,
                     save_all_pars = TRUE)

And this is how the posterior predictive distribution fits onto the observed values:
model1-post_pred

I assume by specifying a different distribution family I could increase the fit of my model. However, I am new to bayesian modelling and R, and I have no idea which type of distribution family would be suitable to represent my clearly bimodal outcome variable. Any suggestions would be greatly appreciated.

Hi and welcome!

Can you share a snippet of the raw data or simulated data to give an idea of what the data look like?

There a tutorial here https://tem11010.github.io/regression_brms/ that uses a bimodel data set with the Gaussian family in brms,

Hi Ara,

here is the data of my outcome variable & predictors:
stanForums_dat.csv (24.7 KB)

I have come across the website you linked, and they suggest that changing the distribution family could lead to improvement, but give no insight into what family that would be…

Does your dependent variable ALLAccuracies have a minimum possible value of 0? Does it have a maximum possible value (e.g. of 10) or can it be arbitrarily high? I ask because the range of your response data can help determine the best family to use.

You could try a mixture model: https://rdrr.io/cran/brms/man/mixture.html

1 Like

Thanks everyone for your replies so far.

My dependent variable ranges from -5.6 to 9. So contains negative values and the maximum possible value is 9.

Thanks for your suggestion. I have come across this post before, but I guess I was hoping so far there would be an easier fix to my problem. Maybe there isn’t…

Is there a theoretical minimum? If not, it may be beneficial to convert your “accuracy” score to an “inaccuracy” score by subtracting it from 9. Then use a family which supports non-negative reals such as hurdle_gamma or hurdle_lognormal. Choosing gaussian when another family is more appropriate can certainly lead to a model not capturing bimodality in the data.

Hi Andy,

That sounds like a reasonable approach. Although I still don’t understand how a hurdle_gamma / hurdle_lognormal family would capture bimodality?

Anyway, I’ve tried changing the family specification and now Stan seems to throw a problem.

I’ve opened another topic for this new issue here: https://discourse.mc-stan.org/t/brms-model-doesnt-run-after-custom-family-hurdle-gamma-specified/18365

My hope is that the “bimodality” arises from your predictors – some sets of predictor values produce low accuracy, some high. If that is not the case, it suggests either you are missing an important predictor or that the distribution is genuinely bimodal (by “genuinely bimodal” I mean accuracy has two modes when all experimental conditions are otherwise the same).

1 Like

I see, that makes sense. And should definitely be true for me, as my categorical predictor should describe the bimodality almost perfectly. Let’s see, hopefully once I figure out how to run the model with one of your suggested families, it will work.

Thanks so far!

Just for starters, while the hurdle distribution doesn’t seem to be functioning, it may be worth using something like 9.01 - accuracy as your dependent variable (9.01 to ensure all values are slightly larger than 0), and then modelling with a Gamma or lognormal family.

1 Like

Thanks! Was going to try that, but not actually none of my brm models will run anymore, and I have no idea why. Any input in this topic: Brms model doesn't run after custom family (hurdle_gamma()) specified
would be hugely appreciated