How to model a bimodal distribution with brms

BiancaS · September 29, 2020, 3:36pm

Hi there,

I am trying to fit a mixed model with brms to a bimodal distribution, and initially kept the defaults for distribution family (= gaussian). This is my model:

prior1 <- c(set_prior("normal(0,10)", class = "b"), # sets all coefficient priors
            set_prior("normal(0,10)", class = "Intercept"),
            set_prior("cauchy(0,5)", class = "sd")) 

bayes_conf$m1 <- brm(formula = ALLAccuracies ~ lg10JerksZ*allConditions*lg10OwnStimDiff_jerkZ +
                       (1 + lg10JerksZ*lg10OwnStimDiff_jerkZ*allConditions || allVidIDs) +
                       (1 + lg10JerksZ*lg10OwnStimDiff_jerkZ*allConditions || allSubs), 
                     data = db, warmup = 1000, iter = 5000, 
                     cores = options(mc.cores = parallel::detectCores()),
                     chains = 4, control = list(adapt_delta = .99), 
                     prior = prior1, sample_prior = TRUE,
                     save_all_pars = TRUE)

And this is how the posterior predictive distribution fits onto the observed values:
model1-post_pred

I assume by specifying a different distribution family I could increase the fit of my model. However, I am new to bayesian modelling and R, and I have no idea which type of distribution family would be suitable to represent my clearly bimodal outcome variable. Any suggestions would be greatly appreciated.

Ara_Winter · September 29, 2020, 3:56pm

Hi and welcome!

Can you share a snippet of the raw data or simulated data to give an idea of what the data look like?

There a tutorial here https://tem11010.github.io/regression_brms/ that uses a bimodel data set with the Gaussian family in brms,

BiancaS · September 29, 2020, 4:12pm

Hi Ara,

here is the data of my outcome variable & predictors:
stanForums_dat.csv (24.7 KB)

I have come across the website you linked, and they suggest that changing the distribution family could lead to improvement, but give no insight into what family that would be…

andymilne · September 29, 2020, 10:59pm

Does your dependent variable ALLAccuracies have a minimum possible value of 0? Does it have a maximum possible value (e.g. of 10) or can it be arbitrarily high? I ask because the range of your response data can help determine the best family to use.

Guido_Biele · September 30, 2020, 6:37am

You could try a mixture model: https://rdrr.io/cran/brms/man/mixture.html

BiancaS · September 30, 2020, 9:55am

Thanks everyone for your replies so far.

My dependent variable ranges from -5.6 to 9. So contains negative values and the maximum possible value is 9.

Thanks for your suggestion. I have come across this post before, but I guess I was hoping so far there would be an easier fix to my problem. Maybe there isn’t…

andymilne · September 30, 2020, 12:00pm

Is there a theoretical minimum? If not, it may be beneficial to convert your “accuracy” score to an “inaccuracy” score by subtracting it from 9. Then use a family which supports non-negative reals such as hurdle_gamma or hurdle_lognormal. Choosing gaussian when another family is more appropriate can certainly lead to a model not capturing bimodality in the data.

BiancaS · October 3, 2020, 10:06am

Hi Andy,

That sounds like a reasonable approach. Although I still don’t understand how a hurdle_gamma / hurdle_lognormal family would capture bimodality?

Anyway, I’ve tried changing the family specification and now Stan seems to throw a problem.

I’ve opened another topic for this new issue here: https://discourse.mc-stan.org/t/brms-model-doesnt-run-after-custom-family-hurdle-gamma-specified/18365

andymilne · October 3, 2020, 10:21am

My hope is that the “bimodality” arises from your predictors – some sets of predictor values produce low accuracy, some high. If that is not the case, it suggests either you are missing an important predictor or that the distribution is genuinely bimodal (by “genuinely bimodal” I mean accuracy has two modes when all experimental conditions are otherwise the same).

BiancaS · October 3, 2020, 10:37am

I see, that makes sense. And should definitely be true for me, as my categorical predictor should describe the bimodality almost perfectly. Let’s see, hopefully once I figure out how to run the model with one of your suggested families, it will work.

Thanks so far!

andymilne · October 3, 2020, 10:44am

Just for starters, while the hurdle distribution doesn’t seem to be functioning, it may be worth using something like 9.01 - accuracy as your dependent variable (9.01 to ensure all values are slightly larger than 0), and then modelling with a Gamma or lognormal family.

BiancaS · October 3, 2020, 12:28pm

Thanks! Was going to try that, but not actually none of my brm models will run anymore, and I have no idea why. Any input in this topic: Brms model doesn't run after custom family (hurdle_gamma()) specified
would be hugely appreciated

JAQuent · October 12, 2023, 3:27am

I have a similar issue and I am wondering what would you do if the distribution is genuinely bimodal?

andymilne · October 12, 2023, 3:37am

If you don’t have an additional factor that would explain the bimodality and could be included in the model, you could use a mixture model - e.g., Finite Mixture Families in brms — mixture • brms

Topic		Replies	Views
Finding appropriate mixture distribution for brms model Modeling fitting-issues , specification , brms	3	1363	June 2, 2022
Family mixtures, brms brms	1	606	October 8, 2020
Specifying Bimodal Skew Symmetric Normal Distribution as Prior in brms() Modeling rstan , specification , brms	1	558	February 9, 2023
Convergence issues with brms mixture models Modeling brms	6	1844	May 18, 2022
Parameter contrasts in Bayesian linear regression model using brms brms	11	2302	February 26, 2019

How to model a bimodal distribution with brms

Related topics