Convergence issues with brms mixture models

Hi there,

I’ve recently started trying to fit mixture models (using brms) to account for bimodality in my response distribution. The posterior predictive distribution fits the data reasonably well with a mixture of two Gaussian distributions (at least better than a standard non-mixture model), but I am now getting convergence issues where before I had no problems whatsoever getting my models to converge. The models also take much longer to run than I’m used to (one model literally took roughly 24 hours).

Could anyone explain why I am seeing these convergence issues with mixture models when I wasn’t having any before, and suggest a way to fix them? I am a relative beginner with brms, and even more so with mixture models.

Please see below for my code:

mix <- mixture(gaussian, gaussian)

prior <- c(prior(normal(-2,10), Intercept, dpar = mu1),
               prior(normal(7,10), Intercept, dpar = mu2),
               prior(normal(0,10), b, dpar = mu1),
               prior(normal(0,10), b, dpar = mu2),
               prior(cauchy(0,.5), sd, dpar = mu1),
               prior(cauchy(0,.5), sd, dpar = mu2))

mixture_model <- brm(bf(formula = accuracy ~ drug +
                          (1 | sub) +
                          (1 | item)),
                     data = dat, 
                     family = mix,
                     warmup = 1000, iter = 5000, 
                     cores = parallel::detectCores(),
                     chains = 4, control = list(adapt_delta = .99), 
                     prior = prior, sample_prior = TRUE,
                     save_pars = save_pars(all = TRUE))

And these are the convergence warnings I’m getting:

Warning messages:
1: Rows containing NAs were excluded from the model. 
2: There were 5820 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them. 
3: There were 8002 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
https://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
4: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See
https://mc-stan.org/misc/warnings.html#bfmi-low 
5: Examine the pairs() plot to diagnose sampling problems
6: The largest R-hat is 2.62, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat 
7: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
8: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess

And below for some system specs:
OS: MacOS Monterey 12.3 (M1 Mac)
R version 4.3.1
brms version 2.16.3

Highly appreciate any kind of advice on this!

Could you check (and post) your trace plots? I suspect that your model has trouble separating the modes without priors to push them apart. You might see some chains converge on one mode and some on the other.
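In case it helps, here is a minimal way to pull trace plots, assuming your fitted object is called `mixture_model` as in your code above (parameter names like `b_mu1_Intercept` are the default brms naming for the component intercepts; double-check against `variables()`/`parnames()` on your fit):

```r
library(brms)
library(bayesplot)

# Default diagnostic plots (densities + traces) for the main parameters:
plot(mixture_model)

# Or trace plots for just the two component intercepts:
mcmc_trace(as.array(mixture_model),
           regex_pars = "b_mu[12]_Intercept")
```

If the chains are label-switching or each chain sits on a different mode, it will be obvious here: individual chains look stationary but sit at different levels.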

Hi, thanks for your reply. Exactly what you’re describing seems to be happening: every time it’s a different chain that doesn’t converge, and only very rarely does the model converge at all. It’s just not reliable.

See below for an example trace plot:

Would you mind explaining what you mean by ‘problems with separating the modes without priors to push them apart’?

Thanks a lot!

My guess is that the sd of your intercept priors is way too wide. It allows both intercept parameters to cover both modes. So my first step would be to reduce the sd so that the mixture components don’t plausibly overlap with both modes.
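Concretely, something along these lines (the sd values are purely illustrative; the right values depend on the scale of your outcome and how far apart the modes are):

```r
# Narrower intercept priors so each component stays near "its" mode;
# sd = 1 here is an illustration, not a recommendation:
prior <- c(prior(normal(-2, 1), Intercept, dpar = mu1),
           prior(normal(7, 1),  Intercept, dpar = mu2),
           prior(normal(0, 10), b,  dpar = mu1),
           prior(normal(0, 10), b,  dpar = mu2),
           prior(cauchy(0, .5), sd, dpar = mu1),
           prior(cauchy(0, .5), sd, dpar = mu2))
```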


Thanks a lot for that tip, that does indeed seem to have solved my problem! The chains now mix perfectly.

Nice. Just as a small illustration, this is what those priors looked like. You can see that there is a large area where they overlap, i.e. both mixture components explore that space and might catch a mode on their journey.
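You can reproduce that picture in a couple of lines of base R by plotting the two intercept prior densities on top of each other:

```r
# Plot the two intercept priors to see how much they overlap.
x <- seq(-40, 40, length.out = 1000)
plot(x, dnorm(x, mean = -2, sd = 10), type = "l", col = "blue",
     xlab = "Intercept", ylab = "density")
lines(x, dnorm(x, mean = 7, sd = 10), col = "red")
# With sd = 10, normal(-2, 10) and normal(7, 10) are nearly
# indistinguishable, so either component can reach either mode.
```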

Just for fun, you might want to fit one of those mixture models with the sample_prior = "only" option and look at the pp_check output to see how the priors you specified translate onto the outcome scale. I would guess that they produce unreasonably large values.
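A sketch of that prior predictive check, reusing the `mix`, `prior`, and `dat` objects from the original post:

```r
# Prior predictive check: sample from the priors only, ignoring the
# likelihood, then compare the implied outcomes to the observed data.
prior_fit <- brm(accuracy ~ drug + (1 | sub) + (1 | item),
                 data = dat, family = mix, prior = prior,
                 sample_prior = "only",
                 chains = 4, iter = 2000)

# Overlay draws from the prior predictive distribution on the data:
pp_check(prior_fit)
```

If the simulated outcomes span a wildly larger range than your actual accuracy values, that is a sign the priors are far too diffuse.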


Thank you again very much for your help and for the illustration. It seems that both setting smaller SDs for the mixture components’ intercept priors and specifying the correct proportions of mixture components 1 and 2 (using theta) helped convergence in my case, although I’m a bit surprised that I had to set the SDs as low as 0.1 to get convergence, when the plot suggests I should be able to use larger SD values and still discriminate the two distributions.