Finding appropriate mixture distribution for brms model

Hi,

I have been trying to find the right mixture of distributions to adequately model my bimodal continuous response distribution with brms. Comparing several combinations of distribution families with LOO has brought me to a point where I think the first component of my response distribution is represented rather well by the posterior; see below for the latest model and fit:

mix <- mixture(gaussian, skew_normal, order = 'mu')

# priors anchoring the two component means and the skewness of component 2
prior <- c(prior(normal(-2.5, .1), Intercept, dpar = mu1),
           prior(normal(7, .1), Intercept, dpar = mu2),
           prior(normal(-8, 1), alpha2))

model <- brm(bf(accuracy ~ 1,
                theta1 = 1, theta2 = 3),  # mixing weights fixed rather than estimated
             data = dat,
             family = mix,
             warmup = 1000, iter = 5000,
             cores = parallel::detectCores(),
             chains = 4, control = list(adapt_delta = .97),
             prior = prior,
             init = 0,
             save_pars = save_pars(all = TRUE))

However, as you can see from the posterior predictive checks (below left), the model isn’t doing so well on the second distribution component.
As my outcome variable is truncated at 9, I then added an upper bound of 9 to the formula (below right), and comparing the two pp checks visually, it looks to me like truncating has improved the fit between data and predictions:
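For completeness, the truncation only changes the formula; the call looked roughly like this (same mix and prior as above):

# truncated version: same model, with an upper bound on the response
model_trunc <- brm(bf(accuracy | trunc(ub = 9) ~ 1,
                      theta1 = 1, theta2 = 3),
                   data = dat, family = mix, prior = prior,
                   warmup = 1000, iter = 5000, chains = 4,
                   cores = parallel::detectCores(),
                   control = list(adapt_delta = .97),
                   init = 0, save_pars = save_pars(all = TRUE))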

But model comparison is strongly in favour of the model that is not truncated:


In addition, adding the fit criteria for model comparison (using add_criterion) to the truncated model takes suspiciously long: roughly 1.5 hours, whereas add_criterion on the non-truncated model takes only a couple of minutes. I wonder why that is.
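For reference, I’m adding and comparing the criteria roughly like this:

# compute and store LOO on each model, then compare
model <- add_criterion(model, criterion = "loo")
model_trunc <- add_criterion(model_trunc, criterion = "loo")
loo_compare(model, model_trunc, criterion = "loo")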

Now my main question: can anyone think of a better way to model this second component than a truncated skew-normal distribution? I am relatively new to using mixture models with brms, so I would highly appreciate any help with this.

Do you have anything in your data that might explain the bimodality? In that case it might be easier to expand your formula instead of using mixtures immediately.
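For example, with a hypothetical predictor difficulty that separates the modes, a single-family model might already do the job (a sketch, not a recommendation for your specific data):

# sketch: no mixture, a (hypothetical) predictor accounts for the two modes
fit <- brm(accuracy ~ 1 + difficulty,
           data = dat,
           family = skew_normal(),
           chains = 4)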
Even if you have no observations that might help you identify the modes, maybe you have a theory about the underlying process and can use that to inform your choices of mixture families.
It is hard to make suggestions here without any information about your data.

My experience with truncated models is that they are a pain to fit. Did you take a look at the diagnostics to make sure that the model converges and has no problems?
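For example, besides trace plots, you could check Rhat/ESS and count divergent transitions; a minimal sketch:

summary(model)  # Rhat and bulk/tail ESS for all parameters

# count divergent transitions after warmup
np <- nuts_params(model)
sum(subset(np, Parameter == "divergent__")$Value)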

Hi,

Thanks very much for your response and your continued help.

To answer your question: sadly, none of my predictors explain the two modes any better than a model without them. Another problem is that as soon as I add predictors to my mixture model, it no longer converges (convergence looks fine based on trace plots for the intercept-only model).

I do have a theory of the underlying process; however, so far it has not helped me get a better fit by choosing different mixture families. My response variable is a measure of accuracy in identifying the correct identity of a stimulus, calculated by subtracting the mean of the ratings given on 3 (continuous) distractor scales from the rating given on a target scale. So basically my ‘accuracy’ measure indicates how well participants could discriminate a target stimulus from distractor options. My theory is that my stimulus set (which was itself created by a different sample of participants) consists of both really easy and really difficult items, but few stimuli that are ‘in between’, hence the dip in my response distribution.
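In code, the measure is derived roughly like this (the column names here are made up):

# accuracy = target rating minus the mean of the three distractor ratings
dat$accuracy <- dat$target_rating -
  rowMeans(dat[, c("distractor1", "distractor2", "distractor3")])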

I have had some success in improving the model fit (also confirmed by LOO) by modelling a mixture of three distributions:

mix <- mixture(skew_normal, gaussian, skew_normal, order = 'mu')

# priors anchoring the three component means and the skewness of the outer components
prior <- c(prior(normal(-2.5, .1), Intercept, dpar = mu1),
           prior(normal(5, .1), Intercept, dpar = mu2),
           prior(normal(7, .1), Intercept, dpar = mu3),
           prior(normal(8, 1), alpha1),
           prior(normal(-8, 1), alpha3))

model <- brm(bf(accuracy ~ 1,
                theta1 = 0.3, theta2 = 0.4, theta3 = 0.3),  # mixing weights fixed
             data = dat,
             family = mix,
             warmup = 1000, iter = 5000,
             cores = parallel::detectCores(),
             chains = 4, control = list(adapt_delta = .97),
             prior = prior,
             init = 0,
             save_pars = save_pars(all = TRUE))

However, as mentioned above, as soon as I add one predictor and two random intercepts, I cannot get the model to converge (see trace plots below). I’ve tried raising ‘adapt_delta’ to .99 and setting custom initial values for the intercepts (following this blog post: Don't forget your inits | A. Solomon Kurz), but this hasn’t helped much, and I don’t know how to best choose initial values for my predictor without biasing the model.
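For concreteness, the non-converging model looks roughly like this (the predictor and grouping names are placeholders for my actual variables):

# one population-level predictor plus two varying intercepts,
# on top of the three-component mixture above
model_full <- brm(bf(accuracy ~ 1 + predictor +
                       (1 | participant) + (1 | stimulus),
                     theta1 = 0.3, theta2 = 0.4, theta3 = 0.3),
                  data = dat, family = mix, prior = prior,
                  warmup = 1000, iter = 5000, chains = 4,
                  cores = parallel::detectCores(),
                  control = list(adapt_delta = .99),
                  save_pars = save_pars(all = TRUE))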

Again, any insights on what else I might do to get my model to converge would be great.

To answer your question, sadly none of my predictors explain the two modes any better than a model without the predictors.

Shame. A multilevel model would be easier to interpret.

That sounds like it should lend itself to a multilevel structure. I have no experience working with data from humans (shudder), but I could see a varying intercept for each participant and probably also for each task. That would model the varying difficulty of the tasks and the varying baseline skill of the participants, I think.

My theory is that my stimulus set (which is itself created by a different sample of participants)

If there are multiple stimuli from a single “past” participant, there might also be an interaction between the past and present participants, e.g. a “current” participant has problems with all of the stimuli from a certain “past” participant.

All of those thoughts are obviously dependent on you having access to that data.
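As a rough sketch of that interaction idea (grouping-variable names made up):

# varying intercepts for the current rater, the stimulus creator,
# and their interaction
fit <- brm(accuracy ~ 1 + (1 | rater) + (1 | creator) +
             (1 | rater:creator),
           data = dat,
           family = skew_normal(),
           chains = 4)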

However as mentioned above, as soon as I add one predictor and two random intercepts, I cannot get the model to converge (see trace plots below).

So it looks like you kept the mixture and added the varying intercepts on top. This might pose the problem that the varying intercepts explain the modes, which makes the mixture components redundant and hard to identify. I’d try the varying effects without the mixture.
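i.e. something like (again with placeholder names):

# drop the mixture; keep the predictor and both varying intercepts
fit <- brm(accuracy ~ 1 + predictor +
             (1 | participant) + (1 | stimulus),
           data = dat,
           family = skew_normal(),
           chains = 4)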

It might also be the case that you don’t have enough data to estimate the sd’s of the groups, in which case you could just use the variables themselves instead of a varying effect. You lose partial pooling, but the model might converge.
accuracy ~ 1 + ID instead of accuracy ~ 1 + (1|ID) is what I mean by that.

I hope this was kinda helpful :)