Specifying joint mixture model

Benjamin.Larue · January 8, 2021, 11:02pm

Hi,

I would like to fit joint mixture models to determine life-history trajectories in mammal populations. I am interested in whether the probability that individuals belong to a cluster depends on three different responses (i.e., mass gain, reproduction probability and survival probability). A joint mixture model would thus separate individuals based on similarities in all three responses (similar to the figure below, created from a model specified with the FlexMix R package in Hamel et al. 2016).
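To make "joint" concrete: in a K-component joint mixture, all three responses of individual i share one cluster indicator, so each mixture weight multiplies the product of the per-response densities. A sketch in my own notation, assuming a Gaussian mass gain $m_i$ and Bernoulli reproduction $r_i$ and survival $s_i$:

$$
p(m_i, r_i, s_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(m_i \mid \mu_k, \sigma_k) \, \rho_k^{r_i} (1 - \rho_k)^{1 - r_i} \, \phi_k^{s_i} (1 - \phi_k)^{1 - s_i}
$$

Fitting a separate mixture to each response would instead give a product of three sums, $\prod_{j} \sum_{k} \pi_{jk} f_j(\cdot \mid \theta_{jk})$, which lets the same individual fall into different clusters for different responses.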

I saw from this thread that multivariate mixture models are possible in brms, but from what I understand the mixture probabilities are not shared across the responses. Can a single joint clustering probability be estimated for multiple responses, and can the responses come from different families (gaussian and binomial) in brms?
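For concreteness, here is a minimal R sketch (not brms code) of the log-likelihood such a joint model would need, with a Gaussian response and two Bernoulli responses sharing one cluster indicator. The function name, parameter names and simulated data are hypothetical and only meant to show that mixing families is straightforward at the likelihood level; fitting it is the hard part.

```r
# Hypothetical sketch: log-likelihood of a K-component joint mixture in which
# mass gain (Gaussian), reproduction and survival (both Bernoulli) share a
# single cluster indicator per individual. Names are illustrative only.
joint_mixture_loglik <- function(mass, repro, surv, w, mu, sigma, rho, phi) {
  n <- length(mass)
  K <- length(w)
  # n x K matrix: log w_k + sum of the per-response log densities under k
  lp <- matrix(vapply(seq_len(K), function(k) {
    log(w[k]) +
      dnorm(mass, mean = mu[k], sd = sigma[k], log = TRUE) +
      dbinom(repro, size = 1, prob = rho[k], log = TRUE) +
      dbinom(surv, size = 1, prob = phi[k], log = TRUE)
  }, numeric(n)), nrow = n)
  # log-sum-exp over components, row by row, for numerical stability
  row_max <- apply(lp, 1, max)
  sum(row_max + log(rowSums(exp(lp - row_max))))
}

# Made-up data, only to exercise the function
set.seed(1)
n <- 50
mass  <- rnorm(n)
repro <- rbinom(n, 1, 0.5)
surv  <- rbinom(n, 1, 0.8)
ll <- joint_mixture_loglik(mass, repro, surv,
                           w = c(0.5, 0.5), mu = c(-1, 1), sigma = c(1, 1),
                           rho = c(0.3, 0.7), phi = c(0.6, 0.9))
ll
```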

brms version: 2.14.4

Thanks!!!

martinmodrak · January 14, 2021, 4:09pm

Unfortunately, I don’t think this is possible in brms. I am not sure to what extent this omission was intentional, but multivariate mixtures/clusterings often have multimodal posteriors, which makes them hard to sample from with Stan or any other program (and can make maximum-likelihood or similar optimization methods give misleading results), so these are potentially very tricky models to get running.

So even if you implemented the model in Stan directly, it is likely to be problematic to fit. One of the big issues is “label switching”: in 1D you can force an ordering on the clusters/mixture components so that cluster 1 is always the one with the smallest mean, … , cluster N always the one with the largest mean (this is done in brms by default; see “Finite Mixture Families in brms” in the brms documentation for mixture). Once the response is multivariate you cannot do that in general, so in different runs cluster 1 could correspond to different clusters.
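In brms the ordering is imposed inside the model. As a rough post hoc analogue, here is a minimal R sketch that sorts the component means within each posterior draw. The draws are simulated (hypothetical) to mimic two chains that found the same two clusters with swapped labels.

```r
# Hypothetical posterior draws for a 2-component 1D mixture: two chains that
# found the same clusters, but chain 2 has the labels swapped.
set.seed(1)
draws <- rbind(
  cbind(mu1 = rnorm(500, -1, 0.1), mu2 = rnorm(500, 2, 0.1)),  # chain 1
  cbind(mu1 = rnorm(500, 2, 0.1), mu2 = rnorm(500, -1, 0.1))   # chain 2, swapped
)
colMeans(draws)  # both near 0.5: pooling the chains mixes the two clusters

# In 1D, requiring mu1 < mu2 identifies the components; after the fact,
# this amounts to sorting the means within each draw.
relabelled <- t(apply(draws, 1, sort))
colnames(relabelled) <- c("mu1", "mu2")
colMeans(relabelled)  # close to -1 and 2
```

Sorting after sampling is not identical to sampling under an ordering constraint, but for well-separated components the summaries come out essentially the same.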

There might be an easy way out if you happen to have some domain-specific knowledge that would let you avoid the label-switching problem completely (e.g. that ordering by one of the responses is actually always enough).
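If the clusters are known to be separated along one response, one option is to relabel each draw using the ordering on that response only, while permuting the whole parameter vector of each component together. A minimal R sketch with simulated, hypothetical draws for two responses (named mass and surv here for illustration); it only works when the means of the chosen response do not overlap across components.

```r
# Hypothetical draws for a 2-component mixture with two responses; in a random
# half of the draws the two components' labels are swapped as a whole.
set.seed(3)
n <- 1000
swap <- runif(n) < 0.5
comp_a <- cbind(mass = rnorm(n, -1, 0.1), surv = rnorm(n, 0.9, 0.02))
comp_b <- cbind(mass = rnorm(n, 2, 0.1), surv = rnorm(n, 0.4, 0.02))
mu1 <- comp_a; mu1[swap, ] <- comp_b[swap, ]
mu2 <- comp_b; mu2[swap, ] <- comp_a[swap, ]

# Order by the mass means only, but move both responses together so each
# component keeps its own (mass, surv) pair.
flip <- mu1[, "mass"] > mu2[, "mass"]
new1 <- mu1; new1[flip, ] <- mu2[flip, ]
new2 <- mu2; new2[flip, ] <- mu1[flip, ]
colMeans(new1)  # near mass = -1, surv = 0.9
colMeans(new2)  # near mass = 2, surv = 0.4

# Sorting each response separately would instead pair the low mass mean
# with the low survival mean, mixing up the two components.
```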

The hard way is getting into the weeds of the literature: if I remember correctly, some people ignore the multimodality and then post-process the samples by flipping the components in some chains. There are probably also cleverer ways to do this, but I am not knowledgeable enough about them to give you any specific advice.
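One simple form of that post-processing is to align each chain with a reference chain by choosing the permutation of components whose chain-level means are closest to the reference. A minimal R sketch; align_chains, perms and make_chain are hypothetical helpers, not from any package, and the heuristic assumes each chain stays in a single labelling (switching within a chain needs draw-level relabelling, for which more principled algorithms exist).

```r
# All permutations of a small vector (fine for a handful of components)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  do.call(c, lapply(seq_along(v), function(i)
    lapply(perms(v[-i]), function(p) c(v[i], p))))
}

# chains: list of arrays [draw, component, parameter]
align_chains <- function(chains) {
  ref <- apply(chains[[1]], c(2, 3), mean)  # K x P means of the reference chain
  all_p <- perms(seq_len(dim(ref)[1]))
  lapply(chains, function(ch) {
    m <- apply(ch, c(2, 3), mean)
    # squared distance to the reference under each relabelling
    d <- vapply(all_p, function(p) sum((m[p, , drop = FALSE] - ref)^2), numeric(1))
    ch[, all_p[[which.min(d)]], , drop = FALSE]
  })
}

# Simulated chains scattering tightly around given component means
make_chain <- function(n, mus) {
  arr <- array(NA_real_, dim = c(n, nrow(mus), ncol(mus)))
  for (k in seq_len(nrow(mus))) for (j in seq_len(ncol(mus)))
    arr[, k, j] <- rnorm(n, mus[k, j], 0.05)
  arr
}
set.seed(2)
truth <- rbind(c(0, 0), c(0.5, 5))  # rows = components, columns = (x, y) means
chains <- list(make_chain(200, truth), make_chain(200, truth[2:1, ]))  # chain 2 swapped
aligned <- align_chains(chains)
apply(aligned[[2]], c(2, 3), mean)  # now matches the component order of chain 1
```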

However, label switching may be the smaller of the problems: clustering in more dimensions can legitimately support multiple very different, yet all plausible, clusterings (giving you irreducible multimodality). As a very quick and artificial example, let us assume we fit a mixture of two 2D Gaussians to data that look like this:

set.seed(254685)
# group 1: x ~ N(0, 1), y ~ N(0, 1); group 2: x ~ N(0.5, 1), y ~ N(5, 1)
x <- c(rnorm(10), rnorm(10, 0.5))
y <- c(rnorm(10), rnorm(10, 5))
plot(x, y)

Even when we assume the standard deviations are known, the data can still plausibly support several qualitatively different arrangements…
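One cheap way to look for multiple modes is to run EM from several starting points and compare where each run ends up. A minimal R sketch, assuming (as in the example) two components, independent coordinates and a known common sd of 1; weighted_dens and em_mix2d are hypothetical helpers, and the starting values are arbitrary, not fitted. EM only finds local maxima of the likelihood, so this probes the likelihood surface rather than sampling the posterior.

```r
set.seed(254685)
x <- c(rnorm(10), rnorm(10, 0.5))
y <- c(rnorm(10), rnorm(10, 5))

# n x 2 matrix of weighted component densities (independent coordinates,
# common sd assumed known)
weighted_dens <- function(x, y, mx, my, w, sd) {
  sapply(1:2, function(k) w[k] * dnorm(x, mx[k], sd) * dnorm(y, my[k], sd))
}

# Plain EM for the weights and means with sd held fixed;
# loglik[it] is the log-likelihood at the start of iteration it
em_mix2d <- function(x, y, mx, my, w = c(0.5, 0.5), sd = 1, iters = 100) {
  loglik <- numeric(iters)
  for (it in seq_len(iters)) {
    d <- weighted_dens(x, y, mx, my, w, sd)
    loglik[it] <- sum(log(rowSums(d)))
    r  <- d / rowSums(d)          # E-step: responsibilities
    nk <- colSums(r)
    w  <- nk / length(x)          # M-step: mixture weights
    mx <- colSums(r * x) / nk     # M-step: component means
    my <- colSums(r * y) / nk
  }
  list(w = w, mx = mx, my = my, loglik = loglik)
}

fit_a <- em_mix2d(x, y, mx = c(0, 0.5), my = c(0, 5))
fit_b <- em_mix2d(x, y, mx = c(-1, 1.5), my = c(2.5, 2.5))
rbind(a = c(fit_a$mx, fit_a$my, ll = tail(fit_a$loglik, 1)),
      b = c(fit_b$mx, fit_b$my, ll = tail(fit_b$loglik, 1)))
```

If different starts stop at clearly different parameter values with similar log-likelihoods, that points to multimodality; agreement across a few starts is only weak evidence of a single mode.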

Does that make sense?

Benjamin.Larue · January 15, 2021, 3:38pm

Thank you, @martinmodrak!

Your explanation makes perfect sense. I understand the complexity and difficulty of fitting such models in Stan. I also noticed (after posting this thread) that brms does not currently support subject-level clustering, which would be an issue here. I guess fitting joint growth mixture models with rstan could be feasible, but complex, with possibly unreliable output.
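With repeated measurements $t = 1, \dots, T_i$ on individual $i$, subject-level clustering means one cluster indicator is shared across all of that individual's observations, so the product over occasions sits inside the sum over components (notation mine):

$$
p(\mathbf{y}_{i1}, \dots, \mathbf{y}_{iT_i}) = \sum_{k=1}^{K} \pi_k \prod_{t=1}^{T_i} f(\mathbf{y}_{it} \mid \theta_k)
$$

whereas observation-level clustering, where each row can fall in a different component, gives $\prod_{t=1}^{T_i} \sum_{k=1}^{K} \pi_k f(\mathbf{y}_{it} \mid \theta_k)$.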
