Other constraints than mean ordering to identify mixture model?

ldeschamps · September 9, 2020, 5:48pm

Hello!

I am interested in fitting a multivariate mixture model, defined as follows:

$$
\begin{aligned}
p(\boldsymbol{y}_{ip} \mid \boldsymbol{\lambda}, \boldsymbol{\mu}, \Sigma) &= \sum_{k=1}^{K} \lambda_{kp} \, MN(\boldsymbol{\mu}_k, \Sigma) \\
\boldsymbol{\lambda}_1 &= 0 \\
\lambda_{kp} &= \alpha_{k} + \beta_{k} x_p
\end{aligned}
$$

Here $i$ stands for individuals and $p$ for an experimental unit. For simplicity, the variance-covariance matrix $\Sigma$ is common to every observation.
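One possible reading of the weight equations, and it is only an assumption since the post does not spell out the link function, is that the $\lambda_{kp}$ are linear predictors that get turned into mixture weights with a softmax, with component 1 as the reference ($\boldsymbol{\lambda}_1 = 0$). A minimal Python sketch of that reading (the function name `component_weights` and all numbers are made up for illustration):

```python
import math

def component_weights(alpha, beta, x_p):
    """Softmax of linear predictors, with the first component pinned to 0.

    alpha and beta hold the intercepts and slopes of components 2..K
    (assumed parameterisation, not confirmed by the original post).
    """
    lam = [0.0] + [a + b * x_p for a, b in zip(alpha, beta)]
    m = max(lam)                       # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in lam]
    total = sum(exps)
    return [e / total for e in exps]

# Three components, one covariate value for experimental unit p.
w = component_weights(alpha=[0.5, -1.0], beta=[1.0, 0.3], x_p=0.2)
w_flat = component_weights(alpha=[0.0, 0.0], beta=[0.0, 0.0], x_p=0.2)
```

Under this reading the weights always sum to one, and setting every intercept and slope to zero gives equal weights, which is why ordering the intercepts alone says nothing about which means belong to which component.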

The obvious way to identify the model is to put an ordering constraint on $\boldsymbol{\mu}$, such as $\mu_{j1} < \mu_{j2} < \dots < \mu_{jK}$ for any variable $j$. This approach works wonderfully in my case and gives well-explored posteriors.

However, some of the variables $j$ are negatively correlated. In that case, it makes no sense to define a component that would have the greatest mean in every variable, because a component with a greater mean in some variables is expected to have a lower mean in others.

Is there another way to identify this model without the mean ordering constraints? I tried to order the intercepts of the component weights, but that results in degenerate posteriors with exchangeable means, even with only two components.

Or is there any other idea I am missing?

Thank you very much!
Lucas

EDIT: for information, my data generally look like this: [figure not preserved]

ldeschamps · September 10, 2020, 2:01pm

Ok, after having read the posts by @betanalpha, it seems that my quest is kind of vain…

“I have decided that mixtures, like tequila, are inherently evil and should be avoided at all costs.”
Larry Wasserman

spinkney · September 10, 2020, 2:59pm

Try the repulsive prior; it may help. I think just setting rho to 0 works best, but you can also treat it as a parameter like I have here. I made it so that rho gets larger for components that are closer together in the ordering, but that hinges on mu being ordered.

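functions {
  // Log-determinant of a Gaussian similarity matrix built from the component
  // means: it tends to -infinity as any two means approach each other.
  real repulsive_lpdf(vector mu, vector rho) {
    int K = num_elements(mu);
    matrix[K, K] S = diag_matrix(rep_vector(1, K));
    matrix[K, K] L;
    int c;

    for (k1 in 1:(K - 1)) {
      for (k2 in (k1 + 1):K) {
        // Index into rho: adjacent components (k2 - k1 = 1) get the largest rho.
        c = K - (k2 - k1);
        S[k1, k2] = exp(-squared_distance(mu[k1], mu[k2]) / (0.5 + rho[c]));
        S[k2, k1] = S[k1, k2];
      }
    }

    L = cholesky_decompose(S);

    // log det(S) = 2 * sum(log(diag(L))) because S = L * L'
    return 2 * sum(log(diagonal(L)));
  }
}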

data {
  int<lower=1> K;  // number of mixture components
  int<lower=1> N;  // number of observations
  array[N] real y;
}
parameters {
  ordered[K] mu;                // ordering addresses label switching
  positive_ordered[K - 1] rho;  // repulsion scales, increasing with index
  array[K] real<lower=0, upper=1> sigma;
  simplex[K] lambda;
}
model {
  // Prior model
  mu ~ normal(0, 5);
  sigma ~ std_normal();
  lambda ~ dirichlet(rep_vector(3, K));
  rho ~ gamma(0.5, 1.0);
  mu ~ repulsive(rho);

  // Observational model: marginalize over the component assignment
  for (n in 1:N) {
    array[K] real comp_lpdf;
    for (k in 1:K) {
      comp_lpdf[k] = log(lambda[k]) + normal_lpdf(y[n] | mu[k], sigma[k]);
    }
    target += log_sum_exp(comp_lpdf);
  }
}
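
To see what `repulsive_lpdf` rewards, here is a small pure-Python sketch of the same computation (log-determinant of the similarity matrix via a hand-written Cholesky factorisation). It is an illustration rather than a replacement for the Stan function; the name `repulsive_log_density` and the example means are made up, and rho is fixed at 0 so that every pair shares one bandwidth.

```python
import math

def repulsive_log_density(mu, rho):
    """log det(S), where S[i][j] = exp(-(mu_i - mu_j)^2 / (0.5 + rho[c]))."""
    K = len(mu)
    # Start from the identity; fill the off-diagonal similarities.
    S = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
    for k1 in range(K - 1):
        for k2 in range(k1 + 1, K):
            c = K - (k2 - k1)  # 1-based index into rho, as in the Stan code
            s = math.exp(-((mu[k1] - mu[k2]) ** 2) / (0.5 + rho[c - 1]))
            S[k1][k2] = S[k2][k1] = s
    # Cholesky factorisation S = L L^T (assumes S is positive definite).
    L = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            acc = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(K))

rho = [0.0, 0.0]  # K - 1 entries; all zero mimics "just set rho to 0"
far = repulsive_log_density([-3.0, 0.0, 3.0], rho)   # well-separated means
near = repulsive_log_density([-0.1, 0.0, 0.1], rho)  # nearly collapsed means
```

Well-separated means give a value close to 0, while nearly coincident means give a strongly negative value, so the prior penalises components that sit on top of each other. It is symmetric in the components, though, which is why it does nothing by itself about relabelling.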

ldeschamps · September 10, 2020, 6:57pm

Thank you very much! Repulsive priors seem great. However, isn’t it still necessary to use ordering constraints on the means? mu is ordered in your code.

spinkney · September 10, 2020, 7:21pm

Ordering will help with label switching, and the repulsive prior will help keep the components from collapsing on top of each other. In other words, you don’t need ordering for the repulsive prior to work; the repulsive prior on its own just isn’t going to help much, or at all, with label switching.
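
For readers new to the term, label switching can be checked numerically: permuting the component labels leaves the mixture likelihood unchanged, so without a constraint the posterior has K! equivalent modes. A small Python sketch with made-up data and parameter values (none of these numbers come from the thread):

```python
import math
from itertools import permutations

def normal_lpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mixture_loglik(y, lam, mu, sigma):
    """Log-likelihood of a univariate normal mixture, marginalised over labels."""
    return sum(
        log_sum_exp([math.log(l) + normal_lpdf(yn, m, s)
                     for l, m, s in zip(lam, mu, sigma)])
        for yn in y
    )

y = [-2.1, -1.8, 0.1, 1.9, 2.3]  # made-up observations
lam, mu, sigma = [0.2, 0.3, 0.5], [1.5, -2.0, 0.0], [0.5, 0.7, 1.0]

# Evaluate the likelihood under every relabelling of the three components.
perms = list(permutations(range(3)))
logliks = [
    mixture_loglik(y, [lam[i] for i in p], [mu[i] for i in p], [sigma[i] for i in p])
    for p in perms
]

# Only one relabelling satisfies an ordering constraint on mu.
ordered = [p for p in perms if [mu[i] for i in p] == sorted(mu)]
```

All six values in `logliks` agree up to floating-point error, and exactly one permutation survives the ordering constraint, which is the sense in which `ordered[K] mu` picks a single labelling.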

betanalpha · September 16, 2020, 1:12am

To be clear, exchangeable mixture models, where all of the components are equivalent and hence there’s a fundamental ambiguity in which component will model which part of the data generating process, are problematic. Non-exchangeable mixture models, where the form of each component or the prior assigned to each component breaks the ambiguity, can be quite powerful in practice when that breaking is based on domain expertise about the structure of the data generating process. See for example zero-inflated models and their ilk.

To be fair, ordering technically breaks the formal exchangeability of a mixture model. The problem is that for any finite data set there will still be many model configurations that explain the observed data well enough that the posterior will be highly degenerate. By far the most successful approach is to stop trying to cluster and start understanding the various overlapping contributions to your data generating process, and then to model each of those one at a time in preparation for a non-exchangeable mixture model.
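
As an illustration of the zero-inflated case mentioned above, here is a minimal Python sketch of a zero-inflated Poisson log-likelihood. The two components have different forms (a point mass at zero and a Poisson), so there is no relabelling that leaves the likelihood unchanged. The function names, counts and parameter values are invented for the example.

```python
import math

def poisson_lpmf(y, rate):
    return y * math.log(rate) - rate - math.lgamma(y + 1)

def zip_loglik(ys, theta, rate):
    """Zero-inflated Poisson: theta is the weight of the point mass at zero."""
    total = 0.0
    for y in ys:
        if y == 0:
            # A zero can come from either component.
            a = math.log(theta)
            b = math.log1p(-theta) + poisson_lpmf(0, rate)
            m = max(a, b)
            total += m + math.log(math.exp(a - m) + math.exp(b - m))
        else:
            # A positive count can only come from the Poisson component.
            total += math.log1p(-theta) + poisson_lpmf(y, rate)
    return total

counts = [0, 0, 0, 0, 0, 0, 3, 5, 2, 4]    # made-up counts with excess zeros
ll = zip_loglik(counts, theta=0.4, rate=4.0)
ll_swapped = zip_loglik(counts, theta=0.6, rate=4.0)  # weights exchanged
```

Exchanging the two weights changes the log-likelihood here, unlike in an exchangeable mixture where swapping weights together with component parameters changes nothing. The structure of each component, not an ordering constraint, is what pins down which component explains which part of the data.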

ldeschamps · September 16, 2020, 12:03pm

That is the most pertinent reflection I have read on the subject! And it forces us to think deeply about our data, which one would normally do anyway to produce interesting and valid scientific inference!

In my case, I do suspect there should be a “generative” logic behind how my data points can be clustered, and I am not sure an exchangeable mixture model would produce anything that interpretable.

Edit: I think my attraction toward exchangeable mixture models arose because I suspect (theoretically and empirically) that there are clusters, but at sampling time we were sampling for a different question, and we do not have the data to cluster observations genetically. So at first, exchangeable mixture models seemed to be a way to compensate, but I realized, one step at a time, that this idea was an illusion. It is hard to answer a question with data harvested to answer another one!

Thanks!

betanalpha · September 17, 2020, 7:00pm

👏👏👏👏👏
