My painful but fruitful Bayesian mixture model saga continues :-)
Say I have a mixture model involving K Gaussians. The problem is that some or all of those Gaussians end up overlapping, which causes one of the non-identifiability problems mentioned in Betancourt 2017.
I’ve spent some hours reading through the literature on repulsive priors, which ought to solve this problem, but as far as I can tell it’s pretty complicated and theoretical. I’m looking for something more practical.
In Stan, would it be possible to force a separation between the locations of my Gaussians with a condition in the model that rejects samples whose location values don’t meet a certain constraint, e.g. |\mu_i - \mu_j| > \max\{\sigma_i^2, \sigma_j^2\}? Is there some smart way of achieving this?
Will it have the effect I’m hoping for, i.e. the separation of the mixture model’s location parameters?
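To make that concrete, the literal version I have in mind is something like this in the model block (just a sketch, not tested):

    model {
      // zero out the density whenever two means are closer than the larger variance
      // (mu and sigma are the component locations and scales of my mixture)
      for (i in 1:(K - 1))
        for (j in (i + 1):K)
          if (fabs(mu[i] - mu[j]) <= fmax(square(sigma[i]), square(sigma[j])))
            target += negative_infinity();
      // ... usual priors and mixture likelihood
    }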
Introducing discontinuous behavior in the likelihood like this will give HMC problems when it’s near the discontinuity.
I’m guessing that since you want these cutoffs, the sampler would be near the discontinuities a lot, and that’d be no good.
There might be a way to do a soft separation, but I guess that’s what the repulsive prior papers are doing.
If this is 1D, maybe you could parameterize the locations of the different means with offsets and lower-bound those offsets by some positive number (or give them a positive, zero-avoiding prior)? That might not work well either, though.
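Something like this is what I’m thinking (totally untested; the 0.5 lower bound and the gamma prior are just placeholders you’d have to tune):

    parameters {
      real mu1;                       // location of the first component
      vector<lower=0.5>[K - 1] gap;   // positive gaps between consecutive means
      // ... sigma, mixture weights, etc. unchanged
    }
    transformed parameters {
      vector[K] mu;                   // ordered locations, at least 0.5 apart by construction
      mu[1] = mu1;
      for (k in 2:K)
        mu[k] = mu[k - 1] + gap[k - 1];
    }
    model {
      mu1 ~ normal(0, 5);
      gap ~ gamma(2, 1);              // or drop the hard bound and use a zero-avoiding prior on gap
      // ... rest of the model unchanged
    }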
@bbbales2 thanks. Do you know of a way of creating a joint prior for something like this, i.e. joint in such a way that it has natural separations? For example, what if I made a joint density \pi(\theta_1, \theta_2) \propto d(\theta_1, \theta_2), where d is some useful distance measure between the location parameters, so the density vanishes when the components coincide?
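For concreteness, with d taken as the absolute difference between the means, I imagine something like this in the model block (just a sketch, not tested):

    model {
      // multiply the joint prior by |mu[i] - mu[j]| for every pair of components,
      // so the density vanishes whenever two locations coincide
      for (i in 1:(K - 1))
        for (j in (i + 1):K)
          target += log(fabs(mu[i] - mu[j]));
      // ... existing priors on mu, sigma, lambda and the mixture likelihood
    }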
Also, do you think a Gibbs sampler would be better suited to the discontinuities in the original idea?
Here’s an example implementation that uses a determinantal point process prior to induce an exchangeable repulsion between the components. The scaling of the prior is hard to tune and, well, it only reveals more problems. Even without label switching and component collapse you get a horrendous posterior because of the degeneracies in the model for finite data. Ultimately exchangeable mixture models are not well suited for Bayesian inference. Once I finish all of my other case studies I’ll go back and update my mixture model case study with the whole progression of fixes and the deeper problems they reveal.
Exchangeable mixture models: not even once.
functions {
  // Log density of a determinantal point process (DPP) prior on the component
  // locations: the log determinant of a squared-exponential kernel matrix with
  // length scale rho. The determinant shrinks to zero as any two locations
  // approach each other, which is what induces the repulsion.
  real repulsive_lpdf(vector mu, real rho) {
    int K = num_elements(mu);
    matrix[K, K] S;
    matrix[K, K] L;
    real log_det = 0;
    for (k1 in 1:K)
      for (k2 in 1:K)
        S[k1, k2] = exp(- square(mu[k1] - mu[k2]) / square(rho));
    L = cholesky_decompose(S);
    for (k in 1:K)
      log_det = log_det + 2 * log(L[k, k]);
    return log_det;
  }
}

data {
  int<lower=0> K;           // number of mixture components
  int<lower=0> N;           // number of observations
  real y[N];                // observations
}

parameters {
  ordered[K] mu;            // component locations (ordered to tame label switching)
  real<lower=0> sigma[K];   // component scales
  simplex[K] lambda;        // mixture weights
}

model {
  mu ~ normal(0, 5);
  mu ~ repulsive(5);        // DPP repulsion with length scale rho = 5
  sigma ~ normal(0, 1);
  lambda ~ dirichlet(rep_vector(3.0, K));

  // Marginalize over the discrete component assignments
  for (n in 1:N) {
    vector[K] comp_lpdf;
    for (k in 1:K) {
      comp_lpdf[k] = log(lambda[k])
                     + normal_lpdf(y[n] | mu[k], sigma[k]);
    }
    target += log_sum_exp(comp_lpdf);
  }
}
Ultimately exchangeable mixture models are not well suited for Bayesian inference
I had hoped this wasn’t the case, and I wait with bated breath for your update to the mixture model write-up.
In my case I have a large dataset and an unknown number K of Gaussian sub-populations. I could see a way through if I knew K, so if I just had some way of determining K, I could incrementally solve the rest.
Where a mixture model is the right choice, is there typically an alternative that would also be appropriate?
I’m going to claim that even if K is known you won’t be able to fit the model due to the degeneracies. One way to think about this is in terms of experimental design – for any finite data set observations from an exchangeable mixture of Gaussians are very poorly informative of those latent components and the resulting posterior will be extremely degenerate. You either need very informative priors or additional observational processes that can break that degeneracy.
It feels like you’re implying the corollary that mixture models are only principled in situations where K is known and the priors are non-exchangeable (or otherwise have a very strong, asymmetric effect on the components of the mixture).
I’d hazard a guess that unknown K combined with non-exchangeable priors doesn’t really make sense, so it seems I’m going to have to look for alternative formulations for my particular puzzle.
Basically. The same problems arise with Dirichlet processes and neural networks – the more universal the model is asymptotically, the more degenerate it will be for finite data. It would be great if this weren’t true, but I have yet to see any exceptions.