Comparing two models: single gaussian and a mixture of two gaussians

@avehtari, thank you so much. I have changed the model as you suggested and that fixed the problem completely! Model comparison now strongly prefers the two-gaussian model, as it should (I did many other tests with synthetic data as well). WAIC and PSIS numbers are almost identical. Pareto K values are all below 0.5.

The new model also samples much faster, with no Stan warnings and higher effective number of samples, so the new model is a better solution to my divergent transitions question.

I’ve also checked that the fixed model produces distributions of model parameters that are almost identical to those from the previous model. So the old and new models are probably mathematically equivalent.

Here is the fixed versions of the models and the plots.

Model 1: single gaussian

// Single Gaussian population model
data {
  int<lower=1> n;        // Number of data points
  real y[n];             // Observed values
  real uncertainties[n]; // Uncertainties
  real sigma_stdev_prior;// Prior for standard deviation of population spread
}
parameters {
  real mu;               // Population location
  real<lower=0> sigma;   // Population spread
}
model {
  // Set priors
  sigma ~ normal(0, sigma_stdev_prior);
  mu ~ normal(0, 1);

  // Loop through observed values and calculate their probabilities
  // for a Gaussian distribution, accounting for uncertainty of each measurment
  for (i in 1:n) {
    target += normal_lpdf(y[i] | mu, sqrt(sigma^2 + uncertainties[i]^2));
  }
}
generated quantities{
  vector[n] lpd_pointwise;

  for (i in 1:n) {
    lpd_pointwise[i] = normal_lpdf(y[i] | mu, sqrt(sigma^2 + uncertainties[i]^2));
  }
}

Model 1: mixture of two gaussians

// Guassian mixture model
data {
  int<lower=1> n;        // Number of data points
  real y[n];             // Observed values
  real uncertainties[n]; // Uncertainties
  real sigma_stdev_prior;// Prior for standard deviation of population spread
}
parameters {
  real<lower=0,upper=1> r;   // Mixing proportion
  ordered[2] mu;             // Locations of mixture components
  real<lower=0> sigma;       // Spread of mixture components
}
model {
  // Set priors
  sigma ~ normal(0, sigma_stdev_prior);
  mu[1] ~ normal(0, 1);
  mu[2] ~ normal(0, 1);
  r ~ beta(2, 2);

  // Loop through observed values
  // and mix two Gaussian components accounting for uncertainty of each measurment
  for (i in 1:n) {
    target += log_mix(1 - r,
      normal_lpdf(y[i] | mu[1], sqrt(sigma^2 + uncertainties[i]^2)),
      normal_lpdf(y[i] | mu[2], sqrt(sigma^2 + uncertainties[i]^2)));
  }
}
generated quantities{
  vector[n] lpd_pointwise;

  for (i in 1:n) {
    lpd_pointwise[i] = log_mix(1 - r,
      normal_lpdf(y[i] | mu[1], sqrt(sigma^2 + uncertainties[i]^2)),
      normal_lpdf(y[i] | mu[2], sqrt(sigma^2 + uncertainties[i]^2)));
  }
}

Figure 10: Synthesised observations

Table 1: WAIC output

WAIC SE dWAIC dSE pWAIC Weight
Two populations -5.06 6.81 3.67 1.00
One population 91.18 4.28 96.24 5.37 3.82 0.00

Table 2: PSIS output

PSIS SE dPSIS dSE pPSIS MaxK Weight
Two populations -4.88 6.89 3.76 0.49 1.00
One population 91.20 4.28 96.08 5.43 3.83 0.19 0.00

Figure 11: Comparing models with WAIC

Figure 12: Comparing models with PSIS

Figure 13: Pareto K values

Full Code

The full code to compare the models and make the above plots is located here:

2 Likes