Gaussian mixture model with interval censored data -- where do the divergences come from?

I am fitting a simple mixture model to interval-censored data. To do so I created a user-defined pmf based on a normal distribution: it maps the normal distribution onto the discrete scale of my data by integrating its density over each interval.

functions {
  // Log probability mass function for an interval-censored normal
  real intervalCensoredNormal_lpmf(int y, real mu, real sigma, vector breaks) {
    // This is a discrete log probability mass function defined for integers between 0 and rows(breaks).
    // Assigns the integral of the normal probability between breaks y and y+1 to y.
    // If y is 0 assigns integral of normal probability between -inf and break 1.
    // If y is rows(breaks) assigns integral of normal probability between the last break and +inf.

    real p;

    if (y == 0) { // from -inf to the first break
      p = normal_lcdf(breaks[1] | mu, sigma);
    } else if (y == rows(breaks)) { // from the last break to +inf
      p = normal_lccdf(breaks[rows(breaks)] | mu, sigma);
    } else { // between two breaks
      p = log_diff_exp(normal_lcdf(breaks[y+1] | mu, sigma), normal_lcdf(breaks[y] | mu, sigma));
    }
    
    return p;
  }
}

data {
  int<lower = 0> N;
  int y[N]; // an array with N elements
  int<lower = 0> K;
  vector[K] intervalBreaks; // a vector with K elements
}

parameters {
  ordered[2] mu; // must be ordered (an ordered vector) so the chains do not constantly switch labels between the two components
  real<lower=0> sigma[2];
  real<lower=0, upper=1> theta;
}

model {
  mu[1] ~ normal(4, 4);
  mu[2] ~ normal(8, 4);
  sigma ~ normal(0, 5);
  theta ~ beta(1.5, 1.5);
  for (n in 1:N)
    target += log_mix(theta,
                      intervalCensoredNormal_lpmf(y[n] | mu[1], sigma[1], intervalBreaks),
                      intervalCensoredNormal_lpmf(y[n] | mu[2], sigma[2], intervalBreaks));
}

A figure of my data is attached. Note that the first and the last bar of the histogram actually contain all values below the first break and above the last break, respectively, which I accounted for in the pmf. That’s why the first bar is so high.

Now the problem:

Everything runs fine and smoothly, but I get over 100 divergences even with adapt_delta set to 0.9 or 0.99. Initially I wasn’t worried, since the divergences are quite well distributed in space when shown with pairs from rstan (figure attached), so I didn’t expect them to introduce major bias. However, when looking at the same figure in shinystan (attached), all divergences are around the smaller mode, and I am afraid that this mode remains under-represented because of the divergences.

  1. Any ideas why those divergences arise and if I can get rid of them?
  2. If not, should I be worried that the fit is biased, based on the divergences shown in shinystan?

Thanks for your advice.

density.pdf (38.2 KB)
pairs.pdf (156.1 KB)
shinstan-bivariate(1).pdf (257.5 KB)

Since you only have 5 parameters, could you just post the pairplots with all five of them in there?

Sure, here it is.

Again, when I plot the same thing in shinystan the divergences are not shown in the same place, but are all concentrated in one of the modes (where mu[1] is around 3).

Thanks.

(pairs plot attached)

Cool beans. Unfortunately it doesn’t seem to be saying much :P. None of the parameters are wildly big or small.

But looking back at your data, have you considered doing some sort of zero-inflated-model-looking thing? Check page 197, 13.7. Zero-Inflated and Hurdle Models in the 2.17.0 manual. The situation looks more like that than a mixture of two normals. Maybe that’d work better.

Back to “I wasn’t worried, since the divergences are quite well distributed in space” – it’s kinda hard to get much out of where the divergences are when they’re all spread around unfortunately. Any divergences are still bad news bears. @martinmodrak has a neat example here Getting the location + gradients of divergences (not of iteration starting points) - #5 by martinmodrak, but that’s down the line.

That’s weird and has never happened to me - please double-check that you are using the same fit for ShinyStan and pairs. Otherwise it might make sense to file a bug report.

Generally, divergences are a worry. IMHO the only way to make sure they are OK to ignore is to run simulation-based calibration for the model, which is a lot of work and time - you might not want to go that way.

I second @bbbales2 that a discretized normal seems to be a weird choice for the data; they look very much like a zero-inflated negative binomial.
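For illustration only, here is a minimal sketch of a zero-inflated negative binomial along those lines, following the structure of the zero-inflation section of the manual. The parameter names (theta, mu, phi) and the priors are placeholders, and it ignores the interval censoring entirely:

data {
  int<lower = 0> N;
  int<lower = 0> y[N];
}

parameters {
  real<lower = 0, upper = 1> theta; // probability of a structural zero
  real<lower = 0> mu;               // negative binomial mean
  real<lower = 0> phi;              // negative binomial dispersion
}

model {
  theta ~ beta(1.5, 1.5); // placeholder prior
  mu ~ normal(5, 5);      // placeholder prior
  phi ~ normal(0, 5);     // placeholder prior
  for (n in 1:N) {
    if (y[n] == 0)
      // a zero can come either from the zero-inflation part or from the negative binomial
      target += log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                              + neg_binomial_2_lpmf(0 | mu, phi));
    else
      target += bernoulli_lpmf(0 | theta)
                  + neg_binomial_2_lpmf(y[n] | mu, phi);
  }
}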

Thanks, I will have a look at other possible models.

Just for reference, it seems to be known that pairs and shinystan give different information about the location of divergences:
https://groups.google.com/forum/#!msg/stan-users/nFx6k2KhLSA/KX6z4qxnAQAJ


Just another stupid question: if I implemented the same model in JAGS, would it be likely to suffer from the same problems? I am trying to figure out whether this is a Stan-specific problem or a problem related to my model and data.

The way to check this is to simulate data from your actual model, fit it, and see if the results are sane.

The only indicator you had that something bad was happening was the divergences, which come from the HMC sampler (Stan uses a variant of an HMC sampler internally). JAGS doesn’t use HMC and so wouldn’t have those diagnostics, so it might look like it’s working, or working better, when actually it’s not.
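As a purely illustrative sketch of that check, a small data-simulating Stan program like the one below (run with a fixed-parameter algorithm, e.g. algorithm = "Fixed_param" in rstan) could generate interval-censored mixture data with known values to feed back into the model. The *_true names and y_sim are made up for this sketch, and y_sim is coded 0..K to match the lpmf in the original model:

data {
  int<lower = 0> N;
  int<lower = 0> K;
  vector[K] intervalBreaks;
  real mu_true[2];                        // chosen "true" component means
  real<lower = 0> sigma_true[2];          // chosen "true" component sds
  real<lower = 0, upper = 1> theta_true;  // chosen "true" mixing weight
}

generated quantities {
  int y_sim[N];
  for (n in 1:N) {
    real x;
    int bin;
    // pick a mixture component, then draw the latent continuous value
    if (bernoulli_rng(theta_true) == 1)
      x = normal_rng(mu_true[1], sigma_true[1]);
    else
      x = normal_rng(mu_true[2], sigma_true[2]);
    // bin x by counting how many breaks lie below it (gives 0 .. K)
    bin = 0;
    for (k in 1:K)
      if (x > intervalBreaks[k])
        bin = bin + 1;
    y_sim[n] = bin;
  }
}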

I can reproduce very similar behaviour with simulated data. So it likely has something to do with the interval censoring, I guess.

I reformulated the model with a categorical distribution, which is much nicer and simpler. I also replaced normal_lcdf and normal_lccdf with Phi because they return -inf and 0, respectively, for values of (y-mu)/sigma outside (-32.75, 8.25), which probably caused some divergences (Stan Manual p523 (version 2.17.0-1)). This got rid of all the divergences.

data {
  int<lower = 0> N;
  int y[N]; // an array with N elements; categorical() below requires values in 1:(K+1)
  int<lower = 0> K;
  vector[K] intervalBreaks; // a vector with K elements
}

parameters {
  ordered[2] mu; // must be ordered (an ordered vector) so the chains do not constantly switch labels between the two components
  real<lower=0> sigma[2];
  real<lower=0, upper=1> theta;
}

transformed parameters {

  simplex[K+1] tau; // probabilities of the K+1 categories defined by the breaks in intervalBreaks

  // Using Phi (with scaled and shifted x) here instead of normal_cdf because normal_cdf
  // returns 1 for (y-mu)/sigma > 8.25 (which is a very low limit). Working with the actual
  // probabilities instead of logs also avoids log_diff_exp returning nan when a == b.
  // See Stan Manual p523 (version 2.17.0-1).
  // If this does not work I could also use Phi_approx, which has even larger support.

  for (k in 1:(K+1)) {
    if (k == 1)
          tau[k] =      theta * Phi((intervalBreaks[1]-mu[1])/sigma[1])
                      + (1-theta) * Phi((intervalBreaks[1]-mu[2])/sigma[2]);
    else if (k > 1 && k <= K)
          tau[k] =      theta * Phi((intervalBreaks[k]-mu[1])/sigma[1])
                      + (1-theta) * Phi((intervalBreaks[k]-mu[2])/sigma[2])
                      - theta * Phi((intervalBreaks[k-1]-mu[1])/sigma[1])
                      - (1-theta) * Phi((intervalBreaks[k-1]-mu[2])/sigma[2]);
    else if (k == (K+1))
          tau[k] =      1
                      - theta * Phi((intervalBreaks[K]-mu[1])/sigma[1])
                      - (1-theta) * Phi((intervalBreaks[K]-mu[2])/sigma[2]);
  }

  //print(tau)

}

model {
  mu[1] ~ normal(4, 3);
  mu[2] ~ normal(8, 3);
  sigma ~ normal(0, 5);
  theta ~ beta(1.5, 1.5);

  y ~ categorical(tau);
  
}

The posterior still has two modes (an identifiability problem), which is possibly due to the issues with model choice that you guys highlighted.
