Divergences when the expectation is close to zero and the data are left-censored and log-normally distributed

Hello Stanimals,

Short summary: I have a model that predicts, for a subset of my data (assumed to be log-normally distributed), an expectation that is practically zero. Those data are left-censored, and I’m modelling this censoring process. However, during prior predictive checks Stan reports divergent iterations when the expectation gets too close to zero, even though no data- or censoring-related sampling statements are in the code yet.

Elaboration: I am developing a non-linear model for data that I assume have log-normal measurement error. The model is supposed to generate an S-shaped regression line, for which I’m using Stan’s Phi_approx() as an approximation to the normal CDF. I use this because the mean and sd parameters have a specific biological meaning in the study context. Further, a subset of the data is left-censored, which I intend to model explicitly using either (1) the log-normal cumulative distribution function or (2) data augmentation, i.e. defining an additional upper-bounded parameter for every censored observation (yet to decide).
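For concreteness, a rough sketch of what option (1) could look like; all names below (y_obs, N_obs, N_cens, limit, mu_obs, mu_cens, sigma_obs) are illustrative placeholders rather than my actual model:

// Option (1): left-censored observations enter the likelihood through
// the log-normal CDF, i.e. Pr(Y < limit).
for (n in 1:N_obs)
  y_obs[n] ~ lognormal(mu_obs[n], sigma_obs);                // uncensored data
for (n in 1:N_cens)
  target += lognormal_lcdf(limit | mu_cens[n], sigma_obs);   // censored data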

Now the issue I’m struggling with: while performing prior predictive checks, I ran into divergent iterations being reported (up to half of all samples!!!). I traced the culprit to a line of code where I calculate the natural logarithm of the output of Phi_approx(), which will later on (after finishing the prior predictive checks) be used as the mean parameter of the log-normal distribution. It seems that some trajectories cause log(...) to underflow at some point, which is totally expected. I’m not too concerned about this, however, as the underflowed result of log(Phi_approx(...)) will be associated with left-censored data points (in which case the censoring probability is essentially 1.0).

What I did not expect, however, is for Stan to report these underflows as divergent iterations, even when I’m not yet actually using the underflowed result of log(Phi_approx(...)) for anything. I checked that the divergences weren’t caused by anything else; e.g. if I comment out the log() statement and only sample from the priors and calculate Phi_approx(...), Stan purrs like a kitten. However, with log(Phi_approx(...)) in action, the divergence alarm bells go off while the results returned by Stan are exactly the same (means, BCIs, Rhat = 1.000). This has me a little concerned that if I continue with this model and ignore the divergence warnings, I may miss more serious divergences as I build the model up to its full level of complexity (a mixed-effects non-linear model).

Of course, I could hack my way out of this by computing log(Phi_approx(...) + c), where c is a small constant, but that doesn’t seem right. Or is this (or some other hack) generally accepted for these situations?

Attached is a minimal working example with a fixed-effects version of the model and the prior predictive checks that reproduce the divergences, along with R code to run it with rstan.

Best,

Luc

P.S.: I already read and enjoyed the earlier extensive post on divergences, Getting the location + gradients of divergences (not of iteration starting points), and the viz paper [1709.01449] Visualization in Bayesian workflow, but couldn’t find an answer there.


functions {
  vector phi_approx(vector x,
                    real mu,
                    real sigma) {
    /**
     * @param x random variates
     * @param mu mean of the normal distribution
     * @param sigma standard deviation of the normal distribution
     * @return approximation to the normal CDF evaluated at (x - mu) / sigma
     */
    return Phi_approx((x - mu) / sigma);
  }
  
}

data {
  // Data
  int<lower=0> N;
  vector[N] time;
  
  // Prior parameters
  real mu_time_mu;
  real<lower=0> mu_time_sd;
  
  real<lower=0> sigma_time_mu;
  real<lower=0> sigma_time_sd;
  
}

transformed data{  
}


parameters {
  
  // Mid-point of S-shaped curve (mean of approximate normal CDF)
  real mu_time;
  
  // Curvature of S-shaped curve (sd of approximate normal CDF)
  real<lower=0> sigma_time;
     
}

transformed parameters {
}


model {
  vector[N] prpc;      // prior predictive check
  vector[N] log_prpc;  // log of prior predictive check
  
  prpc = phi_approx(time, mu_time, sigma_time);
  log_prpc = log(prpc);  // Cause of divergent iterations being reported
  
  // Prior distributions
  mu_time ~ normal(mu_time_mu, mu_time_sd);
  sigma_time ~ normal(sigma_time_mu, sigma_time_sd);
  
}

generated quantities {
  
  vector[N] prpc;
  vector[N] log_prpc;
  
  prpc = phi_approx(time, mu_time, sigma_time);
  log_prpc = log(prpc);  // does not cause divergences to be reported

}

toy_reproduce_divergencies.stan (1.4 KB) toy_reproduce_divergencies.R (1.5 KB)


If the aim is to fit a CDF-esque curve, then maybe you could use a logistic function? The mean/midpoint of the logistic function will match the mean of the normal CDF, and the logistic’s growth rate is related to the standard deviation of the equivalent normal (exactly how, I can’t recall). But of course, they are not the same model, so there will be a difference. The plus is that you can use log1p and expm1 to evaluate the logistic (on the log scale) in a numerically stable manner (see section 3). Also, can you add any extra detail on why you need the log of Phi_approx? If the data are ‘CDF shaped’, then I’m not seeing why you need to log it in the first place.
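To illustrate the log-scale point, a minimal sketch (not your exact model; t, mu, s, and log_p are placeholder names):

// Log of a logistic curve evaluated without underflow:
// log(1 / (1 + exp(-z))) == -log1p_exp(-z); Stan also has log_inv_logit(z).
real z = (t - mu) / s;
real log_p = -log1p_exp(-z);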

Otherwise, the usual suggestion is to check that the priors make sense (are values that underflow actually plausible in your context?), and simulate fake data from the model to see if divergences exist in the posterior as well.

Hi hhau,

Thanks for sharing your thoughts. I’m already using a logistic approximation to the normal CDF (i.e. Phi_approx()) and the priors do make sense. The problem is really that the log() function underflows, causing divergent iterations to be reported. I need to take the log because the data (concentrations in blood) are log-normally distributed (I scale the CDF curve with a scalar to make it fit the data). The priors are not the problem - I tested that. Using log1p() would indeed be a hack to avoid the divergent iterations, but I don’t want to add 1.0 as that throws the expectation of the log-normal distribution way off.
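Schematically, the log ends up as the log-scale location of the likelihood, something like this (beta, sigma_obs, and y are illustrative names, not my actual code):

// The scaled CDF curve becomes the location of the log-normal likelihood.
vector[N] log_mu = log(beta) + log(phi_approx(time, mu_time, sigma_time));
y ~ lognormal(log_mu, sigma_obs);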

If the issues are coming from log(Phi_approx()), try using std_normal_lcdf() instead, since it has much better numerical stability. The only thing is that I don’t think it’s available in current rstan, so you’ll need to use cmdstanr.
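Roughly like this (a sketch; note that the _lcdf functions return a single summed value when given a vector argument, so loop if you need one value per observation):

for (n in 1:N)
  log_prpc[n] = std_normal_lcdf((time[n] - mu_time) / sigma_time);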

Thanks for the suggestion, andrjohns. I think I might have oversimplified the context while trying to create a minimal working example, so I’ll provide more background. The full model is described here (the math is in the supplement); I wanted to spare you the math gore, haha, but here goes. I’m revisiting the model because I want to use it for a new dataset that includes additional types of measurements. I’m doing this in a different version of Stan than the original, and the sheer number of divergent iterations is new to me (I experienced and dealt with some in the original model, but not at this scale).

The model predicts the blood concentration of a parasite that multiplies in highly synchronised cycles (see the attached image from the paper referred to above). Each cycle is constructed from two normal CDFs: the first for the time of appearance of a generation of parasites, and the second for the time of disappearance of the same generation from the blood. The amplitude of the first cycle is governed by a scalar \beta that is multiplied by the term (CDF_1 - CDF_2). The height of each consecutive cycle is a multiple R of the previous cycle’s height. I try to do as much as I can on the log scale:

\log(\text{concentration}) = \log(\beta) + \log\left(\sum_{c=1}^{C} R^{c-1} \, (\text{CDF}_{c,1} - \text{CDF}_{c,2})\right)

Here, c indexes the C cycles (waves) of parasite generations. Each pair (CDF_{c,1} - CDF_{c,2}) is shifted over time (i.e., each has a different mean \mu_c), but has the same curvature (i.e., the same standard deviation \sigma). I implement the CDFs using Phi_approx(), such that:

\log(\text{concentration}) = \log(\beta) + \log\left(\sum_{c=1}^{C} R^{c-1} \left(\frac{1}{1+e^{-z_{c,1}}} - \frac{1}{1+e^{-z_{c,2}}}\right)\right)

Here, z is a simple third-order polynomial function of time (the horizontal axis in the graph) and of the mean and standard deviation of the CDF (the polynomial underlying Phi_approx(), as explained in the Stan manual). Currently, the second term, \log(\sum_{c=1}^{C}(\ldots)), causes the underflow/divergences.
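In (pseudo-)Stan, the naive version I currently have looks roughly like this (mu[1:2*C], sigma, R, and log_beta are stand-ins for my actual parameters):

// The sum of CDF differences can get arbitrarily close to zero,
// so taking its log underflows.
vector[N] log_conc;
for (n in 1:N) {
  real s = 0;
  for (c in 1:C)
    s += R^(c - 1) * (Phi_approx((time[n] - mu[2 * c - 1]) / sigma)
                      - Phi_approx((time[n] - mu[2 * c]) / sigma));
  log_conc[n] = log_beta + log(s);  // underflows when s is close to 0
}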

I hope that clarifies the challenge I’m facing. As such, using std_normal_lcdf() wouldn’t work (not even in cmdstan), as I need to take the log after some manipulation of the CDFs. Any suggestions on how to program this in a more numerically stable manner are most welcome! I’m already using Phi_approx() for the CDFs. Maybe I’m missing @hhau’s earlier point about somehow using log1p() and expm1() for this - I don’t yet see how to go about that.

I think my question boils down to: how do I perform a numerically stable calculation of the following in Stan?

\log\left(\frac{1}{1+e^{-z_1}} - \frac{1}{1+e^{-z_2}} + \frac{R}{1+e^{-z_3}} - \frac{R}{1+e^{-z_4}} + \frac{R^2}{1+e^{-z_5}} - \frac{R^2}{1+e^{-z_6}}\right)

where

z_i = 0.07056x_i^3+1.5976x_i

and
x_i = \frac{\text{time} - \mu_i}{\sigma}

time is data and R, \mu_i, and \sigma are model parameters (all strictly positive)

We have

\frac{1}{1+e^{-z_{1}}}-\frac{1}{1+e^{-z_{2}}}=\frac{e^{-z_{2}}-e^{-z_{1}}}{\left(1+e^{-z_{1}}\right)\left(1+e^{-z_{2}}\right)}

so, subject to the constraint that z_1 < z_2, z_3 < z_4, and z_5 < z_6, this is stable:

log_sum_exp({
             log_diff_exp(-z2, -z1) - log1p(-z1) - log1p(-z2),
    log(R) + log_diff_exp(-z4, -z3) - log1p(-z3) - log1p(-z4),
  2*log(R) + log_diff_exp(-z6, -z5) - log1p(-z5) - log1p(-z6)
  })

Thanks @nhuurre for working this out! I had great hopes, especially as the condition z_1 < z_2 < z_3 < \ldots < z_C is satisfied because \mu_1 < \mu_2 < \mu_3 < \ldots < \mu_C. Unfortunately, my implementation of your equations crashes and burns because z_i can be smaller than -1. See:

z_i = 0.07056x_i^3+1.5976x_i (see attached plot)

where

x_i = \frac{\text{time} - \mu_i}{\sigma}

where time (the time of observation) can be considerably lower than \mu_i (the time at which an upward or downward wave reaches its midpoint).

I hadn’t realised yet that log1p() won’t play with numbers < -1.0. I’ll chew on the derivation of your numerically stable equation a little more. Please do let me know if you see what’s going awry here.

Whoops, those should have been log1p_exp. Also, since x_i = \frac{\text{time} - \mu_i}{\sigma} (and z_i is increasing in x_i), \mu_1 < \mu_2 implies z_1 > z_2, but I got the ordering constraint backwards too, so it works out anyway.


I just came to exactly the same conclusion!

For poster(ior)ity and easy reference, the solution is:

log_sum_exp({
             log_diff_exp(-z2, -z1) - log1p_exp(-z1) - log1p_exp(-z2),
    log(R) + log_diff_exp(-z4, -z3) - log1p_exp(-z3) - log1p_exp(-z4),
  2*log(R) + log_diff_exp(-z6, -z5) - log1p_exp(-z5) - log1p_exp(-z6)
  })
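For my full model with C cycles, I’ll probably wrap this in a function along these lines (just a sketch; log_wave_sum and the argument layout are names I made up):

real log_wave_sum(vector z, real R, int C) {
  // z holds the 2C polynomial terms, ordered so that within each cycle c
  // z[2c - 1] > z[2c] (appearance before disappearance); R > 0.
  vector[C] terms;
  for (c in 1:C) {
    int i = 2 * c - 1;
    terms[c] = (c - 1) * log(R)
               + log_diff_exp(-z[i + 1], -z[i])
               - log1p_exp(-z[i]) - log1p_exp(-z[i + 1]);
  }
  return log_sum_exp(terms);
}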

Yay, thanks @nhuurre!
