Underflow? in custom distribution function

KariOlson · June 6, 2018, 8:18pm

I am working on a model where data is an integer (and either is a sample of the unknown distribution or censored observation). Because the integer can be negative and might have much less variance than a poisson or negative binomial, I decided to write a distribution. I thought I found a nice solution using a continuous distribution and “rounding” but my implementation returns nan in the tails and I’m having trouble correcting this while maintaining differentiability.

For example:

function{
   real roundNormal_lpmf(int x, real mu, real sigma){
      return log_diff_exp(normal_lcdf(x+0.5 | mu, sigma), normal_lcdf(x-0.5 | mu, sigma));
   }
}

This returns nan with x=6, mu=0.1, sigma = 0.5. but is fine for x=4 (returns -25.9764).
I’ve tried to use an is_nan() in the function and substituting some number close to neg_inf but then I get:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

Any ideas?

Bob_Carpenter · June 7, 2018, 11:49pm

The problem is in our lcdf and lccdf functions, which merely take logs of the cdf functions in most cases. This isn’t very stable. It might actually be better to do this:

log(normal_cdf(x + 0.5 | mu, sigma) - normal_cdf(x - 0.5 | mu, sigma)

but you’ll run into problems as soon as those normal_cdf both round to 0 or 1.

Otherwise, you might try the logistic distribution, which has a much more stable inverse CDF through log_inv_logit. It’s also built into Phi_approx.

We’re doing some maintenance on stability in a relevant part of the math lib. See this issue on inv_logit to avoid underflow which could be implemented in a user-defined function before the next release.

KariOlson · June 11, 2018, 7:51pm

I didn’t find the logistic distribution to be any better but I have resolved the issue by recognizing that on the tails, I can used the midpoint for the area (because the width is always 1).

real roundNormal_lpmf(int x, real, mu, real sigma){
   real z;
   z = (x-mu) / sigma;
   if ((z < -3)  || (z > 3)) {
      return normal_lpdf(x*1.0 | mu, sigma);
   } else {
      return log_diff_exp(normal_lcdf(x+0.5 | mu, sigma), normal_lcdf(x-0.5 | mu, sigma));
   }

Bob_Carpenter · June 12, 2018, 2:36am

Not if you use the cdf functions, but did you try the lcdf functions and working on the log scale?

KariOlson · June 12, 2018, 1:23pm

I tried

log_diff_exp(logistic_lcdf(x+0.5|mu,sigma), logistic_lcdf(x-0.5|mu,sigma))

Do you mean something different?

Bob_Carpenter · June 12, 2018, 9:50pm

That’s what I meant. I would’ve expected it to be more stable.

Topic		Replies	Views
Normal_lcdf status General	8	1363	January 31, 2019
Normal_lcdf underflows much faster than pnorm(log=TRUE) General	5	1069	July 20, 2018
Defining a custom lpdf for the Normal-Laplace convoluted density in Stan General	15	1074	January 24, 2020
Divergencies in case of expectation close to zero and left-censored log-normal distributed data Modeling divergences	9	953	January 21, 2021
Numerical Precision of normal_lcdf General	2	676	July 14, 2019

Underflow? in custom distribution function

Related topics