# Underflow? in custom distribution function

I am working on a model where data is an integer (and either is a sample of the unknown distribution or censored observation). Because the integer can be negative and might have much less variance than a poisson or negative binomial, I decided to write a distribution. I thought I found a nice solution using a continuous distribution and “rounding” but my implementation returns nan in the tails and I’m having trouble correcting this while maintaining differentiability.

For example:

``````function{
real roundNormal_lpmf(int x, real mu, real sigma){
return log_diff_exp(normal_lcdf(x+0.5 | mu, sigma), normal_lcdf(x-0.5 | mu, sigma));
}
}
``````

This returns nan with x=6, mu=0.1, sigma = 0.5. but is fine for x=4 (returns -25.9764).
I’ve tried to use an is_nan() in the function and substituting some number close to neg_inf but then I get:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

Any ideas?

The problem is in our `lcdf` and `lccdf` functions, which merely take logs of the `cdf` functions in most cases. This isn’t very stable. It might actually be better to do this:

``````log(normal_cdf(x + 0.5 | mu, sigma) - normal_cdf(x - 0.5 | mu, sigma)
``````

but you’ll run into problems as soon as those `normal_cdf` both round to 0 or 1.

Otherwise, you might try the logistic distribution, which has a much more stable inverse CDF through `log_inv_logit`. It’s also built into `Phi_approx`.

We’re doing some maintenance on stability in a relevant part of the math lib. See this issue on inv_logit to avoid underflow which could be implemented in a user-defined function before the next release.

I didn’t find the logistic distribution to be any better but I have resolved the issue by recognizing that on the tails, I can used the midpoint for the area (because the width is always 1).

``````real roundNormal_lpmf(int x, real, mu, real sigma){
real z;
z = (x-mu) / sigma;
if ((z < -3)  || (z > 3)) {
return normal_lpdf(x*1.0 | mu, sigma);
} else {
return log_diff_exp(normal_lcdf(x+0.5 | mu, sigma), normal_lcdf(x-0.5 | mu, sigma));
}
``````
1 Like

Not if you use the cdf functions, but did you try the lcdf functions and working on the log scale?

I tried

``````log_diff_exp(logistic_lcdf(x+0.5|mu,sigma), logistic_lcdf(x-0.5|mu,sigma))
``````

Do you mean something different?

That’s what I meant. I would’ve expected it to be more stable.