Normal_lcdf underflows much faster than pnorm(log=TRUE)


#1

I’m trying to write a stable probit-based model, and I’ve noticed, when using optimizing for rough exploration, that a lot of starting points are rejected. In the manual I found that normal_lcdf underflows to -Inf once the argument drops below -37.5. This is the same point at which R’s pnorm underflows to 0, yet pnorm with log=TRUE gives an answer (presumably accurate) until the argument drops below about -1 \times 10^{154}.

Phi_approx is suggested as a solution, but it in fact appears to underflow even sooner, and to be unacceptably inaccurate on the log scale:

data{
  int N;
  real x[N];
}
parameters{
}
model{
}
generated quantities{
  real logphi_exact[N];
  real logphi_approx[N];
  for(i in 1:N){
     logphi_exact[i] = normal_lcdf(x[i]|0.0,1.0);
     logphi_approx[i] = log(Phi_approx(x[i]));
  }
}
sampling(phi_test, data=list(N=5, x=c(-1,-10,-20,-30,-40)), iter=1, chains=1, algorithm="Fixed_param")

SAMPLING FOR MODEL 'stan-35456da26e9' NOW (CHAIN 1).
Iteration: 1 / 1 [100%]  (Sampling)

 Elapsed Time: 0 seconds (Warm-up)
               0.000139 seconds (Sampling)
               0.000139 seconds (Total)

Inference for Stan model: stan-35456da26e9.
1 chains, each with iter=1; warmup=0; thin=1; 
post-warmup draws per chain=1, total post-warmup draws=1.

                    mean se_mean sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
logphi_exact[1]    -1.84      NA NA   -1.84   -1.84   -1.84   -1.84   -1.84     1  NaN
logphi_exact[2]   -53.23      NA NA  -53.23  -53.23  -53.23  -53.23  -53.23     1  NaN
logphi_exact[3]  -203.92      NA NA -203.92 -203.92 -203.92 -203.92 -203.92     1  NaN
logphi_exact[4]  -454.32      NA NA -454.32 -454.32 -454.32 -454.32 -454.32     1  NaN
logphi_exact[5]     -Inf      NA NA    -Inf    -Inf    -Inf    -Inf    -Inf     1  NaN
logphi_approx[1]   -1.84      NA NA   -1.84   -1.84   -1.84   -1.84   -1.84     1  NaN
logphi_approx[2]  -86.54      NA NA  -86.54  -86.54  -86.54  -86.54  -86.54     1  NaN
logphi_approx[3] -596.43      NA NA -596.43 -596.43 -596.43 -596.43 -596.43     1  NaN
logphi_approx[4]    -Inf      NA NA    -Inf    -Inf    -Inf    -Inf    -Inf     1  NaN
logphi_approx[5]    -Inf      NA NA    -Inf    -Inf    -Inf    -Inf    -Inf     1  NaN
lp__                0.00      NA NA    0.00    0.00    0.00    0.00    0.00     1  NaN

Here’s the output from R’s pnorm for comparison:

pnorm(c(-1,-10,-20,-30,-40),log=T)
[1]   -1.841022  -53.231285 -203.917155 -454.321244 -804.608442

I’m just wondering whether there’s any way to improve this in Stan. Perhaps it doesn’t really matter for sampling, but I can’t help thinking it would introduce bias somewhere, because if one observation gives -Inf, that propagates to the entire log-posterior evaluation. For one outlier the true contribution to the log posterior might only be \log\Phi(-38) \approx -727, which means the true log posterior could be very far from -Inf.
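To make that propagation concrete, here is a toy illustration in Python (the per-observation values are just the ones from the table above): a single underflowed term drives the entire sum to -Inf.

```python
# Hypothetical per-observation log-likelihood contributions; the last
# observation's true value is finite (around -727) but has underflowed.
log_lik = [-1.84, -53.23, -203.92, -454.32, float("-inf")]

# The log posterior is (up to the prior) the sum of these terms, so the
# single underflow vetoes every other observation.
log_posterior = sum(log_lik)
print(log_posterior)  # -inf
```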


#2

Next steps, if someone is interested in doing this: 1) use this function to sample from a simple model and show the results are biased; 2) check what implementation R uses; 3) check what Stan uses; 4) characterize where in parameter space each one is more accurate; 5) propose some changes.

Most implementations of these more challenging functions only cover a region of the domain, so I’m sure pnorm is stitched together from several approximations. It probably offers good clues about how to improve Stan’s implementation.


#3

Agreed. As for bias, admittedly it would probably be very difficult to demonstrate the bias by simulating from a properly specified model, because the probability of seeing such an observation would be so small; in very big data, perhaps. However, I think it should be possible to show that an arbitrarily small misspecification can produce an arbitrarily large bias.

Say you have a model where, for 99.99% of observations, p(y_i|\beta)=\Phi(\beta x_i), but for those observations with extreme negative values of x_i the linear relationship breaks down and p(y_i|\beta) \rightarrow c \neq 0. Either that, or x is mismeasured at the negative extreme in a way that tends to exaggerate its magnitude. In this case, if they can individually set the entire log posterior to -Inf, the 0.01% of observations effectively hold a veto over the parameter space, in that they dictate where the posterior has support. This small fraction of observations, or even a single one, could produce an arbitrarily large bias in estimates of \beta.

Unfortunately, editing C++ code would be beyond me, but a quick Google search has turned up some promising avenues for approximation. The documentation for pnorm itself references a couple of papers and Fortran routines by W. J. Cody, and another reference here, but I’m not entirely convinced they’re used for the extremely small values of \log \Phi(\cdot), mainly because nowhere in those papers is there any mention of approximating the log separately, yet this seems to be what pnorm is doing.

Cross Validated turned up this question and its answers. The second solution in particular is very simple:

\log \Phi(x) \approx \mathrm{Pa}(x) = \log\left(\frac{1}{\sqrt{2\pi}}\right) - 0.5 x^2 - \log(-x).

From \mathrm{Pa}(-37) downwards it produces approximations that agree with pnorm (my only reference for accuracy) to 4 significant figures.
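As a quick sanity check (plain Python, using the standard -x^2/2 tail form of this approximation), \mathrm{Pa}(-40) can be compared against the pnorm value quoted in the first post:

```python
import math

def Pa(x):
    # log Phi(x) ~ log(1/sqrt(2*pi)) - x^2/2 - log(-x)  for x << 0
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x - math.log(-x)

# pnorm(-40, log=TRUE) reported above: -804.608442
print(Pa(-40))  # about -804.6078, agreeing to 4 significant figures
```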


#4

The C++ part of this is pretty small! Thanks for looking into it.


#5

No problem, thanks. As an aside, based on what I’ve seen from calculating log(Phi_approx(x)), I now have doubts that the manual should suggest it as a substitute for normal_lcdf in probit regression (p. 448) without further checks. It might look OK on the original scale, but on the log scale it looks like a bad approximation.


#6

Sorry, that was misleading. Phi_approx is just an inverse logit scaled to look like the probit function.
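To quantify how far off it is on the log scale, Phi_approx can be reproduced outside Stan. A Python sketch, assuming the usual inv_logit(0.07056 x^3 + 1.5976 x) form with the published coefficients (treat the exact coefficients as an assumption), evaluated stably on the log scale:

```python
import math

def log_inv_logit(t):
    # Numerically stable log(1 / (1 + exp(-t)))
    return t - math.log1p(math.exp(t)) if t < 0 else -math.log1p(math.exp(-t))

def log_phi_approx(x):
    # Assumed form of Stan's Phi_approx: inv_logit(0.07056*x^3 + 1.5976*x)
    return log_inv_logit(0.07056 * x**3 + 1.5976 * x)

for x in (-1.0, -10.0, -20.0):
    print(x, log_phi_approx(x))
```

This reproduces the logphi_approx column above (-86.54 at x = -10, -596.43 at x = -20) versus the correct -53.23 and -203.92, so the log-scale error blows up rapidly in the tail.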

We encourage people to check every aspect of their fits! This stuff’s all way too numerically unstable to trust out of the box for complicated problems without validation. It also helps check the coding part in addition to the statistics.

There’s a lot of room for improvement in our _lcdf and _lccdf functions—they’re mostly just taking the log of the _cdf functions.

We’d love to have someone work on this.

If there are specific algorithms people can recommend for the normal log CDF, we could implement those.
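For what it’s worth, one standard recipe is erfc on the log scale near the centre plus the Mills-ratio asymptotic expansion in the far tail. A minimal Python sketch (not Stan’s actual implementation; the switch point at -10 and the series truncation are my own choices):

```python
import math

def norm_lcdf(x):
    """log Phi(x) for the standard normal, stable far into the lower tail."""
    if x > -10.0:
        # 0.5 * erfc(-x / sqrt(2)) equals Phi(x) and does not underflow here.
        return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))
    # Mills-ratio asymptotic expansion, entirely on the log scale:
    #   Phi(x) ~ phi(x) / (-x) * (1 - 1/x^2 + 3/x^4 - ...)
    x2 = x * x
    return (-0.5 * x2 - 0.5 * math.log(2.0 * math.pi) - math.log(-x)
            + math.log1p(-1.0 / x2 + 3.0 / (x2 * x2)))

for x in (-1.0, -10.0, -20.0, -40.0):
    print(x, norm_lcdf(x))
```

Against the pnorm values above this agrees to at least five significant figures at x = -10 and keeps improving further into the tail, since the asymptotic series becomes more accurate as |x| grows.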