Normal_lcdf underflows much faster than pnorm(log=TRUE)


#1

I’m trying to write a stable probit-based model, and I’ve noticed, when using optimizing for rough exploration, that a lot of starting points are rejected. In the manual I found that normal_lcdf underflows to -Inf once the argument drops below -37.5. This is the same point at which R’s pnorm underflows to 0, yet pnorm with log=TRUE gives an answer (presumably accurate) until the argument drops below about -1 \times 10^{154}.

Phi_approx is suggested as a solution, but it in fact appears to underflow even sooner, and to be unacceptably inaccurate on the log scale:

data{
  int N;
  real x[N];
}
parameters{
}
model{
}
generated quantities{
  real logphi_exact[N];
  real logphi_approx[N];
  for(i in 1:N){
     logphi_exact[i] = normal_lcdf(x[i]|0.0,1.0);
     logphi_approx[i] = log(Phi_approx(x[i]));
  }
}
sampling(phi_test, data=list(N=5, x=c(-1,-10,-20,-30,-40)), iter=1, chains=1, algorithm="Fixed_param")

SAMPLING FOR MODEL 'stan-35456da26e9' NOW (CHAIN 1).
Iteration: 1 / 1 [100%]  (Sampling)

 Elapsed Time: 0 seconds (Warm-up)
               0.000139 seconds (Sampling)
               0.000139 seconds (Total)

Inference for Stan model: stan-35456da26e9.
1 chains, each with iter=1; warmup=0; thin=1; 
post-warmup draws per chain=1, total post-warmup draws=1.

                    mean se_mean sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
logphi_exact[1]    -1.84      NA NA   -1.84   -1.84   -1.84   -1.84   -1.84     1  NaN
logphi_exact[2]   -53.23      NA NA  -53.23  -53.23  -53.23  -53.23  -53.23     1  NaN
logphi_exact[3]  -203.92      NA NA -203.92 -203.92 -203.92 -203.92 -203.92     1  NaN
logphi_exact[4]  -454.32      NA NA -454.32 -454.32 -454.32 -454.32 -454.32     1  NaN
logphi_exact[5]     -Inf      NA NA    -Inf    -Inf    -Inf    -Inf    -Inf     1  NaN
logphi_approx[1]   -1.84      NA NA   -1.84   -1.84   -1.84   -1.84   -1.84     1  NaN
logphi_approx[2]  -86.54      NA NA  -86.54  -86.54  -86.54  -86.54  -86.54     1  NaN
logphi_approx[3] -596.43      NA NA -596.43 -596.43 -596.43 -596.43 -596.43     1  NaN
logphi_approx[4]    -Inf      NA NA    -Inf    -Inf    -Inf    -Inf    -Inf     1  NaN
logphi_approx[5]    -Inf      NA NA    -Inf    -Inf    -Inf    -Inf    -Inf     1  NaN
lp__                0.00      NA NA    0.00    0.00    0.00    0.00    0.00     1  NaN

Here’s the output from R’s pnorm for comparison:

pnorm(c(-1,-10,-20,-30,-40),log=T)
[1]   -1.841022  -53.231285 -203.917155 -454.321244 -804.608442

I’m just wondering whether there’s any way to improve this in Stan. Perhaps it doesn’t really matter for sampling, but I can’t help thinking it would introduce bias somewhere, because if one observation gives -Inf, that propagates to the entire log-posterior evaluation. For one outlier the true contribution to the log posterior might only be \log\Phi(-38) \approx -727, which means the true log posterior could be very far from -Inf.
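To make that propagation concrete, here is a toy illustration in Python (the per-observation values are just the ones from the table above): a single underflowed term drives the entire sum to -Inf.

```python
# Hypothetical per-observation log-likelihood contributions; the last
# observation's true value is finite (around -727) but has underflowed.
log_lik = [-1.84, -53.23, -203.92, -454.32, float("-inf")]

# The log posterior is (up to the prior) the sum of these terms, so the
# single underflow vetoes every other observation.
log_posterior = sum(log_lik)
print(log_posterior)  # -inf
```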


#2

Next steps, if someone is interested in doing this: 1) use this function to sample from a simple model and show the results are biased; 2) check what implementation R uses; 3) check what Stan uses; 4) characterize where in parameter space each one is more accurate; 5) propose some changes.

Most implementations of these more challenging functions only cover a region of the domain, so I’m sure pnorm is stitched together from several approximations. It probably offers good clues about how to improve Stan’s implementation.


#3

Agreed. As for bias, admittedly it would probably be very difficult to demonstrate the bias by simulating from a properly specified model, because the probability of seeing such an observation would be so small; in very big data, perhaps. However, I think it should be possible to show that an arbitrarily small misspecification can produce an arbitrarily large bias.

Say you have a model where, for 99.99% of observations, p(y_i|\beta)=\Phi(\beta x_i), but for those observations with extreme negative values of x_i the linear relationship breaks down and p(y_i|\beta) \rightarrow c \neq 0. Either that, or x is mismeasured at the negative extreme in a way that tends to exaggerate its magnitude. In this case, if they can individually set the entire log posterior to -Inf, the 0.01% of observations effectively hold a veto over the parameter space, in that they dictate where the posterior has support. This small fraction of observations, or even a single one, could produce an arbitrarily large bias in estimates of \beta.

Unfortunately, editing C++ code would be beyond me, but a quick Google search has turned up some promising avenues for approximation. The documentation for pnorm itself references a couple of papers and Fortran routines by W. J. Cody, and another reference here, but I’m not entirely convinced they’re used for the extremely small values of \log \Phi(\cdot), mainly because nowhere in those papers is there any mention of approximating the log separately, yet this seems to be what pnorm is doing.

Cross Validated turned up this question and its answers. The second solution in particular is very simple:

\log \Phi(x) \approx \mathrm{Pa}(x) = \log\left(\frac{1}{\sqrt{2\pi}}\right) - 0.5 x^2 - \log(-x).

From \mathrm{Pa}(-37) downwards it produces approximations that agree with pnorm (my only reference for accuracy) to 4 significant figures.
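As a quick sanity check (plain Python, using the standard -x^2/2 tail form of this approximation), \mathrm{Pa}(-40) can be compared against the pnorm value quoted in the first post:

```python
import math

def Pa(x):
    # log Phi(x) ~ log(1/sqrt(2*pi)) - x^2/2 - log(-x)  for x << 0
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x - math.log(-x)

# pnorm(-40, log=TRUE) reported above: -804.608442
print(Pa(-40))  # about -804.6078, agreeing to 4 significant figures
```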


#4

The C++ part of this is pretty small! Thanks for looking into it.


#5

No problem, thanks. As an aside, based on what I’ve seen from calculating log(Phi_approx(x)), I now have doubts that the manual should suggest it as a substitute for normal_lcdf in probit regression (p. 448) without further checks. It might look OK on the original scale, but on the log scale it looks like a bad approximation.


#6

Sorry, that was misleading. Phi_approx is just an inverse logit scaled to look like the probit function.
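To quantify how far off it is on the log scale, Phi_approx can be reproduced outside Stan. A Python sketch, assuming the usual inv_logit(0.07056 x^3 + 1.5976 x) form with the published coefficients (treat the exact coefficients as an assumption), evaluated stably on the log scale:

```python
import math

def log_inv_logit(t):
    # Numerically stable log(1 / (1 + exp(-t)))
    return t - math.log1p(math.exp(t)) if t < 0 else -math.log1p(math.exp(-t))

def log_phi_approx(x):
    # Assumed form of Stan's Phi_approx: inv_logit(0.07056*x^3 + 1.5976*x)
    return log_inv_logit(0.07056 * x**3 + 1.5976 * x)

for x in (-1.0, -10.0, -20.0):
    print(x, log_phi_approx(x))
```

This reproduces the logphi_approx column above (-86.54 at x = -10, -596.43 at x = -20) versus the correct -53.23 and -203.92, so the log-scale error blows up rapidly in the tail.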

We encourage people to check every aspect of their fits! This stuff’s all way too numerically unstable to trust out of the box for complicated problems without validation. It also helps check the coding part in addition to the statistics.

There’s a lot of room for improvement in our _lcdf and _lccdf functions—they’re mostly just taking the log of the _cdf functions.

We’d love to have someone work on this.

If there are specific algorithms people can recommend for the normal log CDF, we could implement those.
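For what it’s worth, one standard recipe is erfc on the log scale near the centre plus the Mills-ratio asymptotic expansion in the far tail. A minimal Python sketch (not Stan’s actual implementation; the switch point at -10 and the series truncation are my own choices):

```python
import math

def norm_lcdf(x):
    """log Phi(x) for the standard normal, stable far into the lower tail."""
    if x > -10.0:
        # 0.5 * erfc(-x / sqrt(2)) equals Phi(x) and does not underflow here.
        return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))
    # Mills-ratio asymptotic expansion, entirely on the log scale:
    #   Phi(x) ~ phi(x) / (-x) * (1 - 1/x^2 + 3/x^4 - ...)
    x2 = x * x
    return (-0.5 * x2 - 0.5 * math.log(2.0 * math.pi) - math.log(-x)
            + math.log1p(-1.0 / x2 + 3.0 / (x2 * x2)))

for x in (-1.0, -10.0, -20.0, -40.0):
    print(x, norm_lcdf(x))
```

Against the pnorm values above this agrees to at least five significant figures at x = -10 and keeps improving further into the tail, since the asymptotic series becomes more accurate as |x| grows.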