Normal_lcdf status


#1

Hi all,

I am fitting a mixture model in Stan. For one mixture component I use normal_lcdf, which underflows and overflows much sooner than R's pnorm, and I found that Phi() and Phi_approx() are no better (as discussed in "Normal_lcdf underflows much faster than pnorm(log=TRUE)").

Is anyone working on making the cumulative normal distribution in Stan more precise and stable? Is there Stan code on GitHub I could refer to? Otherwise I will translate a normal CDF implementation from another language into Stan.


#2

If x is too small for normal_lcdf, you might try log1m(Phi(-x)).


#3

You mean for large x?

Both Phi() and normal_cdf() saturate to 1 above x = 8.25 and underflow to 0 below x = -37.5.

So I already tried using log1m(Phi(-x)) or log1m(normal_cdf(-x)) when x > 0.

But I wonder if there is a more stable implementation, because that workaround still breaks once x > 37.5. (In a mixture model, x can be very far negative or very far positive.)
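To illustrate the trade-off in double precision: the naive log CDF returns exactly 0 for large x, while the log1m(Phi(-x)) trick keeps the tiny negative tail mass. A standalone C++ check, using the standard library's erf/erfc in place of Stan's Phi:

```cpp
#include <cmath>

// Naive log CDF: log(Phi(x)) evaluated directly. For x above roughly 8.3
// the inner expression rounds to exactly 1.0 in double, so this returns 0.
inline double log_Phi_naive(double x) {
  return std::log(0.5 * (1.0 + std::erf(x / std::sqrt(2.0))));
}

// Symmetry trick: for x > 0, log(Phi(x)) = log1p(-Phi(-x)), where
// Phi(-x) = 0.5 * erfc(x / sqrt(2)) stays representable far into the tail.
inline double log_Phi_flipped(double x) {
  return std::log1p(-0.5 * std::erfc(x / std::sqrt(2.0)));
}
```

At x = 9, for example, log_Phi_naive returns exactly 0 while log_Phi_flipped still returns a tiny negative number; for moderate x the two agree to machine precision.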


#4

If it’s not working for you, a necessary first step is to demonstrate exactly where. Would you be interested in doing a comparison against a preferred reference implementation? It’s good to check both values and gradients, since either can cause problems. I’ve done this for a few of our special functions and the math isn’t too bad.


#5

In my data, subject i comes from one of two distributions.
With probability p_normal[i], the likelihood of subject i’s value Y_obs[i]
is computed from a normal distribution (the expected value comes from the regression Y = constant + coeff*covariate, and the standard deviation is fitted).
With probability 1 - p_normal[i], the likelihood is computed from a geometric distribution (most of these individuals have Y_obs = 0; the geometric p is fitted).
p_normal is known for each subject.

I simulated latent Y (Y_latent) with the parameters below; its histogram is shown below.

n_subj <- 200
n_obs <- 200      # one observation per subject for now

n_cov <- 4        # number of covariates
coeff <- c(1.2, 1.5, -0.9, 0.8)  # coefficients
constant <- 4

Then, to simulate observed Y, errors are applied (sd for the normal distribution = 0.4, p for the geometric distribution = 0.97),
and the values are discretized with floor(Y_latent) and truncated at 0,
so the histogram of Y_obs is as below.

To compute the likelihood under the discretized normal distribution,
I use log_diff_exp(normal_lcdf(Y_obs + 1 | Y_expected, sigma), normal_lcdf(Y_obs | Y_expected, sigma)).
Also, to extend the usable range of (y - mu)/sigma from 8.25 up to 37.5,
I use log1m(normal_cdf(-Y_obs, -Y_expected, sigma)) when (Y_obs - Y_expected) is positive.
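(For reference, Stan's log_diff_exp(a, b) computes log(exp(a) - exp(b)) without leaving log space; a minimal C++ sketch of the identity it relies on:)

```cpp
#include <cmath>

// log(exp(a) - exp(b)) for a > b, evaluated entirely in log space as
// a + log1p(-exp(b - a)), so neither exp(a) nor exp(b) needs to be
// representable on its own.
inline double log_diff_exp(double a, double b) {
  return a + std::log1p(-std::exp(b - a));
}
```

This is why interval probabilities like the one above can be formed from two log CDF values even when both are far below log(DBL_MIN).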

functions {
  vector geometric_lp(int n, int[] x, real p) {
    vector[n] lp;
    for (i in 1:n) {
      lp[i] = x[i] * log1m(p) + log(p);
    }
    return lp;
  }

  real log_normal_truncated(int Y_obs, real Y_reg, real sigma) {
    real lp;
    if (Y_obs - Y_reg > 0) {
      lp = log_diff_exp(0, normal_lcdf(-Y_obs | -Y_reg, sigma));
    } else {
      lp = normal_lcdf(Y_obs | Y_reg, sigma);
    }
    return lp;
  }

  real log_normal_discretized(int Y_obs, real Y_reg, real sigma) {
    real lp;
    if (Y_obs - Y_reg > 1) {
      lp = log_diff_exp(normal_lcdf(-Y_obs     | -Y_reg, sigma),
                        normal_lcdf(-Y_obs - 1 | -Y_reg, sigma));
    } else {
      lp = log_diff_exp(normal_lcdf(Y_obs + 1 | Y_reg, sigma),
                        normal_lcdf(Y_obs     | Y_reg, sigma));
    }
    return lp;
  }

  vector discretized_normal_lp(int n, int[] Y_obs, vector Y_reg, real sigma) {
    vector[n] lp;
    for (i in 1:n) {
      if (Y_obs[i] == 0) {
        lp[i] = log_normal_truncated(1, Y_reg[i], sigma);
      } else {
        lp[i] = log_normal_discretized(Y_obs[i], Y_reg[i], sigma);
      }
    }
    return lp;
  }
}

data {
  int n_subj;
  int n_obs;

  int subj_obs[n_obs];        // indices mapping observations to subjects
  int Y_obs[n_obs];           // observations

  int subj_start_ind[n_subj]; // first index for each subject
  int subj_end_ind[n_subj];   // last index for each subject

  int n_cov;                  // number of covariates
  matrix[n_subj, n_cov] cov;  // covariates

  vector[n_subj] p_normal;    // probability that a subject is from the normal distribution
}

parameters {
  vector[n_cov] coeff;
  real intercept;
  real<lower=0> sigma;
  real<lower=0.8, upper=1> p_geo;
}

transformed parameters {
  vector[n_obs] Y_reg = rep_vector(intercept, n_obs) + cov[subj_obs] * coeff;
}

model {
  vector[n_obs] lp_zero;
  vector[n_obs] lp_reg;

  intercept ~ normal(0, 5);
  coeff ~ normal(0, 5);
  sigma ~ normal(0, 5);
  p_geo ~ uniform(0.8, 1.0);

  lp_zero = geometric_lp(n_obs, Y_obs, p_geo);
  lp_reg = discretized_normal_lp(n_obs, Y_obs, Y_reg, sigma);

  for (i in 1:n_subj) {
    target += log_mix(p_normal[i],
                      sum(lp_reg[subj_start_ind[i]:subj_end_ind[i]]),
                      sum(lp_zero[subj_start_ind[i]:subj_end_ind[i]]));
  }
}

After warm-up, I get 87 divergent transitions. These are false positives caused by the limitations of normal_lcdf;
I checked this many times by varying the range of (y - mu)/sigma.

Even though my actual data rarely produces divergent transitions (because it has a larger standard deviation),
I was going to implement my own normal_lcdf using Boost's erf for stability,
and then found that Stan already uses it.
First, I would like to know why the Stan developers set the cutoffs at -37.5 and 8.25.
I would also welcome other suggestions (besides increasing adapt_delta).
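For what it's worth, those cutoffs appear to match where IEEE double precision saturates (this is my reading, not an official rationale): Phi(x) rounds to exactly 1 once 1 - Phi(x) drops below machine epsilon (around x ≈ 8.3), and Phi(x) falls below the smallest normal double (≈2.2e-308) around x ≈ -37.5. A small standalone check:

```cpp
#include <cmath>
#include <limits>

// Phi via erf, the way it is evaluated for moderate x.
inline double Phi(double x) {
  return 0.5 * (1.0 + std::erf(x / std::sqrt(2.0)));
}

// Upper-tail mass 1 - Phi(x) via erfc, which stays representable much
// further out than 1 - Phi(x) computed by subtraction.
inline double upper_tail(double x) {
  return 0.5 * std::erfc(x / std::sqrt(2.0));
}
```

Here Phi(9.0) is exactly 1.0 in double, even though upper_tail(9.0) is still a perfectly representable ~1e-19; and upper_tail(38.0), i.e. Phi(-38), is already below the smallest normal double.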


#6

You could try modifying Phi to use long double internally, and/or try the implementation from
https://www.jstatsoft.org/article/view/v011i04
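If that article is the one I think it is (Marsaglia's Taylor-series evaluation of the normal CDF), its core routine can be sketched roughly as follows. Treat this as my reconstruction rather than the paper's exact code, and note it is only reliable for moderate |x|; the paper also gives a separate complementary routine for the far tail:

```cpp
#include <cmath>

// Taylor-series evaluation of Phi(x) in the spirit of Marsaglia (2004):
//   Phi(x) = 1/2 + phi(x) * (x + x^3/3 + x^5/(3*5) + x^7/(3*5*7) + ...),
// with the partial sums accumulated in long double until they stop changing.
inline double Phi_series(double x) {
  long double s = x, t = 0.0L, b = x, q = (long double)x * x, i = 1.0L;
  while (s != t)
    s = (t = s) + (b *= q / (i += 2.0L));
  // 0.91893853... = log(sqrt(2*pi)), so exp(-q/2 - c) = phi(x)
  return 0.5 + s * std::exp(-0.5L * q - 0.918938533204672741780329736406L);
}
```

For moderate x this agrees with erf-based values to near machine precision; for |x| larger than roughly 30 the terms grow enormous before shrinking, so a tail-specific routine is needed instead.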


#7

Thanks!

I copied Phi.hpp and changed it to use long double in my own phi_ld.hpp:

// guard renamed so it does not collide with Stan's own Phi.hpp
#ifndef STAN_MATH_PRIM_SCAL_FUN_PHI_LD_HPP
#define STAN_MATH_PRIM_SCAL_FUN_PHI_LD_HPP

#include <boost/math/special_functions/erf.hpp>
#include <stan/math/prim/scal/fun/constants.hpp>
#include <stan/math/prim/scal/err/check_not_nan.hpp>

namespace stan {
namespace math {

inline long double Phi_ld(long double x) {
  check_not_nan("Phi", "x", x);
  if (x < -37.5)
    return 0;
  else if (x < -5.0)
    return 0.5 * boost::math::erfc(-INV_SQRT_2 * x);
  else if (x > 8.25)
    return 1;
  else
    return 0.5 * (1.0 + boost::math::erf(INV_SQRT_2 * x));
}

}  // namespace math
}  // namespace stan
#endif

and my stan model test_custom_phi.stan is

functions {
  real Phi_ld(real x);
}
data {
  int N;
  real x[N];
}
parameters {
}
model {
}
generated quantities {
  real lp[N];
  for (i in 1:N) {
    lp[i] = Phi_ld(x[i]);
  }
}

and ran

phi_test = stan_model('test_custom_phi.stan',
                      allow_undefined = TRUE,
                      includes = paste0('\n#include "',
                                        file.path(getwd(), 'phi_ld.hpp'), '"\n'))

and I got compile error:

Error in compileCode(f, code, language = language, verbose = verbose) :
Compilation ERROR, function(s)/method(s) not created! In file included from C:/Users/user-1/Documents/R/win-library/3.4/BH/include/boost/config.hpp:39:0,
from C:/Users/user-1/Documents/R/win-library/3.4/BH/include/boost/math/tools/config.hpp:13,
from C:/Users/user-1/Documents/R/win-library/3.4/StanHeaders/include/stan/math/rev/core/var.hpp:7,
from C:/Users/user-1/Documents/R/win-library/3.4/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5,
from C:/Users/user-1/Documents/R/win-library/3.4/StanHeaders/include/stan/math/rev/core.hpp:12,
from C:/Users/user-1/Documents/R/win-library/3.4/StanHeaders/include/stan/math/rev/mat.hpp:4,
from C:/Users/user-1/Documents/R/win-library/3.4/StanHeaders/include/stan/math.hpp:4,
from C:/Users/user-1/Documents/R/win-library/3.4/StanHeaders/include/src/stan/model/model_header.hpp:4,
from file28cc6e46312


#8

These days you need to call stan_model with verbose = TRUE to see the real error message. In this case, it was that you forgot to add the std::ostream* pstream__ argument to your C++ function. Also, you may need to implement the stan::math::var version even though it is not being called.
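To spell out the signature issue: rstan's allow_undefined mechanism passes a trailing std::ostream* to every user-defined external function. A standalone sketch of the corrected function, with the Boost/Stan helpers swapped for &lt;cmath&gt; equivalents so it compiles on its own (in the real plugin you would keep the Stan includes and the check_not_nan call):

```cpp
#include <cmath>
#include <ostream>

namespace stan {
namespace math {

// Corrected signature: rstan supplies a trailing std::ostream* argument
// (pstream__) to every external function, even if it is never used.
// The cutoffs below are kept from the original Phi.hpp-derived code.
inline double Phi_ld(double x, std::ostream* pstream__) {
  (void)pstream__;  // unused, but required by rstan's calling convention
  const long double INV_SQRT_2 = 0.707106781186547524400844362105L;
  long double xl = x;  // carry out the computation in extended precision
  if (xl < -37.5L)
    return 0;
  if (xl < -5.0L)
    return 0.5L * std::erfc(-INV_SQRT_2 * xl);
  if (xl > 8.25L)
    return 1;
  return 0.5L * (1.0L + std::erf(INV_SQRT_2 * xl));
}

}  // namespace math
}  // namespace stan
```

If the function is ever applied to parameters rather than data, a stan::math::var overload (same name, var argument, plus the same trailing pstream__) is also needed, since the autodiff type is what gets passed in that case.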


#9

What do you mean by implementing the stan::math::var version?
How can I do that?