# Difference in behavior between integrate_1d and integrate

I have an issue computing the integral of a function in rstan.

With “integrate_1d”, the integral cannot be computed over an interval, whereby it throws up “error estimate of integral above zero exceeds the given relative tolerance times norm of integral above zero” .

However, if I do the integration with “integrate” in R, then I have no such issue over that interval.

I am looking for

1. why the two functions behave differently
2. how I can get “integrate_1d” to behave more like “integrate”, OR
3. how I can use the function “integrate” in my rstan code.

I paste some code below to reproduce the issue. The density function correspond to a dynamic model of choice (Linear Ballistic Accumulator model), and the integral computes the probability that one value is more than the other at a given time.

I also attach a file with the code for convenience.forum question code.R (3.7 KB)

stancode <- 'functions{

real lbaX_pdf(real X, real t, real A, real v, real s){
//PDF of the LBA model

real b_A_tv_ts;
real b_tv_ts;
real term_1;
real term_2;
real pdf;

b_A_tv_ts = (X - A - t*v)/(t*s);
b_tv_ts = (X - t*v)/(t*s);
term_1 = Phi(b_A_tv_ts);
term_2 = Phi(b_tv_ts);
pdf = (1/A)*(-term_1 + term_2);

return pdf;
}

real lbaX_cdf(real X, real t, real A, real v, real s){
//CDF of the LBA model

real b_A_tv;
real b_tv;
real ts;
real term_1;
real term_2;
real term_3;
real term_4;
real cdf;

b_A_tv = X - A - t*v;
b_tv = X - t*v;
ts = t*s;
term_1 = b_A_tv * Phi(b_A_tv/ts);
term_2 = b_tv   * Phi(b_tv/ts);
term_3 = ts     * exp(normal_lpdf(b_A_tv/ts|0,1));
term_4 = ts     * exp(normal_lpdf(b_tv/ts|0,1));
cdf = (1/A)*(- term_1 + term_2 - term_3 + term_4);

return cdf;

}

// proba that 1 is ranked before 2 in first race

real rank_density(real x,          // Function argument
real xc,         // Complement of function argument
//  on the domain (defined later)
real[] theta,    // parameters
real[] x_r,      // data (real)
int[] x_i) {     // data (integer)

real t = theta[1];
real A = theta[2];
real v1 = theta[3];
real v2 = theta[4];
real s = theta[5];
real v;

v=lbaX_pdf(x, t, A, v1, s)*lbaX_cdf(x, t, A, v2, s);

return v;
}

// cumulative proba of 1 before 2 in first race, knowing neither reached b

real order(real down, real up, real[] theta, data real[] x_r) {
int x_i[0];
real v;

v=integrate_1d(rank_density, down, up, theta, x_r, x_i,1e-8);
// }
return v;
}
}
'

library(rstan)
expose_stan_functions(stanc(model_code = stancode))

t<-0.5
b<-1
A<-0.5
v1<-1
v2<-1
s<-1

# This compute the probability that 1 and 2 are less than b but 1 is more than 2 at time t

order(-10,b,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.66,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.67,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.70,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.76,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.77,theta = c(t,A,v1,v2,s), x_r = double())

# gives out following error, which occurs for x between 0.67 and 0.76
# Exception: integrate: error estimate of integral above zero 9.59331e-10
# exceeds the given relative tolerance times norm of integral above zero
# (in 'unknown file name' at line 74)

# However, if I now use the function integrate in R, I do not have any problems in that domain:

f_1_2<-function(x,t,A,v1,v2,s){
p<-lbaX_pdf(x, t, A, v1, s)*lbaX_cdf(x, t, A, v2, s)
return(p)}

integrate(Vectorize(f_1_2),-10,0.66,t=t,A=A,v1=v1,v2=v2,s=s)$value integrate(Vectorize(f_1_2),-10,0.67,t=t,A=A,v1=v1,v2=v2,s=s)$value

integrate(Vectorize(f_1_2),-10,0.70,t=t,A=A,v1=v1,v2=v2,s=s)$value integrate(Vectorize(f_1_2),-10,0.76,t=t,A=A,v1=v1,v2=v2,s=s)$value

integrate(Vectorize(f_1_2),-10,0.77,t=t,A=A,v1=v1,v2=v2,s=s)$value # here is the whole graph of the function "order" between -1 and 3: N<-400 x<-(1:N)/100-1 ft<-rep(NA,N) for (i in (1:N)) {ft[i]<-integrate(Vectorize(f_1_2),-10,x[i],t=t,A=A,v1=v1,v2=v2,s=s)$value}
plot(x,ft)

1 Like

Yeah, integrate_1d can be rather frustrating.

The integrate_1d function is basically a wrapper around the Boost 1d quadrature stuff over here: https://www.boost.org/doc/libs/1_74_0/libs/math/doc/html/quadrature.html .

On top of doing the integrals, it will also compute gradients of these integrals which can blow up. But I don’t think it does that for expose_stan_functions.

To understand the error, check out: https://www.boost.org/doc/libs/1_74_0/libs/math/doc/html/math_toolkit/double_exponential/de_tol.html

I’m not sure what algorithm R’s integrate uses.

Looking at the function, Phi can be a rather fragile function to work with. I wonder if that is the thing blowing up. Stan and R use different implementations of functions like this. Numeric overflows/underflows mess up the integrator. I guess the place to start would be checking if either the pdf or cdf in your integral are failing to evaluate at any point.

Hopefully it’s one of those functions messing up so we can rewrite the integral and it will work (like what happened here: Using integration to fit the difference of gamma distributed random variable)

1 Like

Hi, thanks, I checked already the pdf and cdf and they are fine in the domain where the integral fails to evaluate.

I noticed that “integrate” does not rely on the same algorithm for integration as “integrate_1d”. When looking into the code for “integrate”, it call on “C_call_dqags”, which apparently is a C++ procedure in the QUADPACK library.

The question then is whether there is a way to get that C++ function to work in rstan. I have been looking into how to do this, but I must say this is a bit beyond me at the moment.

Do you have any luck turning down the tolerance for integrate_1d?

No, that does not work either.

Ugh, that’s no good. What is the value of the function at one of the spots where the integration fails? (and the error message at that spot too – I’m wondering how close the integrator got)

Can you use the ODE integrator to do the integral?

Can you post a plot of rank_density for a set of parameters where integrate_1d fails?

The code I already posted will generate the error messages. For reference, I get:

> order(-10,0.67,theta = c(t,A,v1,v2,s), x_r = double())
Error in order(-10, 0.67, theta = c(t, A, v1, v2, s), x_r = double()) :
Exception: integrate: error estimate of integral above zero 9.59331e-10 exceeds the given relative tolerance times norm of integral above zero  (in 'unknown file name' at line 150)
> integrate(Vectorize(f_1_2),-10,0.67,t=t,A=A,v1=v1,v2=v2,s=s)\$value
[1] 0.09634799


Here is some code with graphs in addition to the one I posted, if you can look into it, see if you can find an issue. The last one is a plot of rank_density.

N<-1000
x<-(1:N)/1000
ft<-rep(NA,N)
for (i in (1:N)) {ft[i]<-lba_cdf(x[i], b, A, v1, s)}
plot(x,ft)

N<-8000
x<-(1:N)/1000-1
ft<-rep(NA,N)
for (i in (1:N)) {ft[i]<-lbaX_pdf(x[i], t, A, v1, s)}
plot(x,ft)

N<-3000
x<-(1:N)/1000-1
ft<-rep(NA,N)
for (i in (1:N)) {ft[i]<-lbaX_pdf(x[i], t, A, v1, s)*lbaX_cdf(x[i], t, A, v1, s)}
plot(x,ft)


I do not know how to use the ODE integrator for such things, isn’t it meant for ordinary differential equations?

Thanks for the code. I’ll have a look.

The ODE integrator computes y(t) given y' = f(t, y) and y(t0).

If your f is not a function of y, then the solution of the ODE is the integral \int f(t) dt.

1 Like

I played around with the code a bit.

It looks to me like it’s able to do your integral [-10, 1] but then it won’t do your integral on [-10, 0.67]. Yeah that seems crazy to me. Is that a fair summary of the problem?

Yes, that is why I am asking for pointers why this is so here!

I loosened up the tolerances a bit and the function seems to work for me now:

v = integrate_1d(rank_density, down, up, theta, x_r, x_i, 1e-6);


I’m not sure how well it’s working though. I’m wondering if there isn’t something screwed up in how we’re interpreting the error checks from Boost.

Here’s some code for using the ODE integrators to do this:

real[] rank_density_ode(real x,
real[] y,          // Function argument
//  on the domain (defined later)
real[] theta,    // parameters
real[] x_r,      // data (real)
int[] x_i) {     // data (integer)

real t = theta[1];
real A = theta[2];
real v1 = theta[3];
real v2 = theta[4];
real s = theta[5];
real v;

v=lbaX_pdf(x, t, A, v1, s)*lbaX_cdf(x, t, A, v2, s);

return { v };
}

real order_ode(real down, real up, real[] theta, data real[] x_r) {
int x_i[0];
real v;

v=integrate_ode_rk45(rank_density_ode, { 0.0 }, down, { up }, theta, x_r, x_i)[1, 1];
return v;
}


That worked as well.

5 Likes

Thanks, I will try with the integrate_ode_rk45 algorithm, thanks for the code!

I did not try 1e-6 as a tolerance level, I will look into it.

2 Likes

Did 1e-6 end up working? Or did this still give trouble?

Hi, 1e-6 did not work because then problems arose on other intervals, however, using the ODE integrator works nicely, I would not have thought of this, thanks a lot!

1 Like

Thanks for reporting the error and the code! So this:

Exception: integrate: error estimate of integral above zero 9.59331e-10 exceeds the given relative tolerance times norm of integral above zero  (in 'unknown file name' at line 150)


turned out to be a problem with how the tolerances were working in the integrator (check here for details: https://github.com/boostorg/math/issues/449). We put a patch in develop, so in a couple months the fix for this will make it out into a release.

The C++ version of your code is now the test for this in math :D (https://github.com/stan-dev/math/blob/develop/test/unit/math/prim/functor/integrate_1d_test.cpp#L470)

3 Likes

This is very cool, amazing work on your part, I am always impressed by the dedication of open-source developers.

2 Likes

Release is out that addresses this: Release of CmdStan 2.26.0

2 Likes