Difference in behavior between integrate_1d and integrate

alescia · September 29, 2020, 9:21am

I have an issue computing the integral of a function in rstan.

With “integrate_1d”, the integral cannot be computed over some intervals; it fails with the error “error estimate of integral above zero exceeds the given relative tolerance times norm of integral above zero”.

However, if I do the integration with “integrate” in R, then I have no such issue over that interval.

I am looking for:

- why the two functions behave differently,
- how I can get “integrate_1d” to behave more like “integrate”, OR
- how I can use the function “integrate” in my rstan code.

I paste some code below to reproduce the issue. The density function corresponds to a dynamic model of choice (the Linear Ballistic Accumulator model), and the integral computes the probability that one value is greater than the other at a given time.

I also attach a file with the code for convenience: forum question code.R (3.7 KB)

stancode <- 'functions{
  
     real lbaX_pdf(real X, real t, real A, real v, real s){
          //PDF of the LBA model
          
          real b_A_tv_ts;
          real b_tv_ts;
          real term_1;
          real term_2;
          real pdf;
          
          b_A_tv_ts = (X - A - t*v)/(t*s);
          b_tv_ts = (X - t*v)/(t*s);
          term_1 = Phi(b_A_tv_ts);
          term_2 = Phi(b_tv_ts);
          pdf = (1/A)*(-term_1 + term_2);
          
          return pdf;
     }

     
     real lbaX_cdf(real X, real t, real A, real v, real s){
          //CDF of the LBA model
          
          real b_A_tv;
          real b_tv;
          real ts;
          real term_1;
          real term_2;
          real term_3;
          real term_4;
          real cdf;	
          
          b_A_tv = X - A - t*v;
          b_tv = X - t*v;
          ts = t*s;
          term_1 = b_A_tv * Phi(b_A_tv/ts);	
          term_2 = b_tv   * Phi(b_tv/ts);
          term_3 = ts     * exp(normal_lpdf(b_A_tv/ts|0,1)); 
          term_4 = ts     * exp(normal_lpdf(b_tv/ts|0,1));
          cdf = (1/A)*(- term_1 + term_2 - term_3 + term_4);
          
          return cdf;
          
     }

// proba that 1 is ranked before 2 in first race

    real rank_density(real x,          // Function argument
                    real xc,         // Complement of function argument
                                     //  on the domain (defined later)
                    real[] theta,    // parameters
                    real[] x_r,      // data (real)
                    int[] x_i) {     // data (integer)
        
        real t = theta[1];
        real A = theta[2];
        real v1 = theta[3];
        real v2 = theta[4];
        real s = theta[5];
        real v;
        
        v=lbaX_pdf(x, t, A, v1, s)*lbaX_cdf(x, t, A, v2, s);
        
        return v;
}

// cumulative proba of 1 before 2 in first race, knowing neither reached b

  real order(real down, real up, real[] theta, data real[] x_r) {
    int x_i[0];
    real v;

    v = integrate_1d(rank_density, down, up, theta, x_r, x_i, 1e-8);
    return v;
}
  }
'

library(rstan)
expose_stan_functions(stanc(model_code = stancode))

t<-0.5
b<-1
A<-0.5
v1<-1
v2<-1
s<-1

# This computes the probability that 1 and 2 are both less than b but 1 is greater than 2 at time t

order(-10,b,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.66,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.67,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.70,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.76,theta = c(t,A,v1,v2,s), x_r = double())

order(-10,0.77,theta = c(t,A,v1,v2,s), x_r = double())


# The calls above give the following error for upper limits between 0.67 and 0.76:
# Exception: integrate: error estimate of integral above zero 9.59331e-10 
# exceeds the given relative tolerance times norm of integral above zero  
# (in 'unknown file name' at line 74)

# However, if I now use the function integrate in R, I do not have any problems in that domain:

f_1_2<-function(x,t,A,v1,v2,s){
  p<-lbaX_pdf(x, t, A, v1, s)*lbaX_cdf(x, t, A, v2, s)
  return(p)}

integrate(Vectorize(f_1_2),-10,0.66,t=t,A=A,v1=v1,v2=v2,s=s)$value

integrate(Vectorize(f_1_2),-10,0.67,t=t,A=A,v1=v1,v2=v2,s=s)$value

integrate(Vectorize(f_1_2),-10,0.70,t=t,A=A,v1=v1,v2=v2,s=s)$value

integrate(Vectorize(f_1_2),-10,0.76,t=t,A=A,v1=v1,v2=v2,s=s)$value

integrate(Vectorize(f_1_2),-10,0.77,t=t,A=A,v1=v1,v2=v2,s=s)$value

# here is the whole graph of the function "order" between -1 and 3:

N<-400
x<-(1:N)/100-1
ft<-rep(NA,N)
for (i in (1:N)) {ft[i]<-integrate(Vectorize(f_1_2),-10,x[i],t=t,A=A,v1=v1,v2=v2,s=s)$value}
plot(x,ft)

bbbales2 · September 29, 2020, 12:52pm

Yeah, integrate_1d can be rather frustrating.

The integrate_1d function is basically a wrapper around the Boost 1d quadrature stuff over here: https://www.boost.org/doc/libs/1_74_0/libs/math/doc/html/quadrature.html .

On top of doing the integrals, it will also compute gradients of these integrals which can blow up. But I don’t think it does that for expose_stan_functions.

To understand the error, check out: https://www.boost.org/doc/libs/1_74_0/libs/math/doc/html/math_toolkit/double_exponential/de_tol.html

I’m not sure what algorithm R’s integrate uses.

Looking at the function, Phi can be a rather fragile function to work with. I wonder if that is the thing blowing up. Stan and R use different implementations of functions like this. Numeric overflows/underflows mess up the integrator. I guess the place to start would be checking if either the pdf or cdf in your integral are failing to evaluate at any point.
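A minimal sketch of that check in plain R, for anyone who wants to try it without compiling the Stan functions. The names lbaX_pdf_r and lbaX_cdf_r are hypothetical re-implementations of the Stan functions from the original post, and they use R's pnorm and dnorm rather than Stan's Phi and normal_lpdf, so a clean result here does not rule out a problem on the Stan side.

```r
# Hypothetical pure-R stand-ins for the Stan functions lbaX_pdf and lbaX_cdf.
# Assumption: pnorm/dnorm replace Stan's Phi and exp(normal_lpdf(. | 0, 1)).
lbaX_pdf_r <- function(X, t, A, v, s) {
  (pnorm((X - t * v) / (t * s)) - pnorm((X - A - t * v) / (t * s))) / A
}

lbaX_cdf_r <- function(X, t, A, v, s) {
  ts <- t * s
  b_A_tv <- X - A - t * v
  b_tv <- X - t * v
  (-b_A_tv * pnorm(b_A_tv / ts) + b_tv * pnorm(b_tv / ts) -
     ts * dnorm(b_A_tv / ts) + ts * dnorm(b_tv / ts)) / A
}

# Parameter values from the original post.
t <- 0.5; A <- 0.5; v1 <- 1; v2 <- 1; s <- 1

# Evaluate the integrand on a fine grid over the integration range.
x_grid <- seq(-10, 1, by = 0.001)
dens <- lbaX_pdf_r(x_grid, t, A, v1, s) * lbaX_cdf_r(x_grid, t, A, v2, s)

# Any NaN, NA, or infinite value would point to an evaluation failure.
n_bad <- sum(!is.finite(dens))
print(n_bad)
print(range(dens))
```

To test Stan's own implementations instead, the same grid can be passed point by point to the functions exported by expose_stan_functions.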

Hopefully it’s one of those functions messing up so we can rewrite the integral and it will work (like what happened here: Using integration to fit the difference of gamma distributed random variable)

alescia · September 29, 2020, 1:33pm

Hi, thanks. I already checked the pdf and cdf, and they are fine in the domain where the integral fails to evaluate.

I noticed that “integrate” does not rely on the same integration algorithm as “integrate_1d”. Looking into the code for “integrate”, it calls “C_call_dqags”, which apparently is a C routine based on the QUADPACK library.

The question then is whether there is a way to get that routine to work in rstan. I have been looking into how to do this, but I must say this is a bit beyond me at the moment.
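One difference that is easy to overlook when comparing the two functions is the tolerance. R's integrate defaults to rel.tol = .Machine$double.eps^0.25 (about 1.2e-4), which is far looser than the 1e-8 passed to integrate_1d above. The sketch below uses the standard normal density as a stand-in integrand with a known answer; it only illustrates what integrate reports and is not a reproduction of the failure. Note also that the two tolerances are not defined in exactly the same way.

```r
# A smooth test integrand with a known answer (standard normal density).
exact <- pnorm(0.67) - pnorm(-10)

# Default settings: rel.tol = .Machine$double.eps^0.25.
default_tol <- .Machine$double.eps^0.25
res_default <- integrate(dnorm, lower = -10, upper = 0.67)

# A tighter request, numerically equal to the 1e-8 used with integrate_1d.
res_tight <- integrate(dnorm, lower = -10, upper = 0.67, rel.tol = 1e-8)

print(default_tol)                   # default relative tolerance
print(res_default$abs.error)         # estimated absolute error
print(res_default$subdivisions)      # number of subintervals used
print(res_tight$message)             # "OK" when no problem was detected
print(abs(res_tight$value - exact))  # actual error against the known answer
```

Passing rel.tol = 1e-8 to the integrate calls in the original post would make the comparison with integrate_1d somewhat closer to like for like.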

bbbales2 · September 29, 2020, 1:37pm

Do you have any luck turning down the tolerance for integrate_1d?

alescia · September 29, 2020, 1:38pm

No, that does not work either.

bbbales2 · September 29, 2020, 1:46pm

Ugh, that’s no good. What is the value of the function at one of the spots where the integration fails? (and the error message at that spot too – I’m wondering how close the integrator got)

Can you use the ODE integrator to do the integral?

Can you post a plot of rank_density for a set of parameters where integrate_1d fails?

alescia · September 29, 2020, 2:39pm

The code I already posted will generate the error messages. For reference, I get:

> order(-10,0.67,theta = c(t,A,v1,v2,s), x_r = double())
Error in order(-10, 0.67, theta = c(t, A, v1, v2, s), x_r = double()) : 
  Exception: integrate: error estimate of integral above zero 9.59331e-10 exceeds the given relative tolerance times norm of integral above zero  (in 'unknown file name' at line 150)
> integrate(Vectorize(f_1_2),-10,0.67,t=t,A=A,v1=v1,v2=v2,s=s)$value
[1] 0.09634799

Here is some code that produces graphs, in addition to the code I already posted, in case you can look into it and spot an issue. The last one is a plot of rank_density.

N<-1000
x<-(1:N)/1000
ft<-rep(NA,N)
for (i in (1:N)) {ft[i]<-lba_cdf(x[i], b, A, v1, s)}
plot(x,ft)


N<-8000
x<-(1:N)/1000-1
ft<-rep(NA,N)
for (i in (1:N)) {ft[i]<-lbaX_pdf(x[i], t, A, v1, s)}
plot(x,ft)

N<-3000
x<-(1:N)/1000-1
ft<-rep(NA,N)
for (i in (1:N)) {ft[i]<-lbaX_pdf(x[i], t, A, v1, s)*lbaX_cdf(x[i], t, A, v2, s)}
plot(x,ft)

I do not know how to use the ODE integrator for such things. Isn’t it meant for ordinary differential equations?

bbbales2 · September 29, 2020, 2:53pm

Thanks for the code. I’ll have a look.

The ODE integrator computes y(t) given y' = f(t, y) and y(t0).

If your f is not a function of y, then the solution of the ODE is y(t) = y(t0) + \int_{t0}^{t} f(\tau) d\tau, so starting from y(t0) = 0 gives exactly the definite integral of f from t0 to t.
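A small R sketch of this idea, using only base R. The function rk4 is a hypothetical hand-written classical fixed-step Runge-Kutta solver, not the adaptive RK45 scheme behind Stan's integrate_ode_rk45, and the standard normal density stands in for the integrand because its integral is known from pnorm.

```r
# Classical fixed-step RK4 for y' = f(x, y), y(x0) = y0.
# Hypothetical helper written for illustration only.
rk4 <- function(f, x0, x1, y0, n_steps) {
  h <- (x1 - x0) / n_steps
  x <- x0
  y <- y0
  for (i in seq_len(n_steps)) {
    k1 <- f(x, y)
    k2 <- f(x + h / 2, y + h / 2 * k1)
    k3 <- f(x + h / 2, y + h / 2 * k2)
    k4 <- f(x + h, y + h * k3)
    y <- y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    x <- x + h
  }
  y
}

# Right-hand side that ignores y: the ODE solution is then a plain integral.
rhs <- function(x, y) dnorm(x)

via_ode <- rk4(rhs, x0 = -10, x1 = 0.67, y0 = 0, n_steps = 2000)
via_quad <- integrate(dnorm, lower = -10, upper = 0.67)$value
exact <- pnorm(0.67) - pnorm(-10)

print(c(ode = via_ode, quadrature = via_quad, exact = exact))
```

Because the right-hand side never uses y, each RK4 step reduces to a Simpson's rule update, which is why the ODE route and ordinary quadrature agree here.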

bbbales2 · September 29, 2020, 3:21pm

I played around with the code a bit.

It looks to me like it’s able to do your integral on [-10, 1] but then it won’t do your integral on [-10, 0.67]. Yeah, that seems crazy to me. Is that a fair summary of the problem?

alescia · September 29, 2020, 3:26pm

Yes, that is why I am asking here for pointers on why this is so!

bbbales2 · September 29, 2020, 3:48pm

I loosened up the tolerances a bit and the function seems to work for me now:

v = integrate_1d(rank_density, down, up, theta, x_r, x_i, 1e-6);

I’m not sure how well it’s working though. I’m wondering if there isn’t something screwed up in how we’re interpreting the error checks from Boost.
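For the parameter values in this thread there is an independent way to judge how well any of the integrators is doing, sketched below under two assumptions. First, v1 = v2, so the pdf and cdf in the integrand belong to the same distribution. Second, lbaX_cdf is the antiderivative of lbaX_pdf in X, which can be confirmed by differentiating the posted formula. The integrand is then the derivative of F(x)^2 / 2, and the integral from a to u equals (F(u)^2 - F(a)^2) / 2. The name lbaX_cdf_r is a hypothetical pure-R version of the Stan function, using pnorm and dnorm in place of Phi and normal_lpdf.

```r
# Hypothetical pure-R version of the Stan function lbaX_cdf.
lbaX_cdf_r <- function(X, t, A, v, s) {
  ts <- t * s
  b_A_tv <- X - A - t * v
  b_tv <- X - t * v
  (-b_A_tv * pnorm(b_A_tv / ts) + b_tv * pnorm(b_tv / ts) -
     ts * dnorm(b_A_tv / ts) + ts * dnorm(b_tv / ts)) / A
}

# Parameter values from the original post (v1 = v2 = v).
t <- 0.5; A <- 0.5; v <- 1; s <- 1

# Closed form of the integral of pdf * cdf; valid only when v1 == v2.
order_closed <- function(down, up) {
  0.5 * (lbaX_cdf_r(up, t, A, v, s)^2 - lbaX_cdf_r(down, t, A, v, s)^2)
}

ref <- order_closed(-10, 0.67)
print(ref)
# Distance from the value R's integrate reported earlier in the thread.
print(abs(ref - 0.09634799))
```

The same reference can be evaluated at any upper limit where integrate_1d succeeds after loosening the tolerance, to see how many digits survive.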

Here’s some code for using the ODE integrators to do this:

real[] rank_density_ode(real x,    // independent variable (the integration variable)
  real[] y,        // ODE state (unused: the integrand does not depend on it)
  real[] theta,    // parameters
  real[] x_r,      // data (real)
  int[] x_i) {     // data (integer)
  
  real t = theta[1];
  real A = theta[2];
  real v1 = theta[3];
  real v2 = theta[4];
  real s = theta[5];
  real v;
  
  v=lbaX_pdf(x, t, A, v1, s)*lbaX_cdf(x, t, A, v2, s);
  
  return { v };
}

real order_ode(real down, real up, real[] theta, data real[] x_r) {
  int x_i[0];
  real v;
    
  v=integrate_ode_rk45(rank_density_ode, { 0.0 }, down, { up }, theta, x_r, x_i)[1, 1];
  return v;
}

That worked as well.

alescia · September 29, 2020, 3:52pm

Thanks, I will try with the integrate_ode_rk45 algorithm, thanks for the code!

I did not try 1e-6 as a tolerance level, I will look into it.

bbbales2 · September 30, 2020, 2:16am

Did 1e-6 end up working? Or did this still give trouble?

alescia · September 30, 2020, 8:20am

Hi, 1e-6 did not work because problems then arose on other intervals. However, using the ODE integrator works nicely. I would not have thought of this, thanks a lot!

bbbales2 · November 12, 2020, 7:26pm

Thanks for reporting the error and the code! So this:

Exception: integrate: error estimate of integral above zero 9.59331e-10 exceeds the given relative tolerance times norm of integral above zero  (in 'unknown file name' at line 150)

turned out to be a problem with how the tolerances were working in the integrator (check here for details: https://github.com/boostorg/math/issues/449). We put a patch in develop, so in a couple of months the fix for this will make it out into a release.

The C++ version of your code is now the test for this in math :D (https://github.com/stan-dev/math/blob/develop/test/unit/math/prim/functor/integrate_1d_test.cpp#L470)

alescia · November 13, 2020, 7:58am

This is very cool, amazing work on your part, I am always impressed by the dedication of open-source developers.

bbbales2 · January 27, 2021, 12:41pm

Release is out that addresses this: Release of CmdStan 2.26.0
