Advice on optimizing my use of integrate_1d()?

Hey folks, I need to use integrate_1d() in a model for the first time, and unfortunately I suspect that it’s a particularly gnarly/compute-heavy integral, but I’m rather calculus naive so I thought I’d post a general description to see if any simplifications jump out to anyone before I dive in.

The integral is of the form:

Y = \int_a^b \Big[f(x) \times g(x) \times e^{- \Big( \big(h(x) \times c\big) + \big(i(x) \times d\big) \Big)}\Big] dx

Where:

  • c & d are parameters

  • f() is a unimodal pdf (i.e.\int_{-\infty}^{+\infty}f(x)dx = 1)

  • a and b are bounds on x outside which f(x) has negligbile density

  • g() is a unimodal pdf multiplied by a constant such that it’s peak density is 1, and
    g() has non-neglible density outside the bounds set by a and b

  • h() and i() are complicated positive-valued functions (I actually don’t have a closed-form for these, but empirical observations for their output at a range of values for x, and am thinking I’ll add a GP or spline structure earlier in the model fitting those empirical observations and use the parameters thereof to interpolate for the use of integrate_1d()).

Later in my model, Y is used in the likelihood:

obs_Y ~ normal(Y,noise);

I’m off to start implementing, but if in the meantime anyone has any advice for simplifying this, please chime in!

this means complicated functions, right, and not a map from the complex numbers to the positive reals?

Ah, yes, I’ll edit to use “complicated” instead, thanks!

You may get significant speed-up by using a bigger error tolerance in integrate_1d() and then diagnose and adjust following [2205.09059] An importance sampling approach for reliable and efficient inference in Bayesian ordinary differential equation models. The title says ODEs, but the same approach can be used for any numeric algorithm with adjustable error tolerance. There’s also code available to make the approach easy to use with Stan

3 Likes

OK, making progress, but a bit uncertain whether I need to use the xc variable; do I understand correctly that if one is computing a definite integral, it it always necessary?

If so, and I’m integrating on the definite interval from 0 to 1, say integrating a std_normal for example, would this be done as:

functions{
  real integrand( real x, real xc, array[] real theta, array[] real x_r, array[] real x_i ){
    real out;
    if(x<.5){
      out = std_normal_lpdf( x, theta[1], theta[2] ) ;
    }else{
      out = std_normal_lpdf( 1-xc, theta[1], theta[2] ) ;
    }
    return( out ) ;
  }
}

?