Integrate_1d: fifth argument must be data only


I have defined a custom probability distribution (lpdf) in the functions block. Within the definition of the lpdf, I call “integrate_1d” to integrate another function defined in the same functions block. More specifically: in defining the lpdf, I need to integrate a function of the type exp(spline), where spline is a B-spline. The signature of the function I need to integrate is:

real exp_spline(real x, real xc, real[] theta, real[] x_r, int[] x_i).

Within the lpdf with signature

real exspl_lpdf(real y, real theta, real coeff, real[] ext_knots)

I call the function via:

integrate_1d(exp_spline, ext_knots[1], y, theta, ext_knots, x_i, 1e-4);

Note that the argument “ext_knots” defines the (extended) knots of a b-spline. These are meant to be supplied by the user via setting appropriate data.

When compiling my program I first got the error message:
“the 5th argument must be data-only. (Local variables are assumed to depend
on parameters; same goes for function inputs unless they are marked with
the keyword ‘data’.)”
(in my case the fifth argument refers to the knots of the spline)

After adding the keyword “data” in front of the argument, the error message disappeared.

real exspl_lpdf(real y, real theta, real coeff, data real[] ext_knots);

However, my (first) question is:
If I mark an argument as data, does Stan still check that I call the function appropriately?
That is, if I mark an argument as data and violate this agreement later in the program (by calling the function with something not defined as data), will I get an error message? Further, it seems that adding “data” in front of the function parameter is the only way to fulfill the requirement, because I cannot refer to variables from the data block inside the functions block, as the data block is defined after the functions block. Is this correct?

Now with this added “data” in front of the parameter the model compiles – but there is no sampling output.
The error messages now seem to indicate some problem in evaluating the integral, and I wonder whether it is a numerical issue or whether there is still something wrong with the definition and usage:

“error estimate of integral above zero 5.3119e-06 exceeds the given relative tolerance times norm of integral above zero”

Thank you very much for suggestions and help!

Sorry for not getting to you earlier; glad you were able to resolve the syntax issue.

Yes, AFAIK Stan will check that the argument is data (and not a parameter/function of parameters) at compile time. I.e. you should not be able to compile code that would let the function be called with non-data arguments.

Yes, that is correct, functions do not have access to model data unless you explicitly pass it to them.
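To make the two points above concrete, here is a minimal sketch of how the pieces could fit together. It is not the original poster's model: the names toy_integrand and toy_lpdf are hypothetical, a simple exp(theta[1] * x) stands in for exp(spline), and the array syntax follows the older style used in this thread (recent Stan versions write array[] real instead of real[]).

```stan
functions {
  // Hypothetical stand-in for exp(spline): the exponential of a straight line.
  real toy_integrand(real x, real xc, real[] theta, real[] x_r, int[] x_i) {
    return exp(theta[1] * x);
  }

  // 'data' marks ext_knots as data-only, which is what integrate_1d
  // requires of its fifth argument (x_r).
  real toy_lpdf(real y, real[] theta, data real[] ext_knots) {
    int x_i[0];  // no integer data needed
    int n = num_elements(ext_knots);
    // Normalizing constant of exp(theta[1] * x) over the knot range.
    real norm_const = integrate_1d(toy_integrand, ext_knots[1], ext_knots[n],
                                   theta, ext_knots, x_i, 1e-8);
    return theta[1] * y - log(norm_const);
  }
}
data {
  int<lower=2> K;
  real ext_knots[K];
  real<lower=ext_knots[1], upper=ext_knots[K]> y;
}
parameters {
  real theta[1];
}
model {
  theta[1] ~ normal(0, 1);
  // ext_knots comes from the data block and is passed in explicitly,
  // so the data-only promise holds at this call site.
  y ~ toy(theta, ext_knots);
}
```

If the last argument in the model block were a parameter, or anything computed from one, the call should be rejected at compile time, as described above.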

That’s almost certainly an issue with the integral: either it is numerically unstable or even completely undefined. In some cases, loosening the relative tolerance (the optional 7th argument) may help, but most often you need to investigate what the issue is and reformulate the integration problem. Unfortunately we can’t help you much without seeing the code for the actual integrand. One thing that is often useful is to print the parameter values you pass to the function and then experiment, outside of Stan (e.g. in R/Python/whatever you use), with the values that fail. Just plotting the integrand can reveal a lot of bad behaviour.
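As a rough sketch of the printing idea, something like the following could be used. The function names are hypothetical, exp(theta[1] * x) again stands in for the real integrand, and the array syntax is the older style used in this thread (recent Stan versions write array[] real instead of real[]).

```stan
functions {
  // Hypothetical stand-in for the real integrand.
  real toy_integrand(real x, real xc, real[] theta, real[] x_r, int[] x_i) {
    return exp(theta[1] * x);
  }

  // Integral from the first knot up to y, mirroring the call in the question.
  real integral_up_to(real y, real[] theta, data real[] ext_knots) {
    int x_i[0];
    // Printed on every evaluation, so the last line printed before a
    // failure shows the inputs that triggered it.
    print("y = ", y, ", theta = ", theta, ", lower limit = ", ext_knots[1]);
    return integrate_1d(toy_integrand, ext_knots[1], y,
                        theta, ext_knots, x_i, 1e-4);
  }
}
```

The printed values can then be copied into R or Python to plot the integrand over the same interval.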

Best of luck with your model!

Thank you very much for taking the time to answer and for the helpful comments!
So, I think I used the data modifier correctly and the remaining issue seems to be a numerical/computational issue. This is helpful because I was not sure whether there could be some problems with my usage of the integrate function.

With respect to the numerical issue, however, it seems puzzling to me because:

  • the function I integrate should be well-behaved/defined. It is the exponential of a spline function.
    Therefore it is continuous and - except for the knots - smooth.
  • one-dimensional quadrature should work fine for continuous functions.

I also understand that it is perhaps helpful to see the underlying code.
If it is ok, I will open a new issue and paste the code (after all, the title of this issue is misleading,
because it is now solely a numerical problem).

Thank you very much.

Yes, that would be sensible.

You are very welcome :-)