Nested AD function

Can we generalize the algebra and ODE solvers so that a user could call nested on an input function, with the return being the derivative of that function? This would allow score models to be fit, I think. I'd like some feedback on whether this is possible.

The idea is that, for models with time-varying parameters, the steepest ascent of the (log) likelihood can be used to update the parameters at time t + 1. In fact, this has close analogs to extended Kalman filtering, which could also use this nested framework to get the derivative of the input function.
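As a concrete sketch of the idea (not a proposed API): with a Gaussian observation density and a time-varying mean, the score of the log-likelihood drives a steepest-ascent update of the parameter at each step, much like a generalized autoregressive score model. The function names and step size here are purely illustrative; the score is written out analytically, standing in for what a nested call would compute automatically.

```python
import math

def log_lik(y, mu, sigma=1.0):
    # Gaussian log-density of observation y given mean mu
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) \
           - (y - mu) ** 2 / (2.0 * sigma ** 2)

def score(y, mu, sigma=1.0):
    # d log_lik / d mu: the score that nested(f) would return automatically
    return (y - mu) / sigma ** 2

def score_driven_path(ys, mu0=0.0, step=0.5):
    # steepest ascent on the log-likelihood: mu_{t+1} = mu_t + step * score
    mu, path = mu0, []
    for y in ys:
        mu = mu + step * score(y, mu)
        path.append(mu)
    return path
```

For example, `score_driven_path([1.0, 1.0])` moves the mean toward the observations, giving `[0.5, 0.75]`.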

See eqn 3 p. 5 of

I have zero experience with the ODE routines, but I'm wondering what would need to be done to enable an entirely new class of models to be fit.

@charlesm93 @yizhang @Bob_Carpenter

@wds15 too!


One issue is that these implicit functions already run nested internally. Automatically implementing an optimization constraint would require running nested twice: once to get the gradient for the constraint and then again for the solvers to evaluate. That in turn requires second-order autodiff, which opens up a giant can of worms given the incomplete implementation of higher-order functionality across the language.
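To see why running nested twice forces second-order autodiff, here is a toy forward-mode sketch (plain dual numbers, nothing Stan-specific): differentiating a function that itself contains a derivative computation means pushing dual numbers through the derivative code, i.e. AD applied to AD.

```python
class Dual:
    # minimal forward-mode dual number: a value plus a derivative part
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def derivative(f, x):
    # one sweep of forward-mode AD
    return f(Dual(x, 1.0)).dot

def f(x):
    return x * x * x  # f(x) = x^3

first = derivative(f, 3.0)                            # 3x^2 = 27
# differentiating the derivative: duals flow through derivative() itself
second = derivative(lambda x: derivative(f, x), 3.0)  # 6x = 18
```

The outer call only works because `Dual` arithmetic happens to compose with itself; in a reverse-mode library like Stan Math, the analogous dual-over-reverse machinery is exactly the higher-order functionality that is incomplete.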

just like @betanalpha writes… this would need 2nd-order AD, and that's not implemented for the ODE solvers (the CVODES manual mentions that this is possible, but it's not in Stan, nor are there plans to do it).

I may be misunderstanding the issue, or I've phrased the question poorly. Forget the implicit functions that run nested internally. Say I have a function on which I want to call nested — since it is not an implicit function written in Stan Math, nested would not be called twice.

What I would like to do is take some twice-differentiable function f and call nested on it, as nested(f). It would return the gradient, and then the AD tape would resume. This would be like the current implicit functions in that nested is called once, but it would allow nested to be called within a Stan program.

So nested(f(x)) returns the function value f(x) and the gradient of f with respect to x, right? If you want to use this as part of sampling, then we'd need the derivative of the gradient. If you just want the gradient of f, then that's the same as calling the gradient function in Stan Math.
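A toy reverse-mode sketch can make the sticking point concrete: a hypothetical nested(f) runs f on a fresh tape, does one reverse sweep, and hands back the value and gradient as plain doubles, so the outer tape has no way to propagate derivatives through them. `Var`, `grad`, and `nested` below are illustrative names, not the Stan Math API.

```python
class Var:
    # minimal reverse-mode AD node
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # pairs of (parent Var, local partial)
        self.adj = 0.0
    def __add__(self, o):
        o = o if isinstance(o, Var) else Var(o)
        return Var(self.val + o.val, ((self, 1.0), (o, 1.0)))
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Var) else Var(o)
        return Var(self.val * o.val, ((self, o.val), (o, self.val)))
    __rmul__ = __mul__

def grad(out):
    # reverse sweep in topological order
    order, seen = [], set()
    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for p, _ in v.parents:
            visit(p)
        order.append(v)
    visit(out)
    out.adj = 1.0
    for v in reversed(order):
        for p, partial in v.parents:
            p.adj += partial * v.adj

def nested(f, x):
    # hypothetical nested(f): fresh tape, one sweep, plain doubles back out
    xs = [Var(xi) for xi in x]
    out = f(xs)
    grad(out)
    return out.val, [v.adj for v in xs]

# f(x, y) = x*y + x at (2, 3): value 8, gradient (y + 1, x) = (4, 2)
val, g = nested(lambda v: v[0] * v[1] + v[0], [2.0, 3.0])
```

Because `val` and `g` come back as bare floats, any sampler differentiating through a program that calls nested(f) would need the nested sweep itself to be differentiable, which is the second-order problem again.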


Yes, I'd want both the function value and the gradient. My question is: can this be exposed to the Stan language, not just in Stan Math?


No, the gradient function cannot be exposed to the Stan language. The sampler would need gradients of those gradients, and higher-order derivatives are not tested well enough; that has been the state of Stan Math for quite some time now.