The most efficient and least awkward way to do this appears to be to add this functionality to cmdstan directly.
Is there interest for this? If yes, whenever I implement this for myself I’ll take extra care and make this into a pull request. (Conditional on me actually managing to find a good solution). Is this the right place to ask this, or should this be done on github?
Reason for implementation: investigate impact of ode solver tolerances on the accuracy of the gradient of the log posterior.
You reporting different/wrong values with step size and metric provided is consistent with this (edit: init=sample) only being so because the sampler diverges immediately.
But I haven’t gotten around to running the code yet.
Edit2:
So if just ran the code. If you provide the metric and step size, divergent__==0, n_leapfrog__!=1 and the parameter values differ, while if you do not provide the metric and stepsize, divergent__==1, n_leapfrog__==1 and the parameter values are identical.
Hm, so it looks like currently the simplest (yet absurdly hacky way) is to specify a preposterous step size (e.g. 1e16) to ensure immediate divergence to ensure that your hack above works. This appears to work, and while I have a simple serial python solution, parallelization via threads should be straightforward. I was hoping this was possible via fixed_param=True, but fixed_param does not write the gradients to the diagnostic file.
Edit2: Another potentially significant drawback of this approach is that it always evaluates the log prob (and the gradient or most of the work needed for it?) twice, once at the original position and then at a potentially very awkward position.
Anyhow, for 4x10 samples from the prior, the planetary motion example reveals the following, using rk45 with rel_tol==abs_tol==tolerance and max_no_steps==1e8:
(Edit: comparing with a reference solution obtained with tolerance=1e-14)
I.e. while the trajectory/lp__ appears to converge rather quickly, the gradients stay really really really badly approximated (for some draws) for “much” longer.
Tagging some relevant people @charlesm93 (your model), @betanalpha (having voiced concerns about potential accuracy/tuning problems for the current adjoint implementation) and @wds15 (we should investigate this for the adjoint solver).
I think one possible conclusion of this very limited convergence study should be that we have to tell people that they should really check the appropriateness of their solver configurations.
Before I forget. This is great and nice… but you should be aware of RStan having this feature already. It’s just that RStan is using a very old Stan still unless you install the 2.26 version from the Stan mirrors.
Having this functionality for cmdstan would also be nice, of course.