I wanted to share a vignette about using the Laplace, ADVI, Pathfinder, and NUTS inference methods to fit epidemiological delay distributions. The vignette is for the epidist package, which uses brms as a backend. (There are a range of ways I think the vignette could be improved, but I didn’t want to let perfect be the enemy of the good, so sharing early.)
One reason for sharing is that I had difficulties using Pathfinder. In particular, for many of the paths I got:
Error evaluating model log probability: Non-finite gradient.
This error is interesting because the same model runs without any issue using NUTS (and Laplace and ADVI). Is there a way that the choice of optimisation paths could be made more robust? As a workaround, I considered subsetting to only those paths which didn't produce errors, then taking a mixture of Gaussians over those. However, I didn't manage to make the save_single_paths option work (I only tried quite briefly). The epidist package is under quite active development, so breaking changes are likely, but if you are interested in a reproducible example of a model that gives Pathfinder issues then this could be of interest. (The vignette should be quite reproducible using the latest GitHub version of epidist, but if you run into problems feel free to message me or create an issue!)
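For context, a minimal sketch of what I was attempting with save_single_paths, via cmdstanr (the model and data file names here are placeholders, not the epidist model itself):

```r
library(cmdstanr)

mod <- cmdstan_model("model.stan")   # hypothetical Stan file
fit <- mod$pathfinder(
  data = "data.json",                # hypothetical data file
  num_paths = 4,
  save_single_paths = TRUE           # ask CmdStan to save per-path output
)
```

The idea would then be to inspect the individual path outputs, drop the ones hitting non-finite gradients, and form the Gaussian mixture from the rest.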
As background, I’ve been working on this project as a contractor with CDC CFA. More broadly, lots of people across CFA are using Stan – so I wanted to say thanks (speaking in a personal capacity) for developing a useful tool to support public health! As well, the epidist package in particular wouldn’t be possible without brms.
Part of my interest in learning about Pathfinder is that we are fitting hundreds of models each week (e.g. to estimate R_t) and would like to initialise NUTS in a sensible way. This is something I'll start actively working on soon, and will be trying Pathfinder out for. That said, I can imagine that because each week's data will be quite similar to the previous week's, it might be preferable to initialise using last week's fits in some way, but of course we will see and can test this empirically.
Apologies for the somewhat disconnected post! If any of this is interesting / you have feedback or thoughts then feel free to get in touch.
Thanks for sharing @athowes. And for developing packages using Stan and brms.
That’s unexpected because Pathfinder and Laplace use the same L-BFGS optimization of the underlying log density, which is where non-finite gradients might pop up.
There’s nothing built-in that can be user controlled at the moment, but I’m wondering if we have something in our Laplace approximation that makes it more robust to local errors than the optimization in Pathfinder.
Is there any easy way to get a Stan program and data that cause a problem for Pathfinder? I’d like to try to debug what’s going on, but I could only find Stan program fragments in the linked repo.
As to the issue with your package, there may be an easier way to initialize than with Pathfinder. I often find initializing some of the main parameters in a sensible range is enough to avoid the pathologies that can arise with random initializations in the tail—Stan lets you do partial initialization and then randomly initializes the rest.
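To illustrate the partial-initialization point with cmdstanr (the parameter names mu and sigma here are hypothetical, just for the sketch): you supply starting values for only the parameters you care about, and Stan randomly initializes the rest.

```r
library(cmdstanr)

mod <- cmdstan_model("model.stan")   # hypothetical Stan file
fit <- mod$sample(
  data = "data.json",                # hypothetical data file
  chains = 4,
  # Partial init: mu starts at 0 in every chain; sigma (and any other
  # parameters not named here) are randomly initialized as usual.
  init = function() list(mu = 0)
)
```

Pinning just one or two scale or intercept parameters to a sensible value is often enough to keep the sampler out of the pathological tails that random inits can land in.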
You can get the Stan code and data by calling epidist::epidist and setting fn = brms::make_stancode or brms::make_standata. I've created a repo here with a script that does this, and saved the objects as stancode.rds and standata.rds, if useful.
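In outline, the extraction looks like this; the other arguments to epidist() depend on the model being fit (and the package is under active development), so the `...` is a placeholder for your model setup rather than working code:

```r
library(epidist)

# Swap the fitting function for brms's code/data generators to get the
# underlying Stan program and data rather than a fitted model.
stancode <- epidist(..., fn = brms::make_stancode)
standata <- epidist(..., fn = brms::make_standata)

saveRDS(stancode, "stancode.rds")
saveRDS(standata, "standata.rds")
```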
Good point that you can use mixed initialisation strategies. My suspicion is that (some of) the problems we are having that get flagged as "convergence issue due to initialisation" are actually "convergence issue due to data/model mismatch", or otherwise require investigation of the model specification rather than simply tweaking the initialisation, but this all remains to be investigated. It feels like an interesting difference whether you see NUTS (or other) convergence issues as a diagnostic tool (the statistician's point of view) or more as a burden (the practitioner's point of view). (Not to imply that the statistician's point of view is entirely correct: at the end of the day we do need to generate estimates.)