Using Pathfinder or another method to set initial values for sampling

Hi all,

I’ve read that one of the uses of Pathfinder is to get better initial values for sampling with HMC. However, I haven’t been able to find a way to pass Pathfinder draws to Stan (specifically cmdstanr) as initial values. Does anyone have some code to do this for a model with a lot of parameters, where I just want to use a single Pathfinder draw or the mean of the draws to initialise my HMC?

Also, and this is unrelated, I’m having quite a bit of trouble using Pathfinder to actually recover posteriors with complicated models. Here are some outputs of Pathfinder, using the default settings, being unable to recover hyperparameters but doing better with realisations of these random effects (simulated truth in red):

Whereas even 100 HMC draws after 100 warm-up draws seem to have no issue.

Nevertheless, I’d still like to use Pathfinder to set better initial values.

Thanks!

Matt


EDIT: This is on the master branch now.

This branch of cmdstanr lets you directly use a Pathfinder fit object to init for sampling. More generally, you can use any fit to init any of the algorithms.

What is gamma here? It looks like that is the worst miss from the plots. There will definitely be some cases of complex models where approximations like Pathfinder won’t work well. But even in these cases, using Pathfinder as a place to init NUTS should help.


Awesome, I’m keen to try this out.

gamma are the hierarchically modeled region-level colonisation rates in a dynamic occupancy model. But I’m more concerned about the left-hand facet, with hyperparameters that do not get recovered at all.

Not surprising. Pathfinder is trying to find the best low-rank plus diagonal multivariate normal approximation $q$ by (indirectly) minimizing $\textrm{KL}[q \,\|\, p]$. That objective is well known to underestimate variance, so it’s more surprising that the iota hyperparameters are being massively overestimated in the VI setting.
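For reference, the divergence mentioned above can be written out in standard notation. The symbols $\theta$ (parameters) and $y$ (data) are assumed here for illustration and are not taken from the thread:

$$
\textrm{KL}[q \,\|\, p] = \mathbb{E}_{q(\theta)}\left[\log q(\theta) - \log p(\theta \mid y)\right] = \log p(y) - \textrm{ELBO}(q)
$$

Because the expectation is taken under $q$, the approximation is penalized heavily for putting mass where $p$ has little, which is the usual explanation for why this objective tends to produce approximations narrower than the posterior.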

If 100 HMC draws after 100 warmup draws works, then I wouldn’t bother with trying better initializations. Also, you can use Pathfinder to initialize only a subset of the parameters if you think that’d be better.
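As a rough illustration of initializing only a subset, here is a minimal Python sketch. The helper `subset_inits` and the parameter names and values are hypothetical, made up for this example. It relies on the assumption that Stan fills in any parameter missing from the inits with its default random initialization.

```python
def subset_inits(full_inits, keep):
    """Return a copy of full_inits restricted to the parameter names in keep."""
    missing = [name for name in keep if name not in full_inits]
    if missing:
        raise KeyError(f"parameters not found in inits: {missing}")
    return {name: full_inits[name] for name in keep}


# Hypothetical values standing in for one Pathfinder draw.
pathfinder_draw = {
    "mu": 0.3,
    "sigma": 1.2,
    "gamma": [0.1, -0.4, 0.25],
}

# Keep only the region-level effects and leave the hyperparameters
# to whatever default initialization the sampler uses.
partial = subset_inits(pathfinder_draw, ["gamma"])
print(partial)
```

The resulting dictionary can then be used wherever a full set of inits would go.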

Here’s a case study for CmdStanPy.

https://mc-stan.org/cmdstanpy/users-guide/examples/VI%20as%20Sampler%20Inits.html
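The underlying idea can be sketched in plain Python without running Stan: take approximate draws, then either pick one draw per chain or average them into a single starting point. The helpers `inits_from_draws` and `mean_init` and the toy draws below are hypothetical and are not the case study's code. The sketch assumes each parameter is a scalar or a flat vector.

```python
import json
import random


def inits_from_draws(draws, chains=4, seed=1):
    """Pick one distinct draw per chain and return a list of init dicts.

    draws maps each parameter name to a list with one entry per draw.
    """
    n_draws = len(next(iter(draws.values())))
    if any(len(values) != n_draws for values in draws.values()):
        raise ValueError("all parameters must have the same number of draws")
    if chains > n_draws:
        raise ValueError("need at least as many draws as chains")
    rng = random.Random(seed)
    picks = rng.sample(range(n_draws), chains)  # distinct draw indices
    return [{name: values[i] for name, values in draws.items()} for i in picks]


def mean_init(draws):
    """Average the draws elementwise (scalars or flat lists only)."""
    out = {}
    for name, values in draws.items():
        if isinstance(values[0], list):
            out[name] = [sum(col) / len(values) for col in zip(*values)]
        else:
            out[name] = sum(values) / len(values)
    return out


# Toy draws standing in for Pathfinder output (made-up numbers).
draws = {
    "sigma": [1.0, 1.5, 2.0, 2.5],
    "gamma": [[0.0, 1.0], [0.2, 0.8], [0.4, 0.6], [0.6, 0.4]],
}

per_chain = inits_from_draws(draws, chains=2, seed=1)
center = mean_init(draws)

# Each dict can be serialized to JSON, one file per chain if needed.
print(json.dumps(per_chain[0]))
print(json.dumps(center))
```

Dictionaries like these are the kind of thing that can be handed to the `inits` argument of CmdStanPy's `sample`; check its documentation for the exact forms accepted. Averaging is only safe when the mean still satisfies the parameter's constraints, so a single draw per chain is the more cautious choice.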

I’m not sure how easy this is to do in CmdStanR.

P.S. I think it’d be easier to parse the plots if the horizontal (x-axis) range matched in the VI and HMC plots.

P.P.S. The biggest thing you can do for speeding up sampling with varying-intercept hierarchical models is to introduce identifiability into the likelihood. For example, you can pin one of the effects at 0, but it’s more natural for priors to constrain the effects to sum to zero (and then make sure to include an intercept).
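A small Python sketch of why this matters, using made-up numbers and hypothetical helper names. Shifting the intercept up and every effect down by the same constant leaves the linear predictor unchanged, so the likelihood alone cannot separate the two. The sum-to-zero mapping shown is the simplest one (the last effect is the negative sum of the others) and is not necessarily what Stan uses internally.

```python
def linear_predictor(intercept, effects):
    """Intercept plus one effect per group."""
    return [intercept + e for e in effects]


def sum_to_zero(free):
    """Map K-1 free values to K effects that sum to zero."""
    return list(free) + [-sum(free)]


def pin_first_at_zero(free):
    """Map K-1 free values to K effects with the first pinned at 0."""
    return [0.0] + list(free)


# Unconstrained effects: two different parameter settings...
effects = [0.5, -1.0, 2.0]
shifted = [e - 3.0 for e in effects]

# ...give exactly the same predictor once the intercept absorbs the shift.
a = linear_predictor(1.0, effects)
b = linear_predictor(4.0, shifted)
print(a == b)

# With either constraint, K-1 free values determine all K effects,
# so a constant can no longer be traded against the intercept.
constrained = sum_to_zero([0.5, -1.0])
pinned = pin_first_at_zero([0.5, -1.0])
print(constrained, pinned)
```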


Thanks Bob.

I suppose I was a bit surprised by Pathfinder actually not doing so well with these hyperparameters. I didn’t really need the initialisation from Pathfinder, I just thought it’d be a nice idea to jumpstart my HMC like this. Thanks anyway!
