Estimate time to run HMC

Hi. I have a computationally expensive individual-level model (~330 parameters, ~134K observations) where one of the computation chunks takes ~93% of the time. I’m using CmdStanR.

Meanfield VI fits in about 30 minutes without reduce_sum and 14 minutes with it. I haven’t been able to complete an HMC run yet — my current server access limits me to 24-hour jobs, and apparently that’s not enough.

The gradient evaluation message Stan prints at startup says: “Gradient evaluation took 5.01549 seconds” (no reduce_sum).

Is there a reliable way to estimate total HMC wall time from this gradient timing (or an upper bound), given that I don’t yet know what treedepth the sampler will settle on? I’m trying to figure out whether I need to request longer server access (e.g., a week/month) before attempting a full run.

Any guidance appreciated!

The wall time will depend critically on what treedepth the sampler needs in order to usefully explore your posterior.

If you stick with the default max_treedepth of 10, then the worst case is 2^10 = 1024 leapfrog steps per iteration, which would take a very long time. If the posterior is quite “nice” you might hope for a treedepth of, say, 5 (2^5 = 32 leapfrog steps per iteration), which at 5 seconds per gradient eval might still put you in the range of ~2 days for 1000 iterations.

134K observations often isn’t prohibitive. You might consider posting the Stan code you’re using to ask whether there are ways to make it more efficient; 5 seconds per gradient eval is a very long time. Hopefully we can do better.

I see, thank you! I’ll check if I’m allowed to share the code, since this is a shared project.

In the meantime, could you tell me what formula you used to estimate this ~2 days per 1000 iterations? Is it the following? “wall_time = leapfrog_steps × cost_per_grad × iterations”

yup, that’s it. The overwhelming majority of the computation in a model like yours comes from the gradient evaluation, and that happens once per leapfrog step. The thing to emphasize here is that the number of leapfrog steps per iteration is unknown. By default it’s capped at 2^{10}. Capping it lower doesn’t typically help much, because it trades off faster wall time per iteration for worse exploration and lower effective sample size per iteration. The entire beauty of dynamic HMC is its ability to choose the number of leapfrog steps dynamically, and it’s not usually helpful to curtail that.
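To make the formula concrete, here’s a back-of-envelope sketch using the numbers from this thread (5 s per gradient evaluation, 1000 iterations per chain); the treedepth values are illustrative assumptions, not predictions of what your sampler will actually need:

```python
# wall_time = leapfrog_steps * cost_per_grad * iterations
# Assumed numbers from this thread: 5 s per gradient eval, 1000 iterations.
grad_time_s = 5.0    # reported "Gradient evaluation took ..." time
iterations = 1000    # draws per chain (warmup included, roughly)

def est_days(treedepth):
    # At a given treedepth, dynamic HMC takes up to 2^treedepth leapfrog
    # steps per iteration, and each step costs one gradient evaluation.
    leapfrog_steps = 2 ** treedepth
    return leapfrog_steps * grad_time_s * iterations / 86400  # seconds -> days

print(f"treedepth  5: ~{est_days(5):.1f} days")   # ~1.9 days
print(f"treedepth 10: ~{est_days(10):.1f} days")  # ~59 days (worst case)
```

This is an upper-bound-style estimate per treedepth: the sampler often terminates a trajectory before hitting the full 2^treedepth steps, so actual wall time at a given typical treedepth can be somewhat lower.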

If you do share the code, I’d suggest starting a separate discourse thread, by the way.

Thank you so much Jacob, you’ve been very helpful!