Levers to pull to increase stepsize so as to avoid saturating the treedepth

tvladeck · February 2, 2021, 11:11pm

So, we have a model that takes a long time to run.

- The gradient evaluation takes about 0.2 seconds (*)
- We are currently running it for about 250 iterations each in warmup and sampling
- The adaptation typically “goes well” in the sense that multiple chains converge on the same step size and inverse metric
- We don’t encounter divergences
- Our chains mix well (Rhats and Neff are both good)
- We have a very large number of parameters (~35,000) (**)

However, we often have runs that saturate the treedepth, which leads to very long runtimes: roughly 28 to 29 hours (0.2 s per gradient * 1023 leapfrog steps * 500 iterations / 60 / 60 ~= 28.4).
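As a rough check on that estimate, here is a minimal sketch of the worst-case cost at a few treedepths. It assumes one gradient evaluation per leapfrog step and that every iteration builds a full tree of 2^depth - 1 steps; the function name is just for illustration.

```python
def worst_case_hours(grad_seconds, max_treedepth, iterations):
    """Upper bound on runtime if every iteration saturates the treedepth."""
    # A full binary tree of depth d costs 2**d - 1 leapfrog steps.
    leapfrog_steps = 2 ** max_treedepth - 1
    return grad_seconds * leapfrog_steps * iterations / 3600


# 0.2 s per gradient, 250 warmup + 250 sampling iterations.
hours_by_depth = {
    depth: round(worst_case_hours(0.2, depth, 500), 1) for depth in (8, 9, 10)
}
print(hours_by_depth)
```

At depth 10 this comes out a little over 28 hours, and each treedepth level avoided roughly halves it.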

Here are some things that we have tried:

1. Doing a VB run (or a short sampling run) before sampling to rescale transformed parameters such that parameters have ~unit variance on the unconstrained scale
2. Doing a VB run before sampling, taking the posterior covariance matrix (which is ~diagonal) and supplying it as the inv_metric to the sampling call
3. Doing (1) then (2)
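For reference, the diagonal version of the second approach can be sketched as below. This is only an illustration: it assumes the draws are already on the unconstrained scale, that a diagonal metric is wanted, and that the interface accepts CmdStan's JSON layout with a single `inv_metric` entry. The toy draws and the function name are made up.

```python
import json
import statistics


def diag_inv_metric(draws):
    """Per-parameter variances from draws (one row per draw, one column per parameter)."""
    columns = zip(*draws)
    return [statistics.variance(column) for column in columns]


# Toy draws for two hypothetical unconstrained parameters.
draws = [[0.0, 10.0], [1.0, 12.0], [2.0, 14.0]]
inv_metric = diag_inv_metric(draws)

# Assumed layout of a diagonal metric file: {"inv_metric": [v1, v2, ...]}.
metric_json = json.dumps({"inv_metric": inv_metric})
print(metric_json)
```

The variances play the role of the diagonal of the posterior covariance, so a parameter with larger posterior spread gets a proportionally larger entry.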

We have not touched the step_size or adapt_delta arguments.

Despite doing (1) and/or (2) we often end up with small stepsizes (~0.005) and resulting treedepth saturation. Because of the lack of divergences and good mixing (as well as priors and posteriors that match our expectations), I am not concerned about the validity of the model. But the 29h thing is a bit … tough, especially since this model is both under active development and something we use in production. Yay.

So, the question is: what can we do to get larger step sizes to avoid saturating the treedepth?

(*) We have done a ton to try to reduce this, including attempts to use map_rect and reduce_sum, but this led to degradations in performance since the CPU was spending more time shuttling data around than doing computations. Until Stan gets a conv1d function that handles autodiff through a convolution more efficiently, or an FFT function, we’re limited here.

(**) But a very large number of these have relatively minimal impact on the model (many are std_normal variables getting multiplied by a Cholesky decomposition of a covariance matrix).

andrjohns · February 2, 2021, 11:54pm

Hi Thomas, I believe that decreasing adapt_delta will increase the stepsize.
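To illustrate why, here is a toy sketch rather than Stan's actual adaptation. It assumes a one-dimensional standard normal target and a single leapfrog step per proposal, and it replaces dual averaging with a simple scan over a made-up grid of candidate step sizes, picking the largest one whose average acceptance still meets the target.

```python
import math
import random


def mean_acceptance(step_size, n_trials=2000, seed=1):
    """Average acceptance of a one-step leapfrog proposal on a standard normal,
    with H(q, p) = q^2/2 + p^2/2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        q, p = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        h0 = 0.5 * (q * q + p * p)
        # Leapfrog: half step in momentum, full step in position, half step in momentum.
        p_half = p - 0.5 * step_size * q
        q_new = q + step_size * p_half
        p_new = p_half - 0.5 * step_size * q_new
        h1 = 0.5 * (q_new * q_new + p_new * p_new)
        total += min(1.0, math.exp(h0 - h1))
    return total / n_trials


def largest_step_meeting(target, candidates):
    """Largest candidate step size whose average acceptance reaches the target."""
    ok = [eps for eps in candidates if mean_acceptance(eps) >= target]
    return max(ok)


candidates = [0.1, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
step_at_095 = largest_step_meeting(0.95, candidates)
step_at_080 = largest_step_meeting(0.80, candidates)
print(step_at_095, step_at_080)
```

A lower acceptance target can never select a smaller step in this scan, which is the same direction of effect as lowering adapt_delta. The price is a larger integration error per step, so it is worth watching for divergences after the change.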

As for model speed: if you’re using large vectorised operations, the 2.26 release greatly expanded the list of functions with support for OpenCL acceleration (list in this post). It also added a profiling framework, which could be helpful for identifying bottlenecks and testing reparameterisations.

betanalpha · February 18, 2021, 12:09am

Small step sizes and large tree depths can be a sign of a degenerate posterior density function (see Identity Crisis), which suggests that at least some of your parameters are strongly coupled. This can sometimes be resolved with stronger priors, reparameterizations, or even more data, but the best path forward will depend on the nature of the degeneracy itself.
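One cheap first look for that kind of coupling is to scan posterior draws for strongly correlated pairs. The sketch below is only a starting point: the parameter names and draws are invented, it only catches linear coupling (a funnel would not show up), and with ~35,000 parameters it would have to be run on a chosen subset rather than on all pairs.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences of draws."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongly_coupled(draws_by_name, threshold=0.9):
    """Pairs of parameters whose draws have |correlation| at or above the threshold."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(draws_by_name.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical draws for three parameters; alpha and beta move against each other.
example = {
    "alpha": [0.1, 0.4, 0.9, 1.3, 2.0],
    "beta": [2.1, 1.7, 1.2, 0.6, 0.1],
    "sigma": [1.0, 0.2, 0.9, 0.3, 0.8],
}
flagged = strongly_coupled(example)
print(flagged)
```

Pairs that show up here are natural candidates for the reparameterizations or stronger priors mentioned above.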
