Problems remain after non-centered parametrization of correlated parameters

tlyim · June 30, 2019, 4:23pm

After comparing many combinations of the tuning parameters (warmup, iter, window, term_buffer, max_treedepth, metric=dense_e vs. diag_e, adapt_delta), the following combinations seem to give the shortest run time while keeping the sampling results at an acceptable level:

diag5.4K[35,150,0.82,600]_5.8 1.4hrs.txt (4.3 KB)
dense5.4K[35,150,0.982,600]_6.7 1.4hrs.txt (4.3 KB)
diag9K[35,150,0.82,600]_12 4.1hrs.txt (4.3 KB)
dense9K[25,50,0.982,600]_13.7 3.7hrs.txt (4.3 KB)

The takeaways seem to be:

For a given number of simulated data points (9K or 5.4K), diag_e runs a bit faster than dense_e, mainly because the former can take a lower level of adapt_delta (0.82 instead of 0.982) without leading to divergences.
- But as far as the parameters of interest are concerned, the N_Effs from diag_e seem to be more diverse, with the lowest being acceptable but substantially lower than the lowest of those from dense_e.
The default choices of window, init_buffer, and term_buffer (=25, 75, 50) appear to be quite good: alternative choices do not lead to dramatic improvements, especially when the sample size is smaller (i.e., 5.4K rather than 9K).
- (window, term_buffer) = (35, 150) performs slightly better. I am not sure of the reason, but my guess is that a larger window (35 rather than 25) reduces the iterations spent on the last and slowest stage of the adaptation process, where further search for improvement yields little incremental gain. On the other hand, a larger term_buffer (150 rather than 50) allocates more iterations to optimally adjusting the initial typical set of estimates according to the findings from the adaptation process.
- Despite the rationalization given above, increasing (window, term_buffer) beyond (35, 150) would not necessarily improve things further, owing to the complicated adaptation process. There is simply no obvious relation between the run time and these tuning parameters. Apparently, they interact in a complicated way with warmup and iter to determine the run time.
During the course of my comparison, I got the impression that choosing an unnecessarily high level of adapt_delta often results in hitting max_treedepth (many such instances can lead to a rather long run time). So to avoid divergences, it's about finding a suitable level of adapt_delta rather than assuming that higher is always better.
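
To make the (window, term_buffer) discussion more concrete, here is a rough Python sketch of how I understand Stan to split the warmup into an initial fast buffer, a series of doubling slow windows, and a terminal fast buffer. The function name `warmup_schedule` and the warmup length of 1000 are my own assumptions for illustration (not the settings used in the runs above), and this is a simplified reading of the adaptation schedule, not Stan's actual code.

```python
def warmup_schedule(warmup, init_buffer=75, window=25, term_buffer=50):
    """Sketch: return (init_buffer, [slow window sizes], term_buffer).

    Simplified reading of Stan's windowed adaptation; the function itself
    is hypothetical and not part of any Stan interface.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if init_buffer + window + term_buffer > warmup:
        # Assumed fallback when the requested phases do not fit:
        # roughly 15% / 75% / 10% of the warmup.
        init_buffer = int(0.15 * warmup)
        term_buffer = int(0.10 * warmup)
        window = warmup - init_buffer - term_buffer

    slow_end = warmup - term_buffer  # iteration where the slow phase stops
    start, size, sizes = init_buffer, window, []
    while start < slow_end:
        end = start + size
        # If the next (doubled) window would not fit, stretch this one
        # to the end of the slow phase.
        if end + 2 * size > slow_end:
            end = slow_end
        sizes.append(end - start)
        start = end
        size *= 2  # each slow window is twice as long as the previous one
    return init_buffer, sizes, term_buffer


print(warmup_schedule(1000))                              # defaults
print(warmup_schedule(1000, window=35, term_buffer=150))  # (35, 150)
```

Under these assumptions, the defaults give slow windows of 25, 50, 100, 200, 500 iterations, while (window, term_buffer) = (35, 150) gives 35, 70, 140, 530: one fewer metric update, but a longer final step-size adaptation. How that translates into run time will depend on the actual warmup length and the model.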
