How does Stan's computation time scale with sample size?

sonicking · February 22, 2022, 10:55pm

Hello, is anything known about how Stan’s computation time scales with sample size?

Say I run a multi-level model. At first, I randomly sample N=500 to test the code, using vectorization in the model block. I achieve convergence and the code finishes running in 5 minutes. Now I run the same model, with the same parameters everywhere, but for a much larger N (say, 50K). Is it possible to make a statement about the corresponding computation time? Thanks.

mike-lawrence · February 23, 2022, 12:15pm

Not as you described. Multiple factors determine sampling time; data volume is one of them, and it has nuances of its own. See the generate_and_fit.r code here: at the top you’ll see 4 data-generating parameter variables that show some of the ways there can be “more data”. You could also use that project to explore the impact of different data volume configurations on sampling time, and maybe get an estimate for your specific scenario. The model code there is a highly optimized version of the SUG 1.13 hierarchical model.
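
If one does time the same model at several data sizes, a rough way to summarize the results is to fit a power law, time ≈ c · N^k, on the log-log scale. The sketch below is in Python rather than R and is not part of the linked project; `fit_power_law` is a hypothetical helper, and the "timings" are placeholders generated from a known rule purely to show the mechanics, not measurements from any Stan run.

```python
import math

def fit_power_law(sizes, times):
    """Least-squares fit of log(time) = log(c) + k * log(size); returns (c, k)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                        # slope on the log-log scale
    c = math.exp(mean_y - k * mean_x)    # intercept, back-transformed
    return c, k

# Placeholder data: sizes one might test, and times produced by a made-up
# rule (0.002 * N^1.2). Replace these with real wall-clock timings.
sizes = [500, 1000, 2000, 4000]
times = [0.002 * n ** 1.2 for n in sizes]

c, k = fit_power_law(sizes, times)
print(f"fitted exponent k = {k:.3f}, constant c = {c:.5f}")
```

A fitted exponent only describes the range of sizes and the data configuration that were actually timed, so extrapolating it from N=500 to N=50K is a guess rather than a guarantee.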

betanalpha · March 15, 2022, 9:44pm

One can discuss scaling of the gradient evaluation time in the context of a given Stan program, but without that Stan program there’s not much one can say. The entirety of Stan’s computation time, however, is not determined by the gradient evaluation time alone but rather by the gradient evaluation time and the number of gradient evaluations needed. The scaling of the number of gradient evaluations needed will depend on the particular data, the assumed model/Stan program, and the provenance of the data.
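
As a minimal sketch of that decomposition (in Python, with hypothetical names and placeholder numbers that are not taken from any real fit), total time can be written as the cost of one gradient evaluation multiplied by the number of gradient evaluations:

```python
def total_sampling_time(seconds_per_gradient, gradients_per_iteration, iterations):
    """Total time = cost of one gradient evaluation * number of gradient evaluations."""
    n_gradients = gradients_per_iteration * iterations
    return seconds_per_gradient * n_gradients

# Hypothetical placeholder values, chosen only to illustrate the arithmetic.
# Small data set: cheap gradients, short trajectories.
small = total_sampling_time(seconds_per_gradient=1e-4,
                            gradients_per_iteration=15,
                            iterations=2000)

# Larger data set: each gradient costs more AND more gradients are needed
# per iteration, so the two factors multiply.
large = total_sampling_time(seconds_per_gradient=1e-2,
                            gradients_per_iteration=63,
                            iterations=2000)

print(f"small: {small:.1f} s, large: {large:.1f} s, ratio: {large / small:.0f}x")
```

The point of the sketch is that the slowdown is the product of two separate ratios, and only the first (cost per gradient) can be reasoned about from the Stan program alone.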

For some more discussion see Addressing Stan speed claims in general - #45 by emiruz and Chains stuck when use larger dataset, but not smaller.
