I’m reporting here some results from using the recent stanc O1 optimization. O1 optimization was enabled in 2.29, and fixes in 2.30 have made it work well with many models I’ve tested. O1 is not on by default, as it still needs more testing before it becomes the default, but it’s easy to enable, so it’s worth trying if sampling is slow and the model has vector or array parameters.
In particular, the possibility of using a struct-of-arrays memory mapping for vector and array parameters can make a big difference: for many posteriors I see a 25%-40% drop in sampling time. Posteriors with big speed-ups include, for example, GLMs with a large number of predictors, splines and GPs based on basis functions, and GPs based on covariance matrices. Struct-of-arrays memory mapping changes how the parameter and gradient values are stored, but it can be used only for vectors and arrays that are not individually indexed, and there are some other constraints as well.
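The difference between the two layouts can be illustrated with a rough R analogy. This is a sketch, not Stan Math's implementation: roughly speaking, in C++ an AoS vector parameter is an Eigen vector of `var` objects, each pointing to its own value/adjoint pair, while an SoA parameter is a single `var_value<Eigen::VectorXd>` holding one vector of values and one vector of adjoints. The target `sum(-0.5 * theta^2)` and the values of `theta` are made up for illustration.

```r
# Conceptual analogy only, not Stan Math's C++ code: each autodiff
# variable carries a value and an adjoint (its gradient, filled in
# during the reverse pass).
theta <- c(0.5, -1.2, 0.3, 2.0, -0.7)

# Array-of-structures (AoS): one small record per element, so the values
# and adjoints of neighbouring elements are interleaved record by record.
aos <- lapply(theta, function(v) list(val = v, adj = 0))

# Structure-of-arrays (SoA): one vector holding all values and one
# vector holding all adjoints of the parameter vector.
soa <- list(val = theta, adj = numeric(length(theta)))

# Reverse-pass update for the illustrative target lp = sum(-0.5 * theta^2),
# whose gradient with respect to theta is -theta.
# AoS: visit each element's record separately.
for (i in seq_along(aos)) {
  aos[[i]]$adj <- aos[[i]]$adj - aos[[i]]$val
}
# SoA: one vectorized operation over the whole value and adjoint vectors.
soa$adj <- soa$adj - soa$val

# Both layouts end up with the same gradient; only the storage differs.
aos_adj <- vapply(aos, function(x) x$adj, numeric(1))
print(rbind(aos = aos_adj, soa = soa$adj))
```

The SoA version touches two contiguous vectors with one operation, which is the kind of access pattern that lets compiled code use vectorization and memory bandwidth better; the AoS version has to hop from record to record.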
In CmdStanR, add this argument to cmdstan_model() (or to $compile()):
stanc_options = list("O1")
In brms, add these arguments to brm() (works at least for GLMs):
backend="cmdstanr", stan_model_args=list(stanc_options = list("O1"))
The code generated by brms is not (yet) optimized for benefiting from
--O1, so there may be more brms models that will benefit from the
You can also check which parameters can use the struct-of-arrays (SoA) memory mapping with the new debug-mem-patterns option:
library(cmdstanr)
# file is the path to your .stan file
model <- cmdstan_model(stan_file = file, compile = FALSE)
model$check_syntax(stanc_options = list("debug-mem-patterns", "O1"))
For each parameter, you see either AoS (array-of-structures) or SoA (structure-of-arrays). If you see some SoA, you can expect a difference in sampling speed. Please report your experiences, for example, in this thread.
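To measure the effect on your own model before reporting, one option is to compile the same model with and without O1 and compare the total sampling time. The sketch below assumes CmdStanR and CmdStan 2.30 or later are installed; the regression model, the simulated data, and the helper `time_sampling()` are hypothetical and only for illustration.

```r
library(cmdstanr)

# Hypothetical test model: a linear regression whose coefficient vector
# beta is used only as a whole (X * beta), so it is a candidate for SoA.
code <- "
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  y ~ normal(alpha + X * beta, sigma);
}
"
stan_file <- write_stan_file(code)

# Simulated data with many predictors (made up for illustration).
set.seed(1)
N <- 500
K <- 100
X <- matrix(rnorm(N * K), N, K)
y <- as.vector(X %*% rnorm(K, 0, 0.5) + rnorm(N))
data_list <- list(N = N, K = K, X = X, y = y)

# Compile with the given stanc options, sample, and return total wall time.
# force_recompile = TRUE because both variants share one executable path.
time_sampling <- function(stanc_opts) {
  mod <- cmdstan_model(stan_file, stanc_options = stanc_opts,
                       force_recompile = TRUE)
  fit <- mod$sample(data = data_list, seed = 123, chains = 4,
                    parallel_chains = 4, refresh = 0)
  fit$time()$total
}

t_o0 <- time_sampling(list())       # default optimization level
t_o1 <- time_sampling(list("O1"))   # stanc --O1
cat(sprintf("O0: %.1f s, O1: %.1f s (%.0f%% less time)\n",
            t_o0, t_o1, 100 * (1 - t_o1 / t_o0)))
```

Single timings are noisy, so it is worth repeating the comparison a few times; the posterior summaries from the two builds should agree up to Monte Carlo error.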
There is more information about “Stan compiler optimization levels” and “New optimization to better utilize vectorization and memory throughput” in the release notes, Release of CmdStan 2.29 – The Stan Blog (although the release notes are for 2.29, it’s better to use 2.30 or later).
Thanks to all the Stan C++ and stanc developers for making Stan faster!