Improving Stan sampling speed

Hi all,
I have a very complex model that I’m running with multithreading on a server (where I cannot really start installing stuff). I read @avehtari’s blog post: Options for improving Stan sampling speed – The Stan Blog
but I’m a bit confused about how to actually implement this. I think one should add these optimization options in cpp_options, but cmdstanr doesn’t seem to check them, so I cannot tell whether they are right or whether they worked.

My conclusion is that I should use

cmdstan_model("model.stan",
              cpp_options = list(stan_threads = TRUE,
                                 O = 1,
                                 stan_cpp_optims = TRUE))

Is this correct? Should this work on every system? Is there anything else I could do that plays well with multithreading? I don’t care if compilation time increases: the model takes a day and a half to finish, so I’ll be happy to do anything that improves sampling time.

Setting O = 1 is probably unintentional: you will generally want O = 3 for the C++ compiler. That is also the default for Stan, so you don’t need to specify anything for it.

Aki separately discusses the Stan compiler (stanc) setting, which only supports optimization levels 0 (the default) or 1, so that is where --O1 could be used.
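As a rough sketch of how the two settings are kept apart in cmdstanr (assuming the file is called model.stan as in the question, and a reasonably recent cmdstanr), cpp_options goes to the C++ build while stanc_options goes to the Stan compiler. Leaving O out of cpp_options keeps the C++ default.

```r
library(cmdstanr)

# Sketch only: "model.stan" is the file name used in the question.
mod <- cmdstan_model(
  "model.stan",
  # Options for the C++ build. O is deliberately left out so that the
  # default C++ optimization level is used.
  cpp_options = list(
    stan_threads = TRUE,
    STAN_CPP_OPTIMS = TRUE
  ),
  # Options for the Stan compiler (stanc); "O1" requests --O1.
  stanc_options = list("O1")
)
```

If the C++ compiler does not support one of the flags, compilation should stop with an error at this step, so a successful build is already a useful sign.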

To answer your more general questions, the STAN_CPP_OPTIMS setting turns on a number of compiler flags that may or may not be supported, depending on the exact version of your compiler. If they aren’t supported, the C++ compiler will just refuse to build, so there is no harm in trying it.

These arguments are all case-sensitive, I believe. If you want to check that they worked, you can also use the cmdstanr::cmdstan_make_local function, which writes the options to $CMDSTAN/make/local and then applies them to every build you do with that installation of CmdStan. You can also edit that file manually to add the things Aki’s post suggests.
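A minimal sketch of that workflow, assuming cmdstanr already points at the CmdStan installation you want to change (the option shown is just the one discussed above, not a recommendation for every system):

```r
library(cmdstanr)

# Called without cpp_options, this should only return the current
# contents of make/local (or NULL if the file does not exist yet).
print(cmdstan_make_local())

# Add a setting. append = TRUE keeps whatever is already in the file;
# append = FALSE would overwrite the file with only these options.
cmdstan_make_local(
  cpp_options = list(STAN_CPP_OPTIMS = TRUE),
  append = TRUE
)

# Inspect the file again to confirm what was written.
print(cmdstan_make_local())

# Rebuild CmdStan so the new settings are picked up
# (this can take several minutes).
rebuild_cmdstan()
```

Printing the file before and after is the simplest check that the options were actually recorded. Models compiled earlier may need to be recompiled before they use the new flags.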


ok, thanks!

I made a mess with “cmdstan_make_local” because it didn’t let me remove things. But I found that there is a local.example file with many settings that can simply be uncommented.

The only thing I added was the following:


CXXFLAGS+= -march=native
CXXFLAGS+= -mtune=native

So far it’s running!