Is it possible to speed up RStan on a 64-bit Windows system?

Hi,

Recently I found that my RStan models run slower as my input data size increases. I used 'cores = 8' when calling stan() to take full advantage of all the available cores, and I am wondering whether it's at all possible to let R use all the memory I have on my laptop (32GB).

I found some commands like memory.limit() but I am not sure whether they will work in this situation. Also, I remember that when I installed RStan for the first time, there was some code (the sample installation code on the RStan website) that tried to optimize some settings, but I have never used similar code since.

Thanks!

Run things under wsl.

Thanks so much! May I ask what WSL is?

Linux in Windows.

Unfortunately that’s expected behaviour as the sample size increases.

Setting cores = 8 only specifies the maximum number of MCMC chains that can be run in parallel. So if you specify cores = 8 but are using 4 chains, then you will only be using 4 cores. If you want to take advantage of the extra CPU cores that you have available, then you'll need to update your model to use one of the parallelisation frameworks (reduce_sum or map_rect). Note that if you want to use reduce_sum you'll need to run your models through cmdstanr, since that functionality is not yet available in RStan.
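
As a rough illustration of the point about cores and chains, here is a minimal sketch in base R. The helper cores_in_use() is hypothetical (it is not an rstan function) and simply encodes the rule described above, assuming the model has no within-chain parallelisation:

```r
library(parallel)  # ships with R; used only to report the machine's core count

# Hypothetical helper (not an rstan function): with chain-level parallelism
# only, each chain occupies one core, so any extra cores stay idle.
cores_in_use <- function(chains, cores) {
  min(chains, cores)
}

detectCores()                        # logical cores visible to R on this machine
cores_in_use(chains = 4, cores = 8)  # 4: only four of the eight cores are used
cores_in_use(chains = 8, cores = 8)  # 8: one chain per core
```

In other words, raising cores beyond the number of chains should not change the run time by itself.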

You don't need to do anything to manually change how R uses memory; it will automatically use as much memory as it needs.
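
If you would like to see how much memory R is actually using rather than tune it, something along these lines might help. This is only a sketch using base R; the vector x is a made-up stand-in for your real data:

```r
# Stand-in data: one million doubles, roughly 8 MB (an assumed example size).
x <- numeric(1e6)

# Size of a single object, in a human-readable unit.
format(object.size(x), units = "MB")

# Memory currently used by the R session; gc() returns a matrix
# with one row for Ncells and one for Vcells.
mem <- gc()
mem
```

If the numbers reported here are far below your installed RAM, memory is unlikely to be the bottleneck.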

Run things under wsl.

To expand on Sebastian's answer a little: we commonly see models running much faster under Linux than Windows (on the same hardware), and there's more information on that in this thread. However, you don't need to switch to a new operating system to take advantage of this. With the Windows Subsystem for Linux (WSL) framework, you can operate a Linux environment from within your existing Windows OS. Stan running under WSL has shown speeds comparable to native Linux.

The most user-friendly way to take advantage of this is through installing RStudio Server on your WSL install, which lets you interact with the WSL R installation through your web browser. The RStudio support pages have a great guide for setting this up which has worked really well for me:

Thank you for the tip. I have installed the WSL and RStudio Server as posted. I can report that the same model in brms that took me 94 minutes under Win11, now takes 24 minutes. I am fairly confident that I can take the number even lower with proper variable centering and better priors. This is amazing. I was about to dish out some serious cash for better CPU power but this saved me money. Thanks for that.

For general info: the model I am running is fitted to 98,000 observations, using the following call:

library(brms)

# Random intercepts and slopes for schools nested within countries
fit = brm(y ~ 1 + x + (1 + x | idcntry/idschool),
          data = test,
          chains = 2,
          cores = 4,
          prior = c(prior(normal(0, 3), class = Intercept),
                    prior(normal(0, 3), class = b),
                    prior(cauchy(0, 1), class = sd)),
          backend = "cmdstanr",
          threads = threading(4)  # threads per chain
)

EDIT: Current hardware: Ryzen 5 3600 (6 cores); 16GB RAM
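
A side note on the thread settings in the call above: in brms the total number of threads is chains times threads per chain, so 2 chains with threading(4) ask for 8 threads on a 6-core CPU. A minimal sketch for picking a value that matches the physical core count is below. The helper threads_per_chain() is hypothetical (it is not part of brms), and whether it actually beats threading(4) on this hardware is an assumption that would need to be timed:

```r
# Hypothetical helper (not part of brms): choose a threads-per-chain value so
# that chains * threads does not exceed the number of physical cores.
threads_per_chain <- function(n_cores, chains) {
  max(1L, as.integer(floor(n_cores / chains)))
}

n_cores <- 6  # physical cores reported in the post above
chains  <- 2

threads_per_chain(n_cores, chains)  # 3, i.e. 2 chains x 3 threads = 6 cores

# For comparison, the call in the post requests this many threads in total:
chains * 4
```

The result could then be passed as threads = threading(3) in the brm() call.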

Great to hear! Given that you’re using brms and cmdstanr, you might be able to improve this speed even further.

If you have a discrete GPU available in your system, then you can install OpenCL following this section of the cmdstan guide: 14 Parallelization | CmdStan User’s Guide

And configure cmdstanr to use OpenCL following this vignette: Running Stan on the GPU with OpenCL • cmdstanr

This should also give a fairly significant speed-up, and I'd recommend experimenting with using OpenCL only, removing the threads = threading(4) argument.

Thanks for the info. I have tried that, but seem to have hit a wall. Can it be that under WSL2, I cannot install video drivers?

Is it possible to get cmdstan to utilise a discrete GPU within a WSL2 environment? I have also hit a wall with this.

Can you post the error or specific issues that you’re encountering?

It looks like OpenCL is not yet supported in WSL, so GPU-acceleration with Stan will not be available.

You can follow the progress of OpenCL in WSL on their github: No OpenCL platforms reported · Issue #6951 · microsoft/WSL · GitHub

Would this technique confer benefits even when using brms with its default backend and default threading settings? Ten months ago I tried to get cmdstanr up and running, but it just would not work despite significant help from this forum.

My laptop has 8GB of memory + 4 CPU cores and it is running Windows 10. I have dozens of models to fit, and with cores = 4 a single model presently takes about 1h. The kind of speedup described by George_GL would therefore be absolutely wonderful.

Yep, models will generally run faster under WSL than Windows. Can you try the instructions I posted above, and open a new thread with the issue that you encounter?

I assume you’re referring to the installation of WSL rather than cmdstanr?

Here’s an interim update:

I have now (finally) successfully installed WSL in order to reap the performance benefits described here. I don’t see much difference in fitting speed, at least not when using brms with its default backend in the following mock example:

library(brms)
set.seed(2022)
# Simulate 200 observations; only x1 affects the outcome
x1 <- rnorm(200)
x2 <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1*x1))
d <- data.frame(x1, x2, y)
# Time compilation and sampling together
t0 <- Sys.time()
mod <- brm(y ~ x1 + x2, family = bernoulli, data = d, seed = 2022,
  prior = prior(normal(0, 2.5), class = b) + prior(normal(0, 2.5), class = Intercept))
summary(mod)
Sys.time() - t0

On my machine, the model takes between 38 and 63 seconds to fit in both Windows and WSL. There appears to be a lot of variability depending on compilation time. With real models, however, it’s probably sampling speed rather than compilation speed that matters, so the jury is still out.

A major problem is that I cannot make beepr work from within WSL to notify me when models finish fitting. The library installs and loads correctly, but there's no sound. I've installed the PulseAudio service as instructed here, and I've also installed the VLC Linux program as instructed in the documentation of beepr. I can't even play .wav files from the command line. I get a whole bunch of error messages, including ALSA lib confmisc.c:767:(parse_card) cannot find card '0'. But I guess this is the wrong forum for troubleshooting WSL audio issues.

Even with all these problems, however, it is still possible that WSL might offer a significant advantage over Windows. This is because in Windows, RStudio crashes about once every four brms fits. If WSL turns out not to have this problem, it will still be an overall improvement. I'll post further updates (perhaps in a whole new thread as Andrew suggests) once I know more.

This is because in Windows, RStudio crashes about once every four brms fits.

This is usually caused by an issue with rstan crashing when trying to recompile a model or because you called rm(list = ls()); gc() between models and didn’t restart your R session afterwards.

Thanks @andrjohns. This definitely explains why it wasn't working for me. I'll set up a VM for my Stan environment instead of WSL.

It seems to often happen right after a model finishes fitting, for reasons unknown to me. I’m definitely not emptying the workspace between fits.

To my regret, I have also now confirmed that this happens in WSL as well as Windows 10, so there's little reason for me to continue using the former. It has the same problems, plus it cannot play alarm sounds when computations complete. I wonder if this is a bug in brms's bernoulli and categorical family implementations. Those are all I ever fit.

Something that can also be done is to use a different BLAS: Speedup by using external BLAS/LAPACK with CmdStan and CmdStanR/Py