Is it possible to speed up rstan in Windows 64bit system

stan_beginer · November 23, 2020, 3:20am

Hi,

Recently I found that my RStan models run slower as my input data size increases. I used ‘cores = 8’ when calling stan() to take full advantage of all the available cores, and I am wondering whether it’s at all possible to let R use all the memory I have on my laptop (32GB).

I found some commands like memory.limit() but I’m not sure whether they will work in this situation. Also, I remember that when I installed RStan for the first time, there was some code (sample installation code on the RStan website) that tried to optimize some settings, but I have never used similar code since.

Thanks!

wds15 · November 23, 2020, 7:11am

Run things under wsl.

stan_beginer · November 23, 2020, 2:24pm

Thanks so much! May I ask what WSL is?

wds15 · November 23, 2020, 2:32pm

Linux in Windows.

andrjohns · November 25, 2020, 1:25am

Unfortunately that’s expected behaviour as the sample size increases.

When you set cores = 8, all it does is specify the number of MCMC chains that can be run in parallel. So if you specify cores = 8 but are using 4 chains, then you will only be using 4 cores. If you want to take advantage of the extra CPU cores that you have available, then you’ll need to update your model to use one of the parallelisation frameworks (reduce_sum or map_rect). Note that if you want to use reduce_sum you’ll need to run your models through cmdstanr, since that functionality is not yet available in RStan.
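A minimal sketch of the rule described above, in base R. The helper name cores_in_use() is hypothetical and only encodes "one chain per core"; the commented options are the ones the RStan startup message suggests. The cmdstanr part assumes a file model.stan that uses reduce_sum and a data list called stan_data, both placeholders, and is skipped if they are not present.

```r
# Hypothetical helper (not part of rstan): number of cores a stan() call
# can keep busy when each chain runs on exactly one core.
cores_in_use <- function(chains, cores) {
  min(chains, cores)
}

cores_in_use(chains = 4, cores = 8)  # 4: the other four cores sit idle
cores_in_use(chains = 8, cores = 8)  # 8: one chain per core

# Settings suggested when rstan is loaded, for reference:
# options(mc.cores = parallel::detectCores())
# rstan::rstan_options(auto_write = TRUE)

# Within-chain parallelism through cmdstanr. "model.stan" and `stan_data`
# are placeholders; the block only runs if both exist.
if (requireNamespace("cmdstanr", quietly = TRUE) &&
    file.exists("model.stan") && exists("stan_data")) {
  mod <- cmdstanr::cmdstan_model("model.stan",
                                 cpp_options = list(stan_threads = TRUE))
  # 4 chains x 2 threads per chain = 8 threads in total
  fit <- mod$sample(data = stan_data, chains = 4, parallel_chains = 4,
                    threads_per_chain = 2)
}
```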

You don’t need to do anything to manually change how R uses memory, it will automatically use as much memory as it needs.
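If you want to see how much memory R is actually using, the following base R sketch may help. It only inspects memory and changes no limits. As far as I know, memory.limit() is a Windows-only function and has been a stub since R 4.2, so it is not used here.

```r
# Memory currently used by this R session, in megabytes. gc() reports two
# rows (Ncells and Vcells); the second column is the amount used in Mb.
used_mb <- sum(gc()[, 2])
used_mb

# Size of a single object, here a vector of one million doubles.
x <- numeric(1e6)
format(object.size(x), units = "Mb")
```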

wds15 wrote: “Run things under wsl.”

To expand on Sebastian’s answer a little: we commonly see models running much faster under Linux than under Windows (on the same hardware); there’s more information on that in this thread. However, you don’t need to switch to a new operating system to take advantage of this. With the Windows Subsystem for Linux (WSL), you can operate a Linux environment from within your existing Windows OS. Stan running under WSL has shown speeds comparable to native Linux.

The most user-friendly way to take advantage of this is through installing RStudio Server on your WSL install, which lets you interact with the WSL R installation through your web browser. The RStudio support pages have a great guide for setting this up which has worked really well for me:

George_GL · December 28, 2021, 2:36pm

Thank you for the tip. I have installed the WSL and RStudio Server as posted. I can report that the same model in brms that took me 94 minutes under Win11, now takes 24 minutes. I am fairly confident that I can take the number even lower with proper variable centering and better priors. This is amazing. I was about to dish out some serious cash for better CPU power but this saved me money. Thanks for that.

For general info: the model I am running is fit to 98,000 observations, using the formula:

library(brms)

# Random intercepts and slopes for x, with schools nested in countries
fit = brm(y ~ 1 + x + (1 + x | idcntry/idschool),
          data = test,
          chains = 2,
          cores = 4,
          prior = c(prior(normal(0, 3), class = Intercept),
                    prior(normal(0, 3), class = b),
                    prior(cauchy(0, 1), class = sd)),
          backend = "cmdstanr",
          threads = threading(4)  # 4 threads per chain
)

EDIT: Current hardware: Ryzen 5 3600 (6 cores); 16GB RAM
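For readers wondering how chains, cores and threading(4) add up in a call like this, here is a rough base R sketch. The helper total_threads() is a hypothetical name, and it assumes that brms runs min(chains, cores) chains at once, each with the requested number of threads. Whether asking for more threads than physical cores helps or hurts is something to measure on your own machine.

```r
# Hypothetical helper: worker threads a brms call may start when
# within-chain threading is on (parallel chains x threads per chain).
total_threads <- function(chains, cores, threads_per_chain) {
  min(chains, cores) * threads_per_chain
}

# Settings from the call above: chains = 2, cores = 4, threading(4)
requested <- total_threads(chains = 2, cores = 4, threads_per_chain = 4)
requested  # 8

# Physical cores on this machine; detectCores() can return NA.
available <- parallel::detectCores(logical = FALSE)
if (!is.na(available) && requested > available) {
  message("Requested ", requested, " threads on ", available,
          " physical cores")
}
```

With the values from the post, this gives 8 threads on a CPU listed as having 6 cores.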

andrjohns · December 30, 2021, 1:46am

Great to hear! Given that you’re using brms and cmdstanr, you might be able to improve this speed even further.

If you have a discrete GPU available in your system, then you can install OpenCL following this section of the cmdstan guide: 14 Parallelization | CmdStan User’s Guide

And configure cmdstanr to use OpenCL following this vignette: Running Stan on the GPU with OpenCL • cmdstanr

This should also give a fairly significant speed-up, and I’d recommend experimenting with using OpenCL only and removing the threads = threading(4) argument.
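A hedged sketch of what the OpenCL-only variant could look like in brms. The wrapper name fit_with_opencl() is hypothetical, the priors are left out for brevity, and the device IDs c(0, 0) (platform 0, device 0) are an assumption that you would need to check for your own system. It also assumes that OpenCL has already been set up for CmdStan as described in the linked guides.

```r
# Hypothetical wrapper: fit the model from the post above on a GPU.
# Assumes brms and cmdstanr are installed and OpenCL works with CmdStan.
fit_with_opencl <- function(data) {
  brms::brm(
    y ~ 1 + x + (1 + x | idcntry/idschool),
    data = data,
    chains = 2,
    cores = 2,
    backend = "cmdstanr",
    opencl = brms::opencl(c(0, 0))  # c(platform ID, device ID)
  )
}

# Usage, with `test` being the data frame from the earlier post:
# fit_gpu <- fit_with_opencl(test)
```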

George_GL · January 2, 2022, 8:40am

Thanks for the info. I have tried that, but I seem to have hit a wall. Could it be that I cannot install video drivers under WSL2?

cao · January 16, 2022, 8:02am

Is it possible to get cmdstan to utilise a discrete GPU within a WSL2 environment? I have also hit a wall with this.

andrjohns · January 17, 2022, 4:32am

Can you post the error or specific issues that you’re encountering?

andrjohns · January 17, 2022, 7:39am

It looks like OpenCL is not yet supported in WSL, so GPU-acceleration with Stan will not be available.

You can follow the progress of OpenCL in WSL on their github: No OpenCL platforms reported · Issue #6951 · microsoft/WSL · GitHub

blokeman · January 17, 2022, 8:29am

Would this technique confer benefits even when using brms with its default backend and default threading settings? Ten months ago I tried to get cmdstanr up and running, but it just would not work despite significant help from this forum.

My laptop has 8GB of memory + 4 CPU cores and it is running Windows 10. I have dozens of models to fit, and with cores = 4 a single model presently takes about 1h. The kind of speedup described by George_GL would therefore be absolutely wonderful.

andrjohns · January 17, 2022, 9:24am

Yep, models will generally run faster under WSL than under Windows. Can you try the instructions I posted above, and open a new thread with any issue that you encounter?

blokeman · January 19, 2022, 10:35am

I assume you’re referring to the installation of WSL rather than cmdstanr?

Here’s an interim update:

I have now (finally) successfully installed WSL in order to reap the performance benefits described here. I don’t see much difference in fitting speed, at least not when using brms with its default backend in the following mock example:

library(brms)
set.seed(2022)

# Mock data: y depends on x1 only; x2 is pure noise
x1 <- rnorm(200)
x2 <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1*x1))
d <- data.frame(x1, x2, y)

# Time compilation and sampling together
t0 <- Sys.time()
mod <- brm(y ~ x1 + x2, family = bernoulli, data = d, seed = 2022,
           prior = prior(normal(0, 2.5), class = b) +
                   prior(normal(0, 2.5), class = Intercept))
summary(mod)
Sys.time() - t0

On my machine, the model takes between 38 and 63 seconds to fit in both Windows and WSL. There appears to be a lot of variability depending on compilation time. With real models, however, it’s probably sampling speed rather than compilation speed that matters, so the jury is still out.
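One way to separate sampling time from compilation time is sketched below. Both helper names are hypothetical. The sketch assumes the default rstan backend (so that mod$fit is a stanfit object that rstan::get_elapsed_time() accepts) and chains that run one after another, so that per-chain times simply add up. Priors are omitted for brevity.

```r
# Hypothetical helper: split wall-clock time into sampling and "other"
# (mostly compilation). `elapsed` is a chains x 2 matrix of warmup and
# sampling seconds, as returned by rstan::get_elapsed_time().
split_time <- function(wall_seconds, elapsed) {
  sampling <- sum(elapsed)
  c(total = wall_seconds, sampling = sampling,
    other = wall_seconds - sampling)
}

# Hypothetical wrapper around the mock model (rstan backend, cores = 1
# so that the chains run sequentially).
time_fit <- function(d) {
  wall <- system.time(
    mod <- brms::brm(y ~ x1 + x2, family = brms::bernoulli(), data = d,
                     seed = 2022, cores = 1)
  )[["elapsed"]]
  split_time(wall, rstan::get_elapsed_time(mod$fit))
}

# Usage: time_fit(d)
```

Comparing the "sampling" entry between Windows and WSL should be less noisy than comparing the totals.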

A major problem is that I cannot make beepr work from within WSL to notify me when models finish fitting. The library installs and loads correctly, but there’s no sound. I’ve installed the PulseAudio service as instructed here, and I’ve also installed the VLC Linux program as instructed in the documentation of beepr. I can’t even play .wavs from the command line. I get a whole bunch of error messages, including ALSA lib confmisc.c:767:(parse_card) cannot find card '0'. But I guess this is the wrong forum for troubleshooting WSL audio issues.

Even with all these problems, however, it is still possible that WSL might offer a significant advantage over Windows. This is because in Windows, RStudio crashes about once every four brms fits. If WSL turns out not to have this problem, it will still be an overall improvement. I’ll post further updates (perhaps in a whole new thread as Andrew suggests) once I know more.

ajnafa · January 22, 2022, 10:45am

blokeman wrote: “This is because in Windows, RStudio crashes about once every four brms fits.”

This is usually caused either by rstan crashing when it tries to recompile a model, or by calling rm(list = ls()); gc() between models without restarting your R session afterwards.

cao · January 23, 2022, 11:40pm

Thanks @andrjohns. This definitely explains why it wasn’t working for me. I’ll set up a VM for my Stan environment instead of WSL.

blokeman · February 5, 2022, 2:55pm

It seems to often happen right after a model finishes fitting, for reasons unknown to me. I’m definitely not emptying the workspace between fits.

To my regret, I have also now confirmed that this happens in WSL as well as in Windows 10, so there’s little reason for me to continue using the former. It has the same problems and also cannot play alarm sounds when computations complete. I wonder if this is a bug in brms’s bernoulli and categorical family implementations. Those are all I ever fit.

bgall · August 7, 2022, 11:59pm

Something that can also be done is to use a different BLAS: Speedup by using external BLAS/LAPACK with CmdStan and CmdStanR/Py
