Stan performance inside Virtual Machines

alberto.dellera · September 15, 2020, 7:42pm

Dear all,

I will soon need to run Stan inside a Virtual Machine, and I was wondering whether I should expect the same (or similar) performance that I would get when running directly on the host operating system.

My question is rather general, but I’m particularly interested in VirtualBox (Linux guest, Windows 10 host) on Intel.

I know that “in theory”, since Stan is CPU-intensive and RAM-intensive, being inside a VM should not matter that much since the machine instructions are directly executed on the CPU anyway, and syscalls should be (relatively) rare for Stan. But “in practice”, reality does love to be different …

Thanks in advance!

bgoodri · September 16, 2020, 10:43pm

I don’t know, but if you have a Windows 10 host, my guess would be that WSL is at least as fast as VirtualBox.

andrjohns · September 17, 2020, 2:40am

Hi Alberto,

We’ve seen that Stan actually runs faster under WSL on Windows 10 than on native Windows. The specifics are in this post: Large Cmdstan performance differences Windows vs. Linux

But that whole thread has more discussion of the performance differences between Windows & Linux.

Additionally, RStudio has a great guide on getting RStudio Server set up on WSL, so you can use RStudio through your web browser while running on the WSL backend: https://support.rstudio.com/hc/en-us/articles/360049776974-Using-RStudio-Server-in-Windows-WSL2

alberto.dellera · September 19, 2020, 11:00am

Hi Andrew,

The thread you linked is very interesting, thanks!

It seems that there is a general consensus against running Stan on Windows; many people report severe degradation in performance, mostly blaming either the inefficient Windows compiler(s) or that Stan is developed for Linux first and then ported to Windows. So, for the sake of a general discussion about Virtualized Environments, let’s put Stan-on-Windows out of the equation, and consider Stan-on-Linux only (virtualized or not).

Stan should (as far as I know, correct me if I’m wrong) spend 99.9% of the time on the CPU cores (crunching numbers and accessing the RAM), doing very few syscalls (I/O mostly). Syscalls are the place where performance issues usually arise in virtualized environments.

Even during model compilation by the C++ compiler, when many files are accessed, I would expect CPU and RAM to be the bottleneck, since the compiler spends most of its time applying all those expensive optimizations (loop unrolling, function inlining, etc.).

During MCMC sampling, after the initial data loading from the filesystem, I would expect syscalls to be rare and made only:

- to get more memory from the OS (e.g. when appending new MCMC samples to the chain output buffer)
- to get the system time for measuring elapsed times
- to output debug messages

I’m wondering if the above description is accurate: it is what I would naively expect “in theory”, but I’m not a specialist in numerical programming, and I’m far from being an expert in either Stan or virtualization. There are probably other important aspects that I have overlooked and that arise in practice.
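
One way to check this expectation empirically, rather than by reasoning alone, is to look at how a process's CPU time splits between user mode and kernel mode. Below is a minimal Python sketch, not a definitive benchmark: it runs a command and reports the user/system split using `os.times()`, whose `children_user` and `children_system` fields are only meaningful on Unix-like systems (such as the Linux guest). The `cpu_split` helper and the stand-in workload are hypothetical illustrations of mine, not part of Stan; in practice the demo command would be replaced by the command line of a compiled Stan model.

```python
import os
import subprocess
import sys
import time


def cpu_split(cmd):
    """Run cmd to completion; return (wall, user, system) seconds for the child.

    Assumes a Unix-like OS: on Windows the children_* fields are always zero.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = os.times()  # child times are accumulated once the child has exited
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system


if __name__ == "__main__":
    # Stand-in CPU-bound workload (an assumption for illustration only);
    # substitute the command that runs your compiled model.
    demo_cmd = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    wall, user, system = cpu_split(demo_cmd)
    cpu_total = user + system
    kernel_share = system / cpu_total if cpu_total > 0 else 0.0
    print(f"wall={wall:.2f}s user={user:.2f}s system={system:.2f}s "
          f"kernel share={kernel_share:.1%}")
```

If the kernel share stays small both on bare metal and inside the guest, the "mostly user-mode number crunching" picture holds for that workload; a noticeably larger share or wall time inside the VM would point at syscall or I/O overhead worth investigating.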

Again, thanks in advance for any contribution :)

maxthemillion · April 16, 2021, 7:41am

Hi Alberto,

I am just looking into running Stan on VMs as well. As this thread unfortunately did not continue, did you gain any additional insights along the way that are worth sharing?
