Stan performance inside Virtual Machines

Dear all,

I will soon need to run Stan inside a Virtual Machine, and I was wondering whether I should expect the same (or similar) performance that I would get when running directly on the host operating system.

My question is rather general, but I’m particularly interested in VirtualBox (Linux guest, Windows 10 host) on Intel.

I know that “in theory”, since Stan is CPU-intensive and RAM-intensive, being inside a VM should not matter much: the machine instructions are executed directly on the CPU anyway, and syscalls should be relatively rare for Stan. But “in practice”, reality does love to be different …

Thanks in advance!

I don’t know, but if you have a Windows 10 host, my guess would be that WSL is at least as fast as VirtualBox.


Hi Alberto,

We’ve seen that with WSL on Windows 10, Stan actually runs faster than in native Windows. The specifics are in this post: Large Cmdstan performance differences Windows vs. Linux

But that whole thread has more discussion of the performance differences between Windows & Linux.

Additionally, RStudio has a great guide on getting RStudio Server set up on WSL, so you can use RStudio through your web browser while running on the WSL backend:


Hi Andrew,

The thread you quoted is very interesting, thanks!

It seems there is a general consensus against running Stan on Windows; many people report severe performance degradation, mostly blaming either the inefficient Windows compilers or the fact that Stan is developed Linux-first and then ported to Windows. So, for the sake of a general discussion about virtualized environments, let’s take Stan-on-Windows out of the equation and consider Stan-on-Linux only (virtualized or not).

Stan should (as far as I know, correct me if I’m wrong) spend 99.9% of its time on the CPU cores (crunching numbers and accessing RAM), making very few syscalls (mostly I/O). Syscalls are where performance issues usually arise in virtualized environments.
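One way to sanity-check this claim is to compare the user vs. system CPU time a process accumulates: for a compute-bound workload, user time should dwarf the time spent in the kernel. The sketch below is not Stan-specific — the arithmetic loop is just a stand-in for a compiled CmdStan model binary, which you would substitute into the `subprocess.run` call:

```python
import resource
import subprocess
import sys

# Run a CPU-bound stand-in for a Stan sampler. Replace the command
# with your compiled CmdStan model, e.g.
# ["./bernoulli", "sample", "data", "file=bernoulli.data.json"].
subprocess.run(
    [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"],
    check=True,
)

# Accumulated CPU time of child processes (Unix only): ru_utime is
# time spent executing in user mode, ru_stime is time spent in the
# kernel servicing syscalls. For a compute-bound workload, user time
# should dominate by a wide margin.
usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print(f"user time:   {usage.ru_utime:.3f} s")
print(f"system time: {usage.ru_stime:.3f} s")
```

Running the same measurement on a real CmdStan invocation, inside and outside the VM, would show directly whether kernel time (the part most exposed to virtualization overhead) is a factor at all.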

Even during model compilation by the C++ compiler, when many files are accessed, I would expect CPU+RAM to be the bottleneck, as the compiler applies all those expensive optimizations (loop unrolling, function inlining, etc.).

During MCMC sampling, after the initial data loading from the filesystem, I imagine syscalls are made only rarely:

  • to get more memory from the OS (e.g. when appending new MCMC samples to the chain output buffer)
  • to get the system time for measuring elapsed times
  • to output debug messages
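Of these, clock reads deserve a note: on Linux they are usually serviced in user space through the vDSO rather than via a true syscall, but inside a VM a slow or unsupported clocksource can force a real kernel entry on every read. A quick microbenchmark (a hedged sketch, nothing Stan-specific) shows the per-read cost, which should be tens of nanoseconds on the vDSO fast path and considerably more if each read traps into the kernel:

```python
import time

# Time a large number of monotonic-clock reads. On bare metal with a
# TSC clocksource these stay in user space (vDSO) and cost tens of
# nanoseconds each; in a VM with a slow clocksource, each read may be
# a real syscall and cost an order of magnitude more.
N = 1_000_000
start = time.perf_counter()
for _ in range(N):
    time.monotonic()
elapsed = time.perf_counter() - start
print(f"{elapsed / N * 1e9:.1f} ns per clock read")
```

Comparing this number between the host and the VirtualBox guest is a cheap way to check whether the guest’s clocksource is on the fast path.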

I’m wondering whether the above description is accurate: it is what I would naively expect “in theory”, but I’m not a specialist in numerical programming, and I’m far from an expert in Stan or virtualization either. There are probably other important aspects that I have overlooked and that arise in practice.

Again, thanks in advance for any contribution :)