Large Cmdstan performance differences Windows vs. Linux

It's actually not that weird. Microsoft basically admitted (good for them, and there is a first time for everything) that their PowerShell and the like are bad. Cygwin and other third-party tools are more or less bad too.

Their primary target is web devs but this seems to work out for us too.

WSL is great, and so is Docker. I do some dev work with WSL.

Still, it needs some work from the user.

I would think httpstan running on WSL / Docker would work great.

JupyterLab / Jupyter Notebook can also be used from WSL / Docker, which is great.

WSL (Windows Subsystem for Linux) https://docs.microsoft.com/en-us/windows/wsl/install-win10
It's available in the Microsoft Store. Easy to install; it took me 5 minutes, no hiccups.

I recently switched to Windows (from a Mac) because of WSL (and because my employer made it harder and harder for my Mac to work).

I have found RStan works great through WSL. My current workflow is to build a model locally with WSL, push it off to a Linux server, and get my outputs back in 1 to 10 hours when the models are done running. For anyone else using this approach, I suggest putting your Stan models into an R package to avoid recompiling.

And, @Bob_Carpenter, I wish you could convince people like my employer to move away from Windows! Although MS is moving closer to Linux, and it seems macOS is moving farther from Unix.

Hmm. When I tried out WSL last year I didn’t notice performance differences, and you can get RStudio working via something like VcXsrv. I will check again re performance, but I’m interested in whether others find something similar, or whether this model is for some reason particularly awkward on Windows…

I understand you and @Richard_Erickson are mostly joking, but please bear in mind that many users are on Windows for a wide range of reasons, and I think Windows shaming is not very helpful for anything - I’ve always associated it with gatekeeping and the “REAL programmers do XY” gimmick. (I have a conflict of interest here, as I am primarily on Windows and it suits me just fine, but I understand why people may make different choices.)

I’m serious about wishing my employer would move away from Windows.

Windows lacks good native compilers. It’s a well-known problem, which is why Rtools exists (and, more recently, WSL). More broadly, the problem also makes general development hard on Windows. To get RStan working, I had to have my local IT spend about an hour helping me reinstall R and Rtools. Even after this, I sometimes still have to reconfigure R when I try to use RStan.

macOS requires the installation of Xcode, and then RStan works well after the initial setup.

Linux simply requires an apt-get install r-cran-rstan and boom: RStan works with a single line of code.

The last reason is why I would recommend that anybody having trouble with RStan use WSL, which also allows the easy apt-get install option for RStan. The downside to using WSL to run RStan is that outputs need to be saved and then opened with Windows R to plot, but it is otherwise a decent hack around Windows.

I wasn’t serious! I know it’s here to stay. If Mac was more business friendly and didn’t pull the rug out from under users every release, they might get some business traction. As is, MS is the only company that’s actually respectful of business users, so they still have them all.

I feel like this has very little to do with Windows, and a lot to do with the difficulty of packaging RStan. Recent versions of Visual Studio produce fairly performant code and are standards compliant. Reminder: the AAA games industry produces feats of performance engineering every year on Windows, and it isn’t blaming the compiler. Instead, when you say lack of native compilers, you really mean that software is developed first for Linux using GCC and is not written in a cross-platform way.

It is not impossible to build Stan with Visual Studio; I was doing so a few months ago. The real issue is dealing with downstream library packagers who have to deal with Rtools and CRAN restrictions.

Microsoft produces their own R distribution. It’s what I would use on Windows if I had to deal with security-conscious IT. I’m not sure what your example is supposed to mean regarding ease of installation - on the company Linux server, mere developers don’t have admin privileges to install system packages either.

But isn’t Visual Studio paid ($$$) software? Or does MS by now offer a free, open-source counterpart?

I think you can download some version of Visual Studio for “free”, but we are now talking about C++ compilers, right?

Yup… and the supposedly good VC compilers are not free, to my knowledge.

Visual Studio is free for open-source software. Microsoft even offers free Windows virtual machines for development; the only restriction is that the usage license must be renewed every 3 months.

OK, I just tried out both a VirtualBox and a WSL1 approach (Ubuntu 18.04 within Win10, RStan) on Kalman filtering / matrix exponential recursions. The VirtualBox approach is similar to native Win10; the WSL1 approach is ~1.5x faster than native Win10. This is based on the average time taken to perform one log prob / gradient calculation over a lot of data. No difference comparing Rtools 3.5 to 4.0, unfortunately. Is the Windows port of GCC really so bad? Seems impressive…
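
For anyone wanting to repeat this kind of comparison, here is a minimal Python sketch of the measurement idea: time many repeated evaluations of a function, take the mean per call, and compare two environments as a ratio. The names mean_seconds_per_call, speedup and toy_workload are hypothetical, and the toy workload only stands in for one log prob / gradient evaluation; this is not the script used for the numbers above.

```python
import time
from statistics import mean


def mean_seconds_per_call(fn, n_calls=200, n_warmup=20):
    """Average wall-clock seconds per call of fn over n_calls evaluations."""
    for _ in range(n_warmup):  # untimed warm-up calls
        fn()
    samples = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def toy_workload():
    # Hypothetical stand-in for one log prob / gradient evaluation.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    per_call = mean_seconds_per_call(toy_workload)
    print(f"mean seconds per call: {per_call:.6f}")
    # Made-up per-call timings, only to illustrate the ratio.
    print(f"speedup: {speedup(0.30, 0.20):.2f}x")
```

Run the same script in each environment (native Windows, WSL, a VM) and pass the two means to speedup; averaging over many calls matters because single evaluations are noisy.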

Thanks for the report. Yeah, it's likely the g++ port or the libraries.

If RStudio figures out a way to connect with WSL without having to run RStudio Server, that would be a huge win.

It’s very hard to write cross-platform C++ code given that the spec doesn’t tie down edge case behaviors, so all the unit testing in the world won’t help you. We’re trying our best to write cross-platform code. If you find places that can be improved, filing issues to help guide upgrades would be super helpful.

I also pondered Windows vs. Linux for a while. This is really an apples to oranges to bananas to bandanas comparison, since many factors come into play on the same hardware:

  • compiler (Clang) and compiler version (v4.0)
  • libc & libcxx implementation (GCC’s? MSVC?) and version
  • virtualized (WSL2, VBox) or not
  • cache usage by OS
  • price of syscalls & I/O
  • also WSL1 is a Linux ABI over Windows kernel with its own cornucopia of performance weirdness
  • is Windows Defender tracing your Stan model?

etc etc
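
With this many factors in play, it helps to record the environment next to any timing you report. Below is a minimal Python sketch of that; classify_environment and describe_environment are hypothetical helper names, and checking for "microsoft" in the kernel release string is a commonly used heuristic for spotting WSL that I am assuming here, not an official API. It does not distinguish WSL1 from WSL2, and it says nothing about compiler versions or whether Windows Defender is scanning the model.

```python
import platform


def classify_environment(system, release):
    """Classify a (system, kernel release) pair as windows, wsl, linux, or other.

    Assumption: WSL kernels report a release string containing "microsoft";
    this is a heuristic, not an official detection API.
    """
    system = system.lower()
    if system == "windows":
        return "windows"
    if system == "linux":
        return "wsl" if "microsoft" in release.lower() else "linux"
    return "other"


def describe_environment():
    """Collect basic facts worth reporting next to a benchmark result."""
    uname = platform.uname()
    return {
        "environment": classify_environment(uname.system, uname.release),
        "kernel_release": uname.release,
        "machine": uname.machine,
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```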

Uhh… it would be interesting to find out how much CPU time is burned for this in case this really happens…

When I first tried WSL1, Defender had to be switched off (or folders excluded) or compiling took forever. Otherwise I don’t remember any differences between Win / WSL1, but that was a while ago…

I think RStudio has finally provided the WSL2 guidance folks need: https://support.rstudio.com/hc/en-us/articles/360049776974-Using-RStudio-Server-in-Windows-WSL2

As a simpler option, we’ve recently extended cmdstanr to allow for running CmdStan through WSL from a regular Windows R session. Simply add wsl=TRUE to the install_cmdstan() call; CmdStan will then be built under WSL, and all subsequent models will be run through WSL.

There is currently a performance issue due to where CmdStan is being stored (background in this post), but I’ll be getting that resolved soon.
