Apple's new M1 processors and Stan

Hey all,

I just wanted to ask if anybody already has experience with Apple’s new M1 processors (Apple M1 Chip - Apple) and how they get along with Stan?

1 Like

Hey Paul!

No experience, just linking the following:

Cheers,
Max

1 Like

Thanks! I should learn to use the search function better :-D

1 Like

One more small benchmark here: Detect the use of the M1 ARM-based CPU and suggest adding CXX · Issue #365 · stan-dev/cmdstanr · GitHub

In general, Stan runs really well on ARM processors, even more so with parallelization (using reduce_sum). My more recent experience is with ARM on Linux, but the story is the same there: we have seen models run anywhere from 50% to 3x faster (with the same number of cores).

I think it’s mostly due to better use of the caches.
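If anyone wants to try the reduce_sum route from Python, a rough CmdStanPy sketch looks like this (just a sketch; the model and data paths are placeholders, and the Stan program itself has to use reduce_sum for threads_per_chain to matter):

    # Sketch: enable within-chain threading for a reduce_sum model via CmdStanPy.
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(
        stan_file="my_model.stan",           # placeholder path
        cpp_options={"STAN_THREADS": True},  # compile with threading support
    )
    fit = model.sample(
        data="my_data.json",   # placeholder path
        chains=4,
        parallel_chains=4,     # one chain per core
        threads_per_chain=2,   # threads available to reduce_sum inside each chain
    )
    print(fit.summary())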

4 Likes

It’s funny that the Intel TBB makes things fly on ARM!

2 Likes

When can we expect to be able to install Rstan natively on M1 Macs? Will this coincide with the native build of R 4.1?

That is correct.

Until then, the only option for compiling Stan natively in R is CmdStanR.

ok, thanks.

There’s really no way to get Rstan? I’m running R dev for aarch64, and it works fine so far.

There is an experimental version of R for native use on M1: https://mac.r-project.org/

With that you could build rstan from source natively, but I have no idea how stable that is or whether rstan actually works with it.

Edit: oh, I guess this is what you meant by R dev for aarch64.

Yes, that’s the R build I’m using. I’ll try installing Rstan.

I’ve installed Rstan, Rstanarm, and brms. So far, Stan (via brms) seems to compile and sample about 3x faster on my M1 MacBook Air than on my work laptop (a 2-year-old 13" MacBook Pro with a quad-core i7).

2 Likes

Cool! How many cores does parallel::detectCores() return for you?

parallel::detectCores()
[1] 8

I wonder how the scheduler handles the heterogeneity of there being 4 performance cores and 4 efficiency cores. Have you tried parallel chains?

Yes, my impression is that when running 4 chains in parallel I’m using 4 cores; when I ran 6 chains, 4 ran first and the 2 remaining chains were run afterwards. But I’ve only been playing around with this for half an hour.

It’s been hard to tell, tbh, because sampling is so incredibly fast.
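For anyone wanting to pin this down from Python rather than brms, the number of concurrently running chains can be set explicitly; a small CmdStanPy sketch (model and data paths are placeholders):

    # Sketch: 6 chains total, at most 4 running at once; the remaining 2
    # start as soon as slots free up, matching the behaviour described above.
    import os
    from cmdstanpy import CmdStanModel

    print(os.cpu_count())  # reports 8 on an M1 (4 performance + 4 efficiency cores)

    model = CmdStanModel(stan_file="my_model.stan")  # placeholder path
    fit = model.sample(data="my_data.json", chains=6, parallel_chains=4)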

2 Likes

I have a new M1 Mac mini. If someone provides me with Python-based test code, I volunteer to run it.
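For example, a minimal CmdStanPy timing script along these lines would do (a sketch only; it uses the bernoulli example bundled with CmdStan and assumes cmdstanpy and CmdStan are already installed):

    # Sketch: time compilation and sampling of the bundled bernoulli example.
    import os
    import time
    from cmdstanpy import CmdStanModel, cmdstan_path

    examples = os.path.join(cmdstan_path(), "examples", "bernoulli")

    t0 = time.perf_counter()
    model = CmdStanModel(stan_file=os.path.join(examples, "bernoulli.stan"))
    t1 = time.perf_counter()
    fit = model.sample(
        data=os.path.join(examples, "bernoulli.data.json"),
        chains=4,
        parallel_chains=4,
        seed=1,
    )
    t2 = time.perf_counter()

    print(f"compile: {t1 - t0:.1f}s  sample: {t2 - t1:.1f}s")
    print(fit.summary())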

3 Likes

A couple of things that might be interesting for those using CmdStan on an M1:

  1. Compilation is very fast - no idea why, but much faster than on an i9.
  2. The last update of Big Sur broke my CmdStan installation (“Dyld: Library not Loaded” errors when running from Python, and “error: half args and returns was disabled in PCH file but is currently enabled” / “error: PCH file was compiled for the target ‘x86_64-apple-macosx11.0.0’ but the current translation unit is being compiled for target” when running from the terminal). Reinstalling CmdStanPy and CmdStan solved the issue (see the sketch below).
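In case it helps anyone hitting the same errors, the reinstall can be done from Python roughly like this (a sketch; run pip install --upgrade cmdstanpy first if the package itself also needs updating):

    # Sketch: rebuild CmdStan from scratch after a macOS update breaks the toolchain.
    import cmdstanpy

    # overwrite=True wipes the existing CmdStan installation and rebuilds it
    # with the current compiler toolchain.
    cmdstanpy.install_cmdstan(overwrite=True)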
2 Likes

Anyone tried Stan on the new M1 Pro or M1 Max chips yet?

Intriguing-looking results from some Python test runs here:

2 Likes

A bit late to the game, but I compiled and installed httpstan and pystan on an M1 Max Mac. It is remarkably faster than anything I have tried before on non-ARM architectures.
I don’t have a benchmark, though.

1 Like

Thanks for following up @tinosai.

You might want to try CmdStanPy. It’s lighter weight than httpstan and should be faster. But it doesn’t include log density or gradient calculations if you need those. I’d be curious what the speed comparison looked like on an M1.

But what I really want to know is how the M2 MacBook Pros perform. I’m about to get a new notebook through work and that seems like the obvious choice.

1 Like