Hello…I hope this great community can kindly share any advice, tips and personal experiences with me.
I have been using an iMac + Stan + RStan + RStudio, and have been delighted with how well it all works…brilliant job, developers! The models I fit are non-linear mixed effect models with perhaps 50-500 parameters, run as 4 chains on 4 cores. MCMC sampling takes about 5-20 minutes, and I review/check the results using my own R code in RStudio. For some intensive tasks (like LOO CV), 5-20 minutes per fit is not ideal, especially if you wish to compare LOO CV performance across models.
Now that my iMac is 7 years old, I would like to buy a new computer that can run the above models/tools/methods as well as possible. Any advice on the following would be really helpful and much appreciated, as I am really clueless hardware-wise.
Mac v Windows?
I have had problems installing Stan/RStan on a Windows machine, but perhaps I was unlucky. Mac worked perfectly. Do other users/developers have a preference here? Does anyone have any experience with the new Mac Studio? Are you happy with it?
RStan v CmdStanR?
I have found RStan perfect for my needs, but understand it seems less easy to update/maintain than CmdStanR (although I may be wrong). In terms of future proofing my workflow, would people recommend I switch to using CmdStanR over RStan? I am aware of this comparison (below), but was wondering if any users/developers have recommendations for me going forward.
Any comments/thoughts would be much appreciated, Al
Regarding Mac vs Windows, CmdStanR installs pretty easily on Windows - if you have good reasons for getting a Windows box, CmdStanR should work for you. So the answer to Q1 is “either” (but almost all of the Stan devs work on Macs or Linux boxes, except for the gamers).
In terms of RStan vs CmdStanR and future-proofing your workflow, CmdStanR always runs with the latest version of Stan - now at 2.29.2. RStan is at version 2.26. So the answer to Q2 is “CmdStanR”.
Have you considered installing a Linux OS and then running R and CmdStanR?
Thanks for your reply. Interesting to hear that most developers use Mac/Linux…the type of answer I was interested in knowing (i.e. if most “power” users use Mac, then problems on this OS will be fixed asap). Thanks!
No…I am familiar with Mac and Windows, but not Linux. Not sure what Linux would offer (ease of installation?), but I would not expect a major speed difference. Thanks for your thoughts here though.
I think with Linux, it’s easier to install and manipulate matrix libraries and thus get faster computation.
Thanks Sonicking. Perhaps if you have any link or blog where someone has compared Linux to Windows or Mac, that could help me (and others). At the moment, I am thinking both have C++ code “running” Stan, so wouldn’t expect major differences, but I am out of my depth here. Cheers, Al
It should work for Mac too, but I don’t know how well compared to Linux:
From my personal use - I find rstan a bit easier to use, but CmdStanR supports more advanced functionality like parallelizing and profiling. rstan keeps samples in memory and CmdStanR writes them to file, then reads them back into memory as needed. I started as a happy rstan user but have moved to CmdStanR as I learn how to better use Stan.
If it helps - I recently switched to one of the new M1 Pro MacBooks (from a Dell XPS running Fedora). My switch was mostly precipitated by lack of access to software, but I’ve had no trouble getting both RStan and CmdStanR up and running, and installing R through Homebrew has made everything a breeze. On the whole I’ve found the transition to arm64 pretty painless, and there are far fewer pain points than there were when I first looked into it.
Thanks Tom…yes, my interest in continuing to use RStan was how easy it was to use, and how simple the workflow was (e.g. an R program that reads in the raw data, makes it “Stan ready” (e.g. by converting character coding to numeric coding), defines initial estimates, calls Stan with a list of arguments, then post-processes the results). I haven’t used CmdStanR, but can’t think how it could be easier! I am not sure exactly what you mean by “parallelising and profiling”…that is, if I have 4 chains on 4 CPUs, is there any speed up with CmdStanR? No worries if you haven’t got time to reply…your first note was appreciated.
Thanks for this. Very nice to hear the set up went smoothly with the M1 Pro. Yes, my concern with getting the Mac Studio was the lack of users I have seen posting on it (with regards to Stan). I expect any problems would probably be resolved, but wouldn’t really like to spend the money on a “turkey” that cannot run R and Stan seamlessly because I was too lazy to ask beforehand!
In most Stan models, you compute the log likelihood of the data by looping through each datapoint. Each of the chains does this independently - so like you say, the 4 chains can be run in parallel on 4 cores. However, this loop can also be parallelized within each chain – if you have 100 datapoints, you could have 10 cores each compute the log likelihood of 10 datapoints, potentially speeding up model fitting by up to about 10x (less in practice, because of threading overhead). With 4 chains this would require 4 x 10 = 40 cores, but you could adjust the parallelization to make use of however many cores your machine has.
Stan has two ways to do this that I know of:
The second is a newer method that’s a bit easier to use.
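To make the idea above concrete, here is a minimal Python sketch (not Stan code, and not how Stan implements it internally) of splitting a log likelihood into partial sums over slices of the data and adding them back up. The normal likelihood, the function names (`partial_sum`, `chunked`, `parallel_log_lik`) and the toy data are all assumptions made for illustration. Python threads will not give a real speed-up here; the sketch only shows why the split is valid (the total is just a sum of independent pieces) and how the core count multiplies out.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def normal_lpdf(y, mu, sigma):
    """Log density of one observation under Normal(mu, sigma)."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)


def partial_sum(y_slice, mu, sigma):
    """Log likelihood contribution of one slice of the data."""
    return sum(normal_lpdf(y, mu, sigma) for y in y_slice)


def chunked(values, n_chunks):
    """Split a list into at most n_chunks contiguous slices."""
    size = math.ceil(len(values) / n_chunks)
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_log_lik(y, mu, sigma, n_workers):
    """Hand each slice to a worker, then add the partial sums together."""
    slices = chunked(y, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda s: partial_sum(s, mu, sigma), slices)
        return sum(parts)


# Toy data: 100 datapoints, split across 10 workers (10 datapoints each).
y = [0.1 * i for i in range(100)]
serial = partial_sum(y, 5.0, 2.0)
parallel = parallel_log_lik(y, 5.0, 2.0, n_workers=10)

# Core budget from the post: chains x workers per chain.
chains, threads_per_chain = 4, 10
cores_needed = chains * threads_per_chain
```

The serial and sliced totals agree up to floating-point rounding, which is the property that lets each slice be evaluated on a separate core. On a machine with fewer cores you would simply pick a smaller number of workers per chain.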
Profiling is a general method of assessing which part of your code is slow and taking lots of time. Here’s a description of how it works in CmdStanR:
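As a rough illustration of what profiling means in general, the following Python sketch times named sections of a computation and accumulates the totals, loosely analogous to wrapping parts of a Stan program in named `profile` blocks. Everything here (the `profile` context manager, the section names, the toy `fit_once` function) is hypothetical and is not CmdStanR's API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named section.
timings = defaultdict(float)


@contextmanager
def profile(name):
    """Time the enclosed block and add the elapsed time to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] += time.perf_counter() - start


def fit_once(n):
    """Toy stand-in for one model evaluation, with two profiled sections."""
    with profile("transform"):
        x = [i / n for i in range(n)]
    with profile("likelihood"):
        total = sum(-0.5 * (v - 0.5) ** 2 for v in x)
    return total


# Repeat the evaluation so the per-section totals are less noisy.
for _ in range(20):
    fit_once(10_000)

# The section with the largest accumulated time is the first place to optimize.
slowest = max(timings, key=timings.get)
```

Inspecting `timings` after the loop shows where the time went, so effort can be focused on the slowest section rather than on guesswork.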
Thanks Tom for this superb extra information. I had heard of “reduce_sum” as a way to speed up things, and can now understand why (and hence why CmdStan > RStan on this point). The profiling info is also interesting to see…could be very useful for large models to diagnose where gains could be made. Very much appreciated. Al
I’ve been using cmdstanr on an M1 Mac for several months. It works very smoothly and is very fast in my experience. I would definitely recommend this over an Intel machine or a Windows machine, both of which were noticeably slower despite similar other specs.
@bwiernik thanks for adding your two cents here - did you previously have a Windows/Intel machine with similar specs to the M1 Mac, or are you just making this comparison intuitively from how fast your current Mac seems to go? I ask because I have been thinking of switching from my current setup on Windows to a MacBook Pro with the M1 chip and would like to hear from people who’ve really been able to make direct comparisons between that and a Windows/Intel machine. For me the biggest stumbling block has just been how long sampling takes on my machine for complex models, or for models where there are lots of datapoints.
Bwiernik / JimBob, sorry for the late reply…I have been on holiday. Thanks Bwiernik for sharing that the M1 Mac is working better than your Intel/Windows machine…just the info I wanted to hear to “confirm” that sticking with Mac for Stan was sensible. Thanks!