Stan on the cloud

Is there any effort to let users run Stan in the cloud with minimal setup? I use Stan relatively infrequently, and whenever I do, it seems to take a long time to sort out all the installation issues. I would much prefer to just run it in the cloud, even if it costs a nominal amount.

I recently came across the post “Stan now (mostly) working on RStudio Cloud”. This would be awesome, but unfortunately it seems it no longer works due to Stan’s increased RAM requirements. (And, from what I could tell, there doesn’t seem to be a paid option that would let users increase their RAM.)

If there is an ongoing effort to create this, that would be awesome. If not, I may look into the various options myself.

There is an ongoing effort by our team in Ljubljana. We were planning an alpha release in October, but things got delayed for various reasons. It’s now close to being ready, hopefully by late January or early February.

Great to hear! This would be a fantastic resource for the community!

Hi,

I have an AWS account. There I have an image with RStudio etc. installed. Whenever I need extra oomph, I pick the number of cores and the amount of memory, start the image, and run the model. When it’s done, I turn off the virtual machine until I need it again. It really doesn’t cost that much, and I’ve done it this way for two years.

Recently my uni set something like this up for employees, so now I basically have the same thing for free (not as easy to set up as AWS, of course…)

I put together an IR Jupyter notebook (the golf putting example) for use on Google Colab.

Google’s Colab gives you a VM. Here’s someone’s notebook you can run to see what you get: https://colab.research.google.com/drive/151805XTDg--dgHb3-AXJCpnWaqRhop_2
The VM goes away after 12 hours, so for the price of a Google account (free), you get a cloud VM that is useful for demos, teaching, etc.

Colab can run IR Jupyter notebooks. I started from the blog post “R Jupyter Notebook + RStan on Google Colab”.

To keep the install minimal, I created a “bootstrap” IPython notebook on Colab that uses the rpy2.ipython extension (which provides the %%R cell magic). This let me install a bunch of R packages into a local directory on the VM. Here’s the IPython notebook code:

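# cell 1: load the %%R magic from rpy2
%load_ext rpy2.ipython

%%R
# LIBRARY CONFIGURATION: install into a local directory so it can be archived later
dir.create('RStanLibs', showWarnings = FALSE)
install.packages('StanHeaders', lib='RStanLibs')
install.packages('bayesplot', lib='RStanLibs')
install.packages('rstan', lib='RStanLibs')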

Then I tar-gzipped everything:

!tar cf - RStanLibs | gzip > RStanLibs.tgz
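If you’d rather not shell out from the notebook, Python’s standard-library tarfile module can build the same archive. This is a minimal sketch that assumes the R packages were installed into a local RStanLibs directory as shown earlier. pack_library_dir is a hypothetical helper name, not part of any Stan tooling.

```python
import tarfile
from pathlib import Path


def pack_library_dir(lib_dir: str, archive_path: str) -> Path:
    """Write lib_dir into a gzip-compressed tarball.

    Roughly equivalent to `tar cf - RStanLibs | gzip > RStanLibs.tgz`:
    the archive's top-level entry is the directory name itself, so
    extracting it recreates RStanLibs/... in the current directory.
    """
    src = Path(lib_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"library directory not found: {src}")
    out = Path(archive_path)
    with tarfile.open(out, "w:gz") as tar:
        # arcname keeps member paths relative (RStanLibs/rstan/...) rather than absolute
        tar.add(str(src), arcname=src.name)
    return out
```

In the bootstrap notebook this would be called as `pack_library_dir("RStanLibs", "RStanLibs.tgz")` after the install cell has finished.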

Then I put that into a Google Cloud Storage bucket. As long as the Colab notebooks all run on the same VM image (so the pre-compiled packages match the installed R), this minimizes the install process. In the IR notebook, the first code block downloads the pre-compiled RStan libraries:

# Install pre-compiled R packages for StanHeaders, bayesplot, and rstan
if (!file.exists("RStanLibs.tgz")) {
  system("wget https://storage.googleapis.com/rlibs-rstan-plus/RStanLibs.tgz", intern = TRUE)
  system("tar zxf RStanLibs.tgz", intern = TRUE)
  # move the extracted package directories into R's site library
  system("mv RStanLibs/* /usr/lib/R/site-library")
}
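The same download-once logic can also run from a Python cell, for example a Python bootstrap notebook that uses rpy2. This is a sketch only. fetch_and_install is a hypothetical helper. You supply the URL (your own public URL), the archive name, the directory name inside the tarball, and the target library directory (for example /usr/lib/R/site-library).

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_install(url: str, archive: str, lib_name: str, site_library: str) -> bool:
    """Download a tarball of pre-built R packages once and move them into site_library.

    Mirrors the R block: if `archive` already exists, do nothing.
    Returns True if packages were installed, False if the step was skipped.
    """
    archive_path = Path(archive)
    if archive_path.exists():
        return False
    urllib.request.urlretrieve(url, str(archive_path))
    dest = Path(site_library)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive_path, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # newer Pythons: refuse absolute paths, "..", and device files
                tar.extractall(tmp, filter="data")
            else:
                tar.extractall(tmp)
        for pkg in (Path(tmp) / lib_name).iterdir():
            target = dest / pkg.name
            if target.is_dir():
                # replace an existing copy; plain `mv` refuses a non-empty directory
                shutil.rmtree(target)
            shutil.move(str(pkg), str(target))
    return True
```

Like the R version, it treats an existing archive as “already done”. If a download is interrupted, you would need to delete the partial file before rerunning.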

it’s a hack, but it works.

Next up: a similar demo for CmdStanPy on Colab.

Torkar – Thanks for the heads up! This is really helpful. Just took a quick look and was able to get the AMI up and running. Will definitely go back to this if I need more computing power.

Mitzimorris – Thanks a lot for the tip! I confess I was a bit confused about how to use the Google Colab notebook. I made a copy of the Colab notebook you linked to and executed some of the cells, but it seemed to take a while and I didn’t really follow the code. There was also a link to a Kaggle notebook with R and RStan that seemed easier to follow, so I’m going to test that one out.

Hi Mitzi.

I created an example of how to run PyStan (+ArviZ & Bokeh) on MyBinder. For how to prepare a repository for Binder, see:

https://mybinder.readthedocs.io/en/latest/using.html#preparing-a-repository-for-binder

There is also an option to use R, so that could also work without too much hacking.

I needed to use -flto (link-time optimization) to reduce memory use while compiling.

There might be a way to always use -flto with gcc/g++, but I’m not currently sure how.
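One possible approach follows. It is an untested assumption, not a known fix. PyStan 2 builds models through Python’s distutils, and on Unix distutils appends the CFLAGS environment variable to both the compile command and the link command. LTO needs the flag at both stages. So setting CFLAGS once at the start of a session should apply -flto to every model compiled afterwards. with_flag and enable_lto are hypothetical helper names.

```python
import os


def with_flag(flags: str, flag: str) -> str:
    """Return a space-separated flag string that contains `flag` exactly once."""
    parts = flags.split()
    if flag not in parts:
        parts.append(flag)
    return " ".join(parts)


def enable_lto(env=os.environ) -> str:
    """Add -flto to CFLAGS in `env` (the process environment by default).

    Assumption: the model build goes through distutils, which appends CFLAGS
    to both the compile and the link command on Unix.
    """
    env["CFLAGS"] = with_flag(env.get("CFLAGS", ""), "-flto")
    return env["CFLAGS"]
```

Calling `enable_lto()` in the first cell, before any model is compiled, is the intended usage. Because the helper is idempotent, rerunning the cell won’t stack duplicate flags.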

There are also some ways to optimize the startup time.

Doug, Google Colab is super-confusing. What exactly did you find confusing?

W.r.t. “Stan in the cloud”, I’m specifically talking about “Jupyter notebooks in the cloud that can be used to run Stan”: Kaggle, Binder, Colab, RStudio, etc.

In Colab:

  • the cloud instance you’re running is some kind of hardware running Ubuntu
  • Colab’s Python 3 notebooks already have most of the Python libraries needed for ML/DS installed, including PyStan (but not CmdStanPy)
  • Colab’s dashboard lets you create new Python notebooks but not R notebooks. However, you can create an IR Jupyter notebook, upload it to your GDrive, and then run it in Colab, which is what I did to create the golf putting example notebook (following the directions in the blog post cited above)
  • the IR Jupyter notebook doesn’t have the RStan et al. libraries installed, so whenever you spin up a cloud instance, you need to install them somehow. Downloading and installing from CRAN takes a long time, which is why I put together this Colab hack.
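The set of pre-installed packages on Colab changes over time, so a quick check in a Python cell can be worthwhile before you rely on it. This is a small sketch using importlib. available is a hypothetical helper. The module names to check (for example pystan, cmdstanpy) are whatever you need.

```python
import importlib.util


def available(modules):
    """Map each top-level module name to whether it can be imported in this runtime."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}
```

For example, `available(["pystan", "cmdstanpy"])` in a fresh Colab Python notebook. If `cmdstanpy` comes back False, it can be installed with `!pip install cmdstanpy`.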

To run the golf putting notebook in Colab, you need a Google account and must be signed in to it.

You will then get a scary warning saying that this notebook wasn’t created by Google and therefore might access your GDrive, steal all of your personal information, transfer all of your assets to some Cayman Islands bank account, and then run up massive charges on your credit card. First off, accessing your GDrive can only be done in Python, using the Colab Python API, which lets you mount your GDrive on the cloud instance. Second, in order to mount your GDrive, it will ask for authorization.

So the warning is appropriate, but I assure you, the golf putting example only runs one of Andrew’s favorite demos. In Colab, the spin-up / RStan install time is minimal. That’s all.
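For reference, mounting Google Drive is an explicit call made from a Python cell, and it prompts for authorization before anything is mounted. This is a minimal sketch that mounts only when google.colab is importable, that is, when running on Colab. mount_drive_if_colab is a hypothetical helper, and /content/drive is the conventional mount point.

```python
def mount_drive_if_colab(mountpoint: str = "/content/drive") -> bool:
    """Mount Google Drive when running on Colab; do nothing elsewhere.

    drive.mount() asks the user to authorize access before mounting.
    Returns True if a mount was requested, False if not on Colab.
    """
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return False
    drive.mount(mountpoint)
    return True
```

A notebook that never calls anything like this never touches your Drive, which is why the warning is only a precaution for the R demo.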

Mitzi – Thanks a lot for the follow-up, and sorry for the unhelpful earlier message. I think I follow what you have done now. So, basically, I should first create an IR notebook in Google Colab and then run the code below to quickly get a notebook with R and RStan (and other commonly used packages) running? That seems great and works for me.

# Install pre-compiled R packages for StanHeaders, bayesplot, and rstan
if (!file.exists("RStanLibs.tgz")) {
  system("wget https://storage.googleapis.com/rlibs-rstan-plus/RStanLibs.tgz", intern = TRUE)
  system("tar zxf RStanLibs.tgz", intern = TRUE)
  system("mv RStanLibs/* /usr/lib/R/site-library")
}

This is the problem: you cannot create an IR notebook directly in Colab, but you can upload one, and once it’s uploaded, you can edit and change it as much as you like in Colab.

So install IRkernel locally, create a blank IR notebook, and upload it to Colab; then you’re good. https://irkernel.github.io/installation/
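A blank notebook is just a small JSON file, so instead of creating it through a local Jupyter install, you could write one directly. This is a sketch that assumes Colab picks the R runtime from the notebook’s kernelspec name, "ir", which is the name IRkernel registers by default. blank_ir_notebook is a hypothetical helper.

```python
import json
from pathlib import Path


def blank_ir_notebook(path: str) -> Path:
    """Write an empty nbformat-4 notebook whose kernelspec points at IRkernel ("ir")."""
    nb = {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {
            "kernelspec": {"name": "ir", "display_name": "R", "language": "R"},
            "language_info": {"name": "R"},
        },
        "cells": [],
    }
    out = Path(path)
    out.write_text(json.dumps(nb, indent=1))
    return out
```

The resulting `.ipynb` file can then be uploaded to Colab like any other notebook.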

The URL in the bootstrap example may go away; that was a test I put together while playing around with Google Cloud. What you really want to do is run your own bootstrap process and put your tgz of installed libraries up on the web: a GitHub Gist or similar, any public URL.

does this make sense?

Hi Everyone – Thanks a lot for the advice and suggestions! In case it is useful, I have created a page which summarizes the various ways you can run RStan in the cloud here https://github.com/dougj892/ie4dfes

Cool!

https://cocalc.com/

Good instructions on getting RStudio set up on AWS here

And for Google Compute Engine, there’s googleComputeEngineR.

For AWS, the most compute-focused instance types are the c5 family; for GCE, it’s c2.