Stan now (mostly) working on RStudio Cloud

Thanks to the RStudio developers, I have now been able to get RStan to work on RStudio Cloud. The default C++ compiler is g++, which needs more RAM to compile a Stan program than the 1GB limit imposed on users of this (free) service allows. However, the compiler can be changed to clang++, in which case all but the most complicated Stan programs (those that use a lot of matrix algebra) can be compiled and executed, unless you try to store tons of parameter draws. I recommend that you go to

https://rstudio.cloud/project/56157

to get the proper configuration, and then make a permanent copy if you want to do your own Stan work on RStudio Cloud. RStudio Cloud is particularly useful for Stan tutorials where a lot of time can otherwise be wasted getting C++ toolchains installed and configured on everyone’s laptops.
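
For anyone curious what the compiler switch might look like outside the preconfigured project, here is a minimal sketch. It assumes the switch is made through the user-level Makevars file that R reads from ~/.R/Makevars; the variable name CXX14 is an assumption (depending on the R and rstan versions, CXX or CXX11 may be the one that matters), and write_makevars is a hypothetical helper, not part of rstan. The project linked above may be configured differently.

```r
# Hypothetical helper: write a Makevars file that points one of R's C++
# compiler variables at clang++. The variable name (CXX14) is an assumption;
# other R / rstan versions may consult CXX or CXX11 instead.
write_makevars <- function(path, compiler = "clang++", variable = "CXX14") {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  writeLines(sprintf("%s = %s", variable, compiler), path)
  invisible(path)
}

# Demonstrate on a temporary file so nothing in the home directory changes.
# For real use the path would be file.path(Sys.getenv("HOME"), ".R", "Makevars").
demo_path <- file.path(tempdir(), ".R", "Makevars")
write_makevars(demo_path)
makevars_lines <- readLines(demo_path)
print(makevars_lines)
```

Writing to a temporary path first makes it easy to inspect the result before touching the real configuration.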

Also, could people blog / Tweet this to a wider audience?


Hi, I tried this out and it’s great, really just what I was looking for all along! It worked right away for me, and I’d like to blog it.
I just have a few questions; see below.
See you
Andrew

  1. Is there some limit on how big the model is, or how long it takes to run, or how many people can fit models using this server? If I blog it and 100 people try it out during the following day, will it get shut down?

  2. For the edit window, when I open New File, I can open an R script, but there’s no option to open a Stan program. Does that feature exist? Or will it be added soon? This would help, I think.

  3. Is there a way to link to a pre-loaded RStudio Cloud environment that already has the R and Stan code? That would be an even cleaner way to start a demo: the user clicks on it, sees the code, and can then alter it and run the new version.

  4. Is there a way to upload a data file also? The example I set up uses fake data (see below), but what if I had a .csv data or whatever?

  5. I wrote a simple regression; I’m including my code below.

File simple_regression.stan:

data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  y ~ normal(a + b*x, sigma);
}

File simple_regression.R:

N <- 100
x <- rnorm(N)
a <- 2
b <- 3
sigma <- 5
y <- a + b*x + rnorm(N, 0, sigma)
# N must be passed too, because the Stan program declares it in the data block
simple_data <- list(N=N, x=x, y=y, sigma=sigma)

library("rstan")
fit <- stan("simple_regression.stan", data=simple_data)
print(fit)
sims <- extract(fit)
n_sims <- length(sims$a)

plot(x, y)
curve(a + b*x, add=TRUE, col="blue")
for (s in sample(n_sims, 10)) {
  curve(sims$a[s] + sims$b[s]*x, lwd=0.5, add=TRUE, col="red")
}

My R code is kinda clunky–in particular, I don’t like what I had to do to extract n_sims. Any suggestions of how to clean it?

  1. I think it is going to be okay. Certainly RStudio Cloud can handle a huge number of simultaneous web connections. If several people were trying to compile a Stan model at literally the same time, they might run out of RAM whereas it would succeed if they were the only one trying to compile.
  2. That is implemented in the next version of RStudio that they are about to release but I don’t know when it will get rolled out into RStudio Cloud.
  3. I added your simple regression example, so I think it will work if you go to the RStudio Cloud project again.
  4. Use the Upload icon in the RStudio Files section at the bottom right of the window.
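
Once a .csv file has been uploaded, turning it into the data list for the Stan program could look roughly like the sketch below. The file name my_data.csv and its column names x and y are made up for illustration; here the file is written to a temporary directory first so that the snippet runs on its own.

```r
# Stand-in for an uploaded file: write a tiny CSV to a temporary location.
csv_path <- file.path(tempdir(), "my_data.csv")  # hypothetical file name
write.csv(data.frame(x = c(-1.5, 0.2, 1.3), y = c(0.5, 2.1, 4.8)),
          csv_path, row.names = FALSE)

# Read it back and build the list that simple_regression.stan expects:
# N, x, and y are the names declared in its data block.
d <- read.csv(csv_path)
simple_data <- list(N = nrow(d), x = d$x, y = d$y)
str(simple_data)

# With rstan installed, the fit would then be:
# fit <- rstan::stan("simple_regression.stan", data = simple_data)
```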

Perfect. I will blog.

Just to follow up on question 1 above: Will RStudio give users unlimited cycles? How does this work?


The Limits section of the RStudio Cloud Guide only says

Each project is allocated 1GB of RAM.
Each account is allocated one private space, with up to 3 members and 5 projects. You can submit a request to the RStudio Cloud team for more capacity if you hit one of these space limits, and we will do our best to accommodate you. If you are using a Professional shinyapps.io account, you will not encounter these space limits.

So, I guess you could use it for long-running MCMC in Stan, but with a giant Stan model, you could run into the RAM limit even if there is no time limit.
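
To get a feel for when stored draws alone would approach the 1GB limit, here is a back-of-the-envelope sketch. It assumes each draw of each scalar parameter is stored as an 8-byte double, uses rstan's default of 4 chains and 2000 iterations per chain (warmup draws are saved by default), and ignores all other overhead, so it is a lower bound rather than a prediction. The function name draws_memory_mb is made up for this illustration.

```r
# Rough lower bound on the memory (in MB) needed to hold the stored draws:
# one 8-byte double per scalar parameter, per iteration, per chain.
draws_memory_mb <- function(n_params, iter = 2000, chains = 4) {
  n_params * iter * chains * 8 / 1024^2
}

small_model <- draws_memory_mb(10)       # a handful of scalar parameters
large_model <- draws_memory_mb(100000)   # e.g. a big latent vector or matrix
print(c(small = small_model, large = large_model))
```

A simple regression is nowhere near the limit, while a model with a very large number of stored quantities can exceed 1GB from the draws alone.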


OK, I wrote an enthusiastic blog post which should appear Monday morning.

Just one quick question: The RStudio session that we set up: Will anybody be able to access it (as long as they log in)? Or will it only work for you and me or only “up to 3 members”? I wouldn’t want to blog this and then learn that nobody can access the page! Thx.

When someone goes to https://rstudio.cloud/project/56157 , they get a temporary copy of the “project” (R packages + compiler configuration + your simple_regression example files). They can click on the thing that says “Make a permanent copy” in which case the copy becomes their project and they can write to their copy of the directory, collaborate with 2 other people, etc. But it does burn one of their five projects.

This is great. Here’s a thought. Would we be able to set up lots of these pre-loaded RStudio sessions, one for each of several examples (the golf example, the radon example, etc)? Or, I guess the alternative would be to have one session with lots of examples and then the user could run whichever was desired. But there seems something appealing, somehow, about one session per example. I feel like this could be a great way for people to get introduced to Stan. To me, this would be more accessible than knitr documents.

Just to be clear: I’m not suggesting that these RStudio sessions replace our case studies. I’m just thinking these sessions might be a good teaching tool and a good learning tool because users can directly play around with the R and Stan code.

It is going to have to be one project with whatever examples we want to put on it. There are about 490 so far.

OK, maybe I’ll add a few of my favorites to it, then. I guess we could talk with RStudio about getting their permission to have a bunch of these.

If you have stuff you want to add for everyone else, then send it to me so I can add it to https://rstudio.cloud/project/56157 , which everyone else should be using as a starting point. Adding it to your permanent copy of that project only makes it available to you, unless you make your project available to everyone. It is technically possible for anyone to create projects 56158, 56159, ..., 56157 + N, but then users of RStudio Cloud would have to burn N of their five projects to see up to five (sets of) examples.

ok!

You should check with RStudio before investing too much time into it, unless this takes no time at all. The reason is that it is currently an alpha product, which means they still have to figure out pricing, how people use it and so on… they may even pull it, depending on what they see. RStudio is awesome and I really like what they are doing; I am only suggesting checking with them re long term plans.

Eric,
Thanks for pointing this out. We’ve been in contact with the RStudio people specifically about this product, so they do know we’re interested in it!


You can have both. You can run both R scripts with markdown and Rmd notebooks in RStudio Cloud. Users could open some R script (with markdown) or Rmd, upload their own data, and knit an HTML or PDF document with nicely formatted output and figures. They can then play around with the R and Stan code to make a better model or figures for their problem.

Here’s the markdown-enhanced version of your simple example. Now if the user presses Ctrl-Shift-K, they will get the nice output in the viewer (it may require Ben or the user to change the R Markdown option “Show output preview in” to “Preview Pane”).

#' ---
#' title: "Simple example"
#' author: ""
#' date: "`r format(Sys.Date())`"
#' ---

#' # Setup
#+ message=FALSE
library("rstan")

#' # Generate data
N <- 100
x <- sort(rnorm(N))
a <- 2
b <- 3
sigma <- 5
y <- a + b * x + rnorm(N, 0, sigma)
simple_data <- list(N = N, x = x, y = y, sigma = sigma)

#' # Draw from posterior distribution
#+ results='hide'
fit <- stan("simple_regression.stan", data = simple_data)

#' ## Posterior summary and convergence diagnostics
print(fit, digits = 2)

#' ## Plot the conditional mean function
sims <- as.data.frame(fit)
plot(x, y, las = 1, pch = 20)
curve(a + b * x, add = TRUE, col = "blue")
for (s in sample(nrow(sims), 10)) {
  with(sims, curve(a[s] + b[s] * x, lwd = 0.5, add = TRUE, col = "red"))
}

Aki:

Thanks for sending. I’m not sure what we should do . . . On one hand, this code is clean and it’s great to be able to have the nice output. On the other hand, the #' and #+ things could be confusing to a user who’s not an R expert and who might not realize that one can just ignore the comments.

I guess one option would be to include the .R script and also include the markdown code that you just included. Would it be a .Rmd file?

That R file can be automatically converted to an Rmd file. Some people might prefer Rmd, as RStudio then offers extra features such as inline preview of figures (only during one session). Here’s the corresponding Rmd (you can open the R file and then the Rmd file and compare how RStudio shows them). Ctrl-Shift-K will again make a pretty document.

---
title: "Simple example"
author: ""
date: "`r format(Sys.Date())`"
---

# Setup

```{r message=FALSE}
library("rstan")
```

# Generate data

```{r }
N <- 100
x <- sort(rnorm(N))
a <- 2
b <- 3
sigma <- 5
y <- a + b * x + rnorm(N, 0, sigma)
simple_data <- list(N = N, x = x, y = y, sigma = sigma)
```

# Draw from posterior distribution

```{r results='hide'}
fit <- stan("simple_regression.stan", data = simple_data)
```

## Posterior summary and convergence diagnostics

```{r }
print(fit, digits = 2)
```

## Plot the conditional mean function

```{r }
sims <- as.data.frame(fit)
plot(x, y, las = 1, pch = 20)
curve(a + b * x, add = TRUE, col = "blue")
for (s in sample(nrow(sims), 10)) {
  with(sims, curve(a[s] + b[s] * x, lwd = 0.5, add = TRUE, col = "red"))
}
```
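
As a rough illustration of the automatic conversion from the R script to Rmd mentioned above, the sketch below uses knitr::spin with knit = FALSE, which converts a script written in the #' and #+ style without running it. The file name spin_demo.R and the toy script are made up, and the snippet skips the conversion if knitr is not installed; whether this is the exact route RStudio takes behind Ctrl-Shift-K is an assumption.

```r
# A tiny script in the "#'" / "#+" style used in the R file above.
script <- c(
  "#' # Setup",
  "#+ message=FALSE",
  "x <- 1 + 1",
  "#' # Result",
  "print(x)"
)
r_file <- file.path(tempdir(), "spin_demo.R")  # hypothetical file name
rmd_file <- sub("\\.R$", ".Rmd", r_file)       # spin writes next to the input
writeLines(script, r_file)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit = FALSE only converts the script to R Markdown; it does not run it.
  knitr::spin(r_file, knit = FALSE)
  cat(readLines(rmd_file), sep = "\n")
} else {
  message("knitr is not installed; skipping the conversion")
}
```

The text lines become ordinary markdown and the code ends up in chunks, with the #+ options carried over as chunk options.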

Hmmm . . . I don’t like the .Rmd so much as it could just add confusion. Maybe for now best to just stick with the .R file?

Kids today generally do not write .R files; they just put R code in chunks of an R Markdown file.


OK, maybe we do both, then? I’d like to reach the kids who know RMarkdown and the kids who don’t.

In particular, I’d like to reach the kids who don’t know R or Stan at all. I thought the .R script might be less intimidating to them, but maybe I’m wrong?