Stan Playground: Stan without installing Stan

Announcing Stan Playground

Stan Playground is a new open-source, browser-based editor and runtime environment for Stan models. Users can edit, compile, and run models, as well as analyze the results using built-in plots and statistics or custom analysis code in Python or R, all with no local installation required.

I gave a live demo of the site at StanCon 2024 (video forthcoming), and we (myself, @magland, and @JSoules) are happy to now announce it for broader use.
Whether you’re a new user, an educator trying to teach Stan, or an experienced user who just doesn’t have their new laptop configured yet, we hope to make your life just a bit easier.

You can visit the live website here: https://stan-playground.flatironinstitute.org.

screenshot of live site

Feature Overview

For users familiar with tools like the Compiler Explorer, repl.it or JSFiddle, Stan Playground hopes to provide a similar experience for Stan models.

Stan editor

The site features an editor for Stan code with syntax highlighting and provides warnings and errors from the Stan compiler for instant feedback.

screenshot of the Stan editor

Compiling models

Compilation of the models is the only part of Stan Playground which is not run locally. We provide a public server for convenience, but you can also host your own.

screenshot of the page to select compilation servers

Preparing data

Data can be provided in JSON format in its own editor, or can be generated from code written in R (using webR) or Python (using pyodide), including code that imports published datasets.

screenshot of providing data
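As a rough illustration of the Python route, the sketch below simulates a small regression dataset with a fixed seed and serializes it to JSON. The variable names, the sample size, and the "true" values are all made up for the example, and how Stan Playground picks up the result from a script (here assumed to be a dictionary called `data`) should be checked against the template the site itself provides.

```python
import json
import random

# Fixed seed so the simulated dataset is reproducible (the value is arbitrary).
rng = random.Random(123)

N = 20
true_a, true_b, true_sigma = 0.2, 0.3, 0.5  # hypothetical "true" values

x = [(n + 1) / N for n in range(N)]
y = [rng.gauss(true_a + true_b * xi, true_sigma) for xi in x]

# Keys must match the names declared in the Stan program's data block.
# Assumption: the script exposes its result as a dict named `data`.
data = {"N": N, "x": x, "y": y}

# The same content, as it would appear in a JSON data file.
data_json = json.dumps(data, indent=2)
print(data_json[:60])
```

The dictionary keys are the only part that has to line up with the model; everything else is ordinary Python.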

Sampling

After a model has been compiled, sampling can be run entirely in your local browser.

screenshot of sampling

Viewing and analyzing results

Stan Playground has several built-in ways of viewing the samples, but also supports performing your own analysis, again in R or Python.

built-in histogram view

custom R analysis

Sharing

Stan Playground has built-in sharing features to allow you to download a copy of your project, upload an existing project, or share via a GitHub Gist. Sharing with a Gist provides a link you can send to other users; when clicked, the link loads your shared project in the recipient’s browser.

You can also prepare custom links if you have files already living at some URL (e.g., they are already in a GitHub repository).
For example, this link will load the “golf” case study from the example models repository:

https://stan-playground.flatironinstitute.org/?title=Knitr%20-%20Golf%20-%20Golf%20Angle&stan=https://raw.githubusercontent.com/stan-dev/example-models/master/knitr/golf/golf_angle.stan&data=https://raw.githubusercontent.com/stan-dev/example-models/master/knitr/golf/golf1.data.json
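If you need many such links (say, one per exercise), they can be assembled programmatically. The sketch below uses only the three query parameters visible in the link above (`title`, `stan`, `data`); the helper name `playground_link` is made up for the example, and whether other parameters exist is not assumed here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://stan-playground.flatironinstitute.org/"
RAW = "https://raw.githubusercontent.com/stan-dev/example-models/master/knitr/golf/"


def playground_link(title: str, stan_url: str, data_url: str) -> str:
    """Build a link from the title, stan, and data query parameters."""
    query = urlencode(
        {"title": title, "stan": stan_url, "data": data_url},
        quote_via=quote,  # encode spaces as %20 rather than +
        safe=":/",        # leave the embedded URLs readable
    )
    return f"{BASE_URL}?{query}"


link = playground_link(
    "Knitr - Golf - Golf Angle",
    RAW + "golf_angle.stan",
    RAW + "golf1.data.json",
)
print(link)
```

With these inputs the function reproduces the golf link shown above character for character.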

Dark Mode

Like any good modern website, Stan Playground also has a dark mode:

Limitations and planned features

There are several things Stan Playground can’t do, but might one day! We’re hoping to hear from you all about which of these deserves priority:

It’s also worth noting that the use of web technologies does place some constraints on what is (currently) possible. One that a few early users have run into is memory: WebAssembly is currently limited to somewhere between 2 and 4 GB of RAM, depending on your browser. You can still do a lot of cool modeling within those limits, but it’s important to be aware of them.
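To get a feel for when that limit might bite, here is a back-of-the-envelope estimate of the memory needed just to hold the draws. It assumes every draw is stored as an 8-byte double and ignores everything else the sampler needs (the data, the model's working memory, and so on), so it is a lower bound rather than a description of how Stan Playground actually allocates memory.

```python
def draws_memory_bytes(n_chains: int, n_draws: int, n_params: int) -> int:
    """Bytes needed to hold the draws as 8-byte doubles, and nothing else."""
    return n_chains * n_draws * n_params * 8


GIB = 1024 ** 3

# Hypothetical run: 4 chains, 1000 draws each, 100,000 saved quantities.
usage = draws_memory_bytes(n_chains=4, n_draws=1000, n_params=100_000)
print(f"{usage / GIB:.2f} GiB")
```

A run of that size needs roughly 3 GB for the draws alone, which is already in the range where some browsers would run out of memory; a model with a few hundred parameters is nowhere near the limit.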

This is fantastic!

Agreed! This is what @andrewgelman was asking about for years. I’m curious what he thinks now that it’s built (hence the ping).

@WardBrian: I think it would be really cool to couple this with the MCMC monitor code from @JSoules. That would immediately give you all the posterior analysis tools you’d want, even if you don’t hook it up to run online as things are sampling.

Viewing the samples as they progress is much trickier than in the case of cmdstan writing out to disk (see the discussion in issue #8, “integrate MCMC Monitor”, in the flatironinstitute/stan-playground GitHub repository), but in terms of after-the-fact analysis, we already have most of the tools MCMC Monitor provided. Stan Playground “builds in” the summary statistics table, histograms, and traceplots. MCMC Monitor had these plus autocorrelation plots and scatter plots.

If these would be useful to build in, we could. Note that you can still get these, and indeed many more types of plots, using the analysis scripting feature and something like bayesplot. Autocorrelation plots are bayesplot::mcmc_acf() and scatter plots are bayesplot::mcmc_pairs().
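For readers who prefer the Python side of the analysis scripting, the quantity behind an autocorrelation plot is simple to compute by hand. This is a minimal sketch in plain Python on made-up draws; it is not how bayesplot or Stan Playground compute it, and how draws are exposed to an analysis script is not assumed here.

```python
def autocorrelation(draws: list[float], lag: int) -> float:
    """Sample autocorrelation of a single chain at the given lag."""
    n = len(draws)
    mean = sum(draws) / n
    centered = [d - mean for d in draws]
    variance = sum(c * c for c in centered)
    covariance = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return covariance / variance


chain = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.15, 0.5]  # made-up draws
print([round(autocorrelation(chain, k), 3) for k in range(4)])
```

Lag 0 is always 1 by construction; values that decay quickly toward 0 at higher lags indicate a well-mixing chain.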

Stan Playground looks amazing. I just have a few quick comments:

  1. I altered the model (adding a quadratic term to the regression) and re-compiled and re-ran. It did this just fine. But the compile was kinda slow. It took longer to compile online than on my laptop. If that’s the way it is, ok, fine. I just wanted to let you know in case it was some kind of bug. Playing around, I found that sometimes compilation just takes a second, other times it takes 15 seconds. So I guess it just depends on what else is happening on the server right then.

  2. I like that I can directly edit the data in the data window, because that allows me to easily see how inferences change if you change a data point.

  3. I was playing around with the model and made some errors. The debugger pointed out syntax errors very helpfully!

  4. This made me think it would be a good idea to add Pedantic Mode as an option, as that would catch other problems in people’s code.

  5. (Ok this one is silly:) Can the indentation be user-defined? I’m an indent 2 spaces guy, and it pains me to see the 4-space indentation!

  6. I looove that there’s a Data Generation setup. That’s exactly how it should be. And it’s in both Python and R! How cool is that?? Given that the generated data go into the data.json window, I wonder if it would make sense for that data.json window to be labeled as “data loaded in from outside” or “data generated from the data generation window in Python” or “data generated from the data generation window in R.” That way it would be transparent to see where the data came from.

  7. For the default example, to promote good practice, it might make sense for a preset value to be given for the random seed in the Stan generation. Also, in the R code that generates the data, it could make sense for the first line of that code to be set.seed(123) or something like that, just again to promote reproducible research. I suppose there’s a similar way to set the seed in Python.

  8. One thing that’s super-cool about Stan is that we can simulate the data in the transformed data block and then fit the model in the parameters and model blocks. Here’s an example I whipped up and checked on my own computer and it ran just fine:

transformed data {
    int N = 1000;
    real a_ = 0.2;
    real b_ = 0.3;
    real s_ = 0.5;
    vector[N] x;
    for (n in 1:N) x[n] = 1.0*n/N;
    vector[N] y;
    for (n in 1:N) y[n] = normal_rng(a_ + b_*x[n], s_);
}
parameters {
    real a;
    real b;
    real<lower=0> s;
}
model {
    y ~ normal(a + b*x, s);
}

When I put it into Stan Playground, the above code compiled but it gave me an error when I tried to run it. Here was the error message:

RuntimeError: Out of bounds memory access (evaluating 'this.m._tinystan_sample(b,a,L,g,i,h,m,l,f.valueOf(),d,Y?1:0,U,k,V,W,j,E,v,y?1:0,z,O,N,R,w,T,P,_,I)')
(see browser console for more details)

I could not find the browser console so I was not able to see the details. I thought the problem might be that the data window was full of data. (This shouldn’t be a problem, because Stan ignores irrelevant data in the data statement, but just in case . . .) So I removed all the data so that data.json was just empty and it gave this error:

TypeError: undefined is not an object (evaluating 'str.length')

I then tried entering just {} into data.json and it gave this error again:

RuntimeError: Out of bounds memory access (evaluating 'this.m._tinystan_sample(b,a,L,g,i,h,m,l,f.valueOf(),d,Y?1:0,U,k,V,W,j,E,v,y?1:0,z,O,N,R,w,T,P,_,I)')
(see browser console for more details)

Again, this seems wrong because this program ran fine on my laptop using cmdstanr.

  9. In my program above, I did this kinda hacky thing of giving underscores after the variable names to denote the true values. Ideally in some beefed-up version of Stan there’d be some kind of operator so that the transformed parameter block would look something like this:
    real true(a) = 0.2;
    real true(b) = 0.3;
    etc.

And then the program could automatically check the coverage of the true parameter values.

  10. If this is publicly released, how do you avoid it swamping your server?

  11. Is there a privacy issue? That is, are you saving all the models that people fit? I assume no, in which case it could make sense to say that in the documentation.
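Until something like the true(a) idea above exists in Stan itself, the check can be done after the fact in an analysis script. The sketch below tests whether a known true value falls inside a central posterior interval; the function names and the nearest-rank quantile rule are choices made for this example, not part of Stan or Stan Playground.

```python
def quantile(sorted_draws: list[float], q: float) -> float:
    """Nearest-rank quantile of an already sorted list."""
    idx = round(q * (len(sorted_draws) - 1))
    return sorted_draws[idx]


def covers(draws: list[float], true_value: float, prob: float = 0.9) -> bool:
    """True if the central `prob` interval of the draws contains true_value."""
    ordered = sorted(draws)
    tail = (1.0 - prob) / 2.0
    lower = quantile(ordered, tail)
    upper = quantile(ordered, 1.0 - tail)
    return lower <= true_value <= upper


# Made-up "posterior draws" spread evenly between 0 and 1.
draws = [i / 100 for i in range(101)]
print(covers(draws, 0.5), covers(draws, 0.99))
```

A single fit only says whether the interval happened to contain the true value. Coverage in the usual sense is the fraction of such hits over many simulated datasets, so this function would be called once per simulation and the results averaged.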

I’m sure I’ll have more questions, but this is a start!

Thanks again for sharing this.

Thanks for giving it a try, @andrewgelman!

I will have to take a look at those errors you were getting — needless to say, they shouldn’t be happening, so we will try to get to the bottom of it. What browser are you using?

Responding piecemeal to your other thoughts:

  1. Compilation times: yes, they will vary, but our hope is that it’s usually under ~20 seconds. The reason it sometimes looks almost instant is probably that we cache models on the server, so if it has already seen that model recently it will serve the cached copy.

  2. A pedantic mode option would be easy; I’ll add it to the list of features to explore.

  3. I like the idea of marking the data somehow if it was automatically generated. I will look into it.

  4. The server is implemented in such a way that it shouldn’t be too easy to swamp, but it could happen. We’re hoping that large-scale users (e.g., people running a 100-person class or something) follow the “Run your own” instructions.

  5. Privacy: as mentioned earlier, we do cache the models on the compilation server. I will say we really don’t have any easy way as the people running the server to actually go in and look at them, but hopefully it goes without saying that if you have something really proprietary, don’t upload it!
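The caching behavior described in item 1 explains the mix of one-second and fifteen-second compiles. As a purely illustrative sketch (the real server's implementation is not described in this thread, and every name below is hypothetical), a cache keyed on a hash of the model source behaves like this:

```python
import hashlib


class CompileCache:
    """Hypothetical cache: compile each distinct model source only once."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._store = {}

    def get(self, stan_source: str):
        # Identical source text always hashes to the same key.
        key = hashlib.sha256(stan_source.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._compile_fn(stan_source)  # slow path
        return self._store[key]


def fake_compile(source):
    """Stand-in for a real compiler; counts how often it is called."""
    fake_compile.calls += 1
    return f"binary for {len(source)} characters of Stan"


fake_compile.calls = 0

cache = CompileCache(fake_compile)
model = "parameters { real mu; } model { mu ~ normal(0, 1); }"
first = cache.get(model)
second = cache.get(model)        # identical source: served from the cache
third = cache.get(model + "\n")  # any edit changes the hash and recompiles
print(fake_compile.calls)
```

Under this scheme an unchanged model returns almost instantly, while even a whitespace edit triggers a fresh compile.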

Plus one to that.

There are one-liners for these:

vector[N] x = linspaced_vector(N, 1.0 / N, 1.0);
vector[N] y = to_vector(normal_rng(a_ + b_ * x, s_));

I don’t think we’re going to be able to automatically check coverage with Stan programs—it’s something you could do in brms. I find the underscores jarring as they’re conventionally used to mark member variables in classes in object-oriented languages.

The compute for sampling is happening on your local machine. The only thing the server’s being used for is compiling Stan to C++, which is pretty efficient and scalable.

We only get the code for the models. The data is local on your machine along with the compute.

Sadly, it doesn’t. We should have a privacy statement here that indicates what gets sent to us (the source code for the Stan model) and what doesn’t (data, sampling, local edits that don’t get compiled, etc.). I think we want to provide minimal privacy guarantees here as we don’t have the cycles to make this super secure.

Brian:

When writing that earlier post, I ran Stan Playground in the Safari browser on my Mac and it had the above problem. Inspired by your comment, I tried it in Chrome. It compiled fine but when I tried Run Sampling, it gave the following error:

Sampling failed!
TypeError: Cannot read properties of undefined (reading 'length')
(see browser console for more details)

In case it helps, here’s the R code I used to run the Stan program (which I saved in the file “sim_test.stan”) successfully on my laptop:

library("cmdstanr")
set.seed(123)
sim_test <- cmdstan_model("sim_test.stan")
fit <- sim_test$sample(refresh=0)
print(fit)

This is awesome! – Really exciting times for the Stan community

@andrewgelman we fixed a couple of issues that occurred when a model had no data. Your model now runs fine for me in Firefox/Chrome, but unfortunately I am not able to try it in Safari with the computers I currently have access to.

Brian,

It works now. Great!

Also one small thing:
In the tabular output I recommend having less space between the lines. This may sound silly but if you have less space, it will be possible to see more lines at once, so that as a user I can get a quicker view of the inferences and convergence information.

My presentation/demo from StanCon 2024 is now available on the Stan YouTube channel. Unfortunately, the demo portion is a bit washed out in the video. It still gives a good overview of what the features are, even if you can’t always see their results super well.

I posted a link here so maybe that will get us more users: https://statmodeling.stat.columbia.edu/2024/10/31/stan-playground-run-stan-on-the-web-play-with-your-program-and-data-at-will-and-no-need-to-download-anything-on-your-computer/

Also, I just noticed that the R post-processing script will make graphs too! So I should be able to implement demos such as the golf example.

This brings me to a request for a new feature: would it be possible to have multiple Stan programs in the same project, so that as a user I can fit and compare multiple models (i.e., “workflow”)? An example would be the golf problem, where we fit several different models to two different datasets and we compare the fits graphically.

We anticipated this request! It’s possible from a technical standpoint, but it definitely raises some trickier design questions about what everything is named and how you control everything in that case.

The next time you visit Flatiron maybe we can find a time to talk about this sort of thing more. In the meantime, I think the next best thing is just preparing multiple links that students could open one after the other, though of course this means you can’t do LOO-esque comparisons as easily.

I’m at Flatiron right now! Could talk at 1pm if you’re around?

Ah, I am unfortunately traveling this week! I think at least one of my collaborators is around, or I can try to set something up for next time so that all of us are there?