My experiment on webstan/cloudstan

rok_cesnovar · July 14, 2019, 1:59pm

I know the idea of a webstan/cloudstan has been brewing around here for quite some time. I was running some performance tests and decided to experiment with cloudstan a bit. I made a live demo that runs on a free AWS instance, so it’s understandably slow.

Underneath it uses httpstan (thanks @ariddell and everyone else working on that, it was a breeze to make a new interface work).

You can check it out here: https://cloudstan-experiment.herokuapp.com/

Logging in is mandatory in order to create and run a model. If you don’t want to register, you can use dummy data (the email has to be in a valid form, test@test.com is fine, no emails will be sent). Once you log in you can create a new model, compile it, supply the data and get the results in table and chart form (some basic charts).

The editor has syntax highlighting, but I haven’t yet added all the keywords, just some quick ones. The fit summary doesn’t include n_eff, Rhat and se_mean, as those require a bit more calculation and httpstan doesn’t calculate them.

In case the AWS instance gets terminated, here are some images of what it looks like.

The charts display 1 parameter by default; others can be added by clicking their name in the charts. For the histogram you can select the parameter from the dropdown.

Would something like this be of any use to anyone? I imagine it could be useful for beginners. Any comments or discussion is welcome.

The main issue is of course the computing resources, as the free instance will probably be limiting.
EDIT: I haven’t tested it on mobile, so I’m not sure it works there.

ahartikainen · July 14, 2019, 2:14pm

Hi, a quick way to get ESS and Rhat could be to wrap ArviZ?
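For readers wondering what such a diagnostic involves, here is a minimal sketch of the textbook Gelman-Rubin Rhat for a single parameter, using only the Python standard library. It is not what ArviZ computes (ArviZ's diagnostics are more refined), and the function name `classic_rhat` is made up for illustration. It also shows why at least two chains are needed.

```python
import math
from statistics import mean, variance


def classic_rhat(chains):
    """Textbook Gelman-Rubin Rhat for one parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")

    # W: average of the within-chain sample variances.
    within = mean(variance(c) for c in chains)
    if within == 0:
        raise ValueError("within-chain variance is zero")
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(c) for c in chains])
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


# Two chains exploring the same region versus two chains far apart.
overlapping = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
separated = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(classic_rhat(overlapping))
print(classic_rhat(separated))
```

The second value comes out far larger than the first, which is the signal that the chains have not mixed.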

breckbaldwin · July 17, 2019, 4:17pm

Rok,
That is really nice and, above all, simple. Sorry to immediately request a feature, but access to the errors would help debug models. My model didn’t compile here although it works locally, but that is not worth worrying about.

I am teaching/developing Stan for Programmers, and this is a very nice interface that may work better than RStudio. The issue for classes is scaling to more instances when 40 people compile their own Stan programs at the same time.

Do you need resources to keep moving this along? SGB has funds.

Breck

rok_cesnovar · July 17, 2019, 7:02pm

Feature requests, ideas, and comments are the reason I put this up here, so thank you for the feedback. I threw thi

Model compilation errors are displayed in a red-colored “flash message” similar to the “Model is compiling” box. A generic error is displayed if the server experienced any hiccups. That was the case here: I was logging too much stuff and ran out of space. It’s fixed now.

I did add a check for the input data. The only supported format currently is JSON, which is something I probably should list somewhere.
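A minimal sketch of what such an input check could look like, assuming the data arrives as pasted text. The function name `parse_data` and the error wording are hypothetical, not the app's actual code.

```python
import json


def parse_data(text):
    """Parse pasted data and require a JSON object at the top level."""
    if not text.strip():
        # Assumption: a model without a data block needs no input.
        return {}
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"data is not valid JSON: {err.msg} (line {err.lineno})")
    if not isinstance(data, dict):
        raise ValueError("data must be a JSON object mapping variable names to values")
    return data


print(parse_data('{"N": 3, "y": [1.5, 2.0, 0.5]}'))
```

Input such as an R assignment, a bare array, or JSON pasted with typographic quotes is rejected with a readable message instead of failing later during sampling.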

The caching feature that httpstan provides would shine here. I am guessing that there would be at least some matches out of the 40 models (or it would match the models from people that took the same class before them).

Your model probably compiled the last time; the server was just unable to respond. If you try compiling it again it should be instant. If anyone else using this web app compiles the same model, the compilation will be instant for them as well.

Not really, thank you. If there is interest I will definitely move this along faster. My current idea is to try to support most of the visualizations provided by bayesplot.
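The idea behind this kind of caching can be sketched as a lookup keyed by a hash of the program text. This is only an illustration of the principle, not how httpstan implements it. `CompileCache` and `fake_compile` are hypothetical names.

```python
import hashlib


class CompileCache:
    """Cache compiled models under a hash of their Stan source."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # hypothetical: the slow compile step
        self._models = {}

    @staticmethod
    def key(program_code):
        return hashlib.sha256(program_code.encode("utf-8")).hexdigest()

    def get(self, program_code):
        k = self.key(program_code)
        if k not in self._models:
            # First request for this exact source: pay the compile cost once.
            self._models[k] = self._compile_fn(program_code)
        return self._models[k]


def fake_compile(program_code):
    # Stand-in for a real compilation; returns a label instead of a binary.
    return "compiled:" + CompileCache.key(program_code)[:8]


cache = CompileCache(fake_compile)
first = cache.get("parameters { real a; } model { a ~ normal(0, 1); }")
second = cache.get("parameters { real a; } model { a ~ normal(0, 1); }")
print(first == second)
```

With a key like this, only byte-identical programs match, so two students whose models differ in whitespace or comments would each trigger a fresh compile.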

breckbaldwin · July 17, 2019, 9:55pm

Rok,
Is there a repo where I can generate issues instead of making a mess here on discourse? Again thanks.

Breck

rok_cesnovar · July 18, 2019, 5:37am

Yes, I set up a repo on GitHub: https://github.com/rok-cesnovar/cloudstan-experiment

Thanks.

andrewgelman · August 13, 2019, 12:08pm

Hi–I tried Cloudstan and I have some issues / feature requests:

Could there be an interface to R? That would be convenient. As it is, I needed to first work in R on my desktop, then convert the data to JSON format, then copy-and-paste into the Data input window, and then it didn’t work! (see info below)
It seems to just run 1 chain, which will create problems for practical use. Can it be set to run 4 chains and then do convergence monitoring?

If this could all work, it would be huge, I think!

Stan program:

data {
  int N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  y ~ normal(a + b*x, sigma);
}

–

JSON file:

{"x":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100],"y":[9.1269,-47.4086,57.868,-40.9684,20.4094,3.9589,-36.5922,39.8938,-43.7822,10.5357,39.8121,55.5456,43.1374,51.8758,50.6403,120.7135,47.2974,144.8674,144.6857,69.5459,39.6452,103.3713,104.9762,46.3694,23.1451,101.9823,79.3229,102.4158,12.6294,112.5222,131.2641,82.7752,99.5206,49.2659,195.2491,158.1527,124.5499,147.5908,52.3644,213.7,38.4534,89.967,185.4385,146.365,157.424,171.4865,125.7648,145.9285,214.7259,164.4298,99.7023,61.9154,132.5432,174.5351,193.5842,191.5415,248.8224,124.6395,124.9634,155.6249,207.237,172.8277,149.2687,182.4149,181.9576,182.6331,196.8433,205.2649,292.506,252.3278,236.9424,202.5592,284.0804,248.8951,221.6069,139.9184,256.2312,309.3456,284.7533,198.8171,305.7594,200.5142,215.9679,217.356,180.8665,203.5907,246.2932,275.3513,221.6917,288.6411,271.8738,288.1322,208.0077,395.8834,223.6855,330.6136,339.3635,275.6744,256.7296,361.3982],"N":[100]}

Error message:

Sampling started…ERROR:
Error calling services function: Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=N; dims declared=(); dims found=(1) (in 'unknown file name' at line 2)
undefined

rok_cesnovar · August 13, 2019, 12:33pm

Yes. I am working on optionally supporting CmdStan as a backend, which would mean you could use all inputs that are supported by CmdStan.

Same as above: with CmdStan that should be easily achievable. I am not sure whether httpstan, which I use as the Stan backend right now, supports multiple chains.

The issue is that N should not be defined as an array, so writing "N": 100 instead of "N": [100] should work.

andrewgelman · August 13, 2019, 12:46pm

Thanks! Mitzi is developing CmdStanR (or maybe I’m getting the name wrong; it’s the R interface to CmdStan) so maybe that will work here.

Regarding the JSON problem, I used the toJSON() function to convert my R list to that JSON file. I wonder if Stan’s method of reading JSON files needs to be fixed in some way to account for this problem.

P.S. Having an interface to R would help because then we could also make graphs etc.

jjramsey · August 13, 2019, 12:50pm

When using the toJSON() or write_json() functions from the R package jsonlite to write a list to a JSON file, one should pass auto_unbox = TRUE to those functions. That way, component N from that list will be written to the JSON file as "N": 100 instead of "N": [100].
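For data that has already been exported with boxed scalars, the same repair can be sketched on the receiving side in Python. The helper `unbox_scalars` is hypothetical. It takes an explicit list of names because a length-1 array can be perfectly valid for a variable declared as a vector or array.

```python
import json


def unbox_scalars(data, scalar_names):
    """Return a copy of `data` with length-1 lists unboxed for the given names."""
    fixed = dict(data)
    for name in scalar_names:
        value = fixed.get(name)
        if isinstance(value, list) and len(value) == 1:
            fixed[name] = value[0]
    return fixed


boxed = json.loads('{"N": [100], "y": [9.1269]}')
unboxed_text = json.dumps(unbox_scalars(boxed, ["N"]))
print(unboxed_text)
```

Here only N is unboxed; y stays a one-element array because it was not named as a scalar.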

Bob_Carpenter · August 20, 2019, 7:54pm

Is this using the same JSON data schema as CmdStanPy? I understand there are issues with parsing given differences between R and Python standard package output.

Everyone else continued to respond here, so I’m piling on. It may be premature to start filing fine-grained issues. GitHub’s a terrible place for discussion.

I think having an R interface (or Python interface or shell interface or anything else lower level) makes this fundamentally different. It then becomes a tool for people who know those languages.

Is there a reason RStudio server doesn’t do what you want, @andrewgelman? If so, we might want to address that rather than trying to build two interfaces that do the same thing (provide R access to Stan).

@rok_cesnovar — who’s the imagined user for the thing you’re building?

@andrewgelman and @breckbaldwin — same question.

Yikes. That’s pretty restrictive if it doesn’t. CmdStanPy lets you do this, but I don’t know how easy that or CmdStan would be to use in a web interface. It’s all tricky with synching up returns from asynchronous processes.

ahartikainen · August 20, 2019, 8:41pm

It does support multiple chains, see parallel test. This will use 1 process, multiple threads.

github.com/stan-dev/httpstan/blob/master/tests/test_bernoulli.py

"""Test sampling from Bernoulli model."""
import asyncio

import requests

import helpers

program_code = """
    data {
        int<lower=0> N;
        int<lower=0,upper=1> y[N];
    }
    parameters {
        real<lower=0,upper=1> theta;
    }
    model {
        theta ~ beta(1,1);
        for (n in 1:N)
        y[n] ~ bernoulli(theta);
    }

This file has been truncated.

CmdStanPy / CmdStan do not support multiple chains (in the same process). Or they do, just in different processes.
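Whichever backend runs the chains, a web frontend has to start several of them and wait for all results before it can report convergence diagnostics. A rough sketch of that synchronization with asyncio follows. It calls neither httpstan nor CmdStan: `run_chain` is a stand-in that just draws random numbers, and all names and defaults are assumptions.

```python
import asyncio
import random


async def run_chain(chain_id, seed, num_draws):
    """Stand-in for one chain; a real backend would call Stan here."""
    rng = random.Random(seed)
    draws = []
    for _ in range(num_draws):
        draws.append(rng.gauss(0.0, 1.0))
        await asyncio.sleep(0)  # yield so the other chains can make progress
    return chain_id, draws


async def run_chains(num_chains=4, num_draws=100, seed=1234):
    # Give every chain its own seed, then wait until all of them finish.
    tasks = [run_chain(i, seed + i, num_draws) for i in range(num_chains)]
    results = await asyncio.gather(*tasks)
    # gather() returns results in task order, whichever chain finished first.
    return {chain_id: draws for chain_id, draws in results}


chains = asyncio.run(run_chains())
print({chain_id: len(draws) for chain_id, draws in chains.items()})
```

With real samplers the coroutine body would instead await a subprocess or an HTTP response, but the gather step stays the same.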

andrewgelman · August 20, 2019, 9:09pm

Bob:

I’d like at least a rudimentary R interface because otherwise I don’t know how to call Stan or pass in data. Cutting and pasting a JSON file is kind of a mess. I guess if there is some clean way of getting the data in, it could work.

Regarding the RStudio-on-cloud thing: That was great, but one thing I don’t like about it is that it is a dependence on RStudio: What they give, they can take away. And indeed I don’t think it really works anymore, at least I recall seeing that on one of the discussions.

If we want cloudstan to not use R or Python, that makes sense; then we just need a way for R or Python users to be able to send data to cloudstan and to take the output from cloudstan and send it back to R or Python.

rok_cesnovar · August 26, 2019, 10:52am

The starting point is to target novice users and workshops attendees. If it proves to be useful there, we can build on that.

I was only in charge of building the toy example prototype; since then Erik has taken over, completely revised the frontend, and got some frontend people to work on it. The plan is to have a more polished version in late September. In the meantime I am mostly thinking about and researching how to solve the cloud orchestration/Stan backend side.

Great. Thank you for the link. I couldn’t find docs or an example on how to specify this in the HTTP request data, so I did not use it for the time being. I obviously wasn’t thorough enough.

Bob_Carpenter · August 27, 2019, 1:23am

andrewgelman: “And indeed I don’t think it really works anymore, at least I recall seeing that on one of the discussions.”

I don’t know the current status, but long term the problem’s going to be paying for cycles.

If you want to pay for the RStudio thing, it hasn’t gone anywhere.

andrewgelman: “What they give, they can take away.”

Not if we host it ourselves.

I’m still very confused about what you want. If not RStudio, then just an R terminal on the web with graphics? I don’t think we want to be in the business of building our own RStudio.

andrewgelman · August 28, 2019, 12:45am

Ben Goodrich would know the current status, but my impression was that the RStudio Cloud thing wasn’t currently working, so we can’t really use it for demonstrations. This new thing is working, so that’s good, but I’d like to see some easier way to load in data than JSON.

Bob_Carpenter · August 29, 2019, 3:26pm

It still works for me:
