My experiment on webstan/cloudstan

I know the idea of a webstan/cloudstan has been brewing around here for quite some time. I was running some performance tests and decided to experiment with cloudstan a bit. I made a live demo that runs on a free AWS instance, so it's understandably slow.

Underneath it uses httpstan (thanks @ariddell and everyone else working on that, it was a breeze to make a new interface work).

You can check it out here: https://cloudstan-experiment.herokuapp.com/

Logging in is required to create and run a model. If you don't want to register, you can use dummy data (the email only has to be in a valid form; test@test.com is fine, and no emails will be sent). Once you log in, you can create a new model, compile it, supply the data and get the results in table and chart form (some basic charts).

The editor has syntax highlighting, but I haven't added all the keywords yet, just a few common ones. The fit summary doesn't include n_eff, Rhat and se_mean, as those require a bit more calculation and httpstan doesn't calculate them.

In case the AWS instance goes down for some reason, here are some images of what it looks like.


The charts display one parameter by default; others can be added by clicking their names in the charts. For the histogram, you can select the parameter from the dropdown.

Would something like this be of any use to anyone? I imagine it could be useful for beginners. Any comments or discussion are welcome.

The main issue is of course the computing resources, as the free instance will probably be limiting.
EDIT: I haven't tested it on mobile, so I'm not sure it works there.


Hi, could a quick way to get ESS and R-hat be to wrap ArviZ?
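For reference, the classic split-R-hat is simple enough to compute directly from the draws. The sketch below is a minimal, dependency-free version, assuming each chain is a plain list of draws for one parameter; `split_rhat` is a hypothetical helper name. ArviZ's own `rhat` defaults to a rank-normalized variant, so its numbers will differ slightly from this one.

```python
"""Minimal split-R-hat sketch (classic Gelman-Rubin form, not the
rank-normalized variant that ArviZ reports by default)."""
from statistics import mean, variance


def split_rhat(chains: list[list[float]]) -> float:
    """Split each chain in half and compare between- and within-half variance."""
    halves = []
    for draws in chains:
        n = len(draws) // 2
        if n == 0:
            raise ValueError("each chain needs at least 2 draws")
        # For an odd-length chain the middle draw is dropped.
        halves.append(draws[:n])
        halves.append(draws[-n:])
    n = len(halves[0])
    if n < 2 or any(len(h) != n for h in halves):
        raise ValueError("chains must have equal length of at least 4 draws")
    # B: between-half variance of the means (scaled by n).
    b = n * variance([mean(h) for h in halves])
    # W: average within-half variance.
    w = mean(variance(h) for h in halves)
    # Pooled estimate of the marginal posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5
```

Values close to 1 suggest the chains agree; a chain stuck in a different region pushes the ratio well above 1.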


Rok,
That is really nice and, above all, simple. Sorry to immediately request a feature, but access to the error messages would help in debugging models. My model didn’t compile here, although it works locally, but that’s not worth worrying about.

I am teaching/developing Stan for Programmers, and this is a very nice interface that may work better than RStudio. The issue for classes is scaling to more instances when 40 people compile their own Stan programs at the same time.

Do you need resources to keep moving this along? SGB has funds.

Breck

Feature requests, ideas and comments are the reason I put this up here, so thank you for the feedback. I threw thi

Model compilation errors are displayed in a red “flash message”, similar to the “Model is compiling” box. A generic error is displayed if the server runs into any hiccups, and that was the case here: I was logging too much and ran out of space. It's fixed now.

I did add a check for the input data. The only supported format currently is JSON, which is something I probably should list somewhere.
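A check of this kind can be as small as "does it parse, and is it an object?". The sketch below assumes the data arrives as raw text from the data input box; `load_stan_data` is a hypothetical name, and it deliberately leaves type and dimension checks to Stan's own reader. Reporting the line and column helps with a common copy-and-paste problem: typographic quotes such as “N” are not valid JSON.

```python
import json


def load_stan_data(text: str) -> dict:
    """Parse user-supplied data and reject anything that is not a JSON object.

    Hypothetical helper: Stan itself still checks types and dimensions later.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # Point the user at the exact spot in the editor where parsing failed.
        raise ValueError(
            f"data is not valid JSON (line {err.lineno}, column {err.colno}): {err.msg}"
        ) from err
    if not isinstance(data, dict):
        raise ValueError("data must be a JSON object mapping variable names to values")
    return data
```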

The caching feature that httpstan provides would shine here. I am guessing there would be at least some matches among the 40 models (or they would match the models from people who took the same class before them).

Your model probably did compile last time; the server was just unable to respond. If you try compiling it again, it should be instant. If anyone else using this web app compiles the same model, the compilation will be instant for them as well.
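The idea behind that caching can be sketched as a dictionary keyed by a hash of the program text, so only the first submission of an identical program pays the compile cost. This is a minimal illustration, not httpstan's actual keying scheme; `ModelCache` and the `compile_fn` callback are hypothetical stand-ins for the expensive compile step.

```python
import hashlib
from typing import Callable, Dict


class ModelCache:
    """Cache compiled models by a SHA-256 hash of their Stan program text."""

    def __init__(self, compile_fn: Callable[[str], object]):
        # compile_fn is a hypothetical stand-in for the real (slow) compiler call.
        self._compile_fn = compile_fn
        self._compiled: Dict[str, object] = {}

    @staticmethod
    def key(program_code: str) -> str:
        return hashlib.sha256(program_code.encode("utf-8")).hexdigest()

    def get(self, program_code: str) -> object:
        k = self.key(program_code)
        if k not in self._compiled:
            # Cache miss: compile once, then every later identical program is instant.
            self._compiled[k] = self._compile_fn(program_code)
        return self._compiled[k]
```

Because the key is an exact text hash, even a whitespace change produces a cache miss; normalizing the program text first would trade that off against the risk of merging programs that differ.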

Not really, thank you. If there is interest, I will definitely move this along faster. My current idea is to try to support most of the visualizations provided by bayesplot.

Rok,
Is there a repo where I can file issues instead of making a mess here on Discourse? Again, thanks.

Breck

Yes, I set up a repo on GitHub: https://github.com/rok-cesnovar/cloudstan-experiment

Thanks.

Hi–I tried Cloudstan and I have some issues / feature requests:

  1. Could there be an interface to R? That would be convenient. As it is, I needed to first work in R on my desktop, then convert the data to JSON format, then copy-and-paste into the Data input window, and then it didn’t work! (see info below)

  2. It seems to just run 1 chain, which will create problems for practical use. Can it be set to run 4 chains and then do convergence monitoring?

If this could all work, it would be huge, I think!


Stan program:

```stan
data {
  int N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  y ~ normal(a + b*x, sigma);
}
```

JSON file:

```json
{"x":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100],"y":[9.1269,-47.4086,57.868,-40.9684,20.4094,3.9589,-36.5922,39.8938,-43.7822,10.5357,39.8121,55.5456,43.1374,51.8758,50.6403,120.7135,47.2974,144.8674,144.6857,69.5459,39.6452,103.3713,104.9762,46.3694,23.1451,101.9823,79.3229,102.4158,12.6294,112.5222,131.2641,82.7752,99.5206,49.2659,195.2491,158.1527,124.5499,147.5908,52.3644,213.7,38.4534,89.967,185.4385,146.365,157.424,171.4865,125.7648,145.9285,214.7259,164.4298,99.7023,61.9154,132.5432,174.5351,193.5842,191.5415,248.8224,124.6395,124.9634,155.6249,207.237,172.8277,149.2687,182.4149,181.9576,182.6331,196.8433,205.2649,292.506,252.3278,236.9424,202.5592,284.0804,248.8951,221.6069,139.9184,256.2312,309.3456,284.7533,198.8171,305.7594,200.5142,215.9679,217.356,180.8665,203.5907,246.2932,275.3513,221.6917,288.6411,271.8738,288.1322,208.0077,395.8834,223.6855,330.6136,339.3635,275.6744,256.7296,361.3982],"N":[100]}
```

Error message:

```
Sampling started…ERROR:
Error calling services function: Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=N; dims declared=(); dims found=(1) (in 'unknown file name' at line 2)
undefined
```

Yes. I am working on optionally supporting CmdStan as a backend, which would mean you could use any of the inputs CmdStan supports.

Same as above: with CmdStan as the backend, that should be easy to achieve. I am not sure whether httpstan, which I currently use for the Stan backend, supports multiple chains.

The issue is that N should not be defined as an array, so writing "N": 100 instead of "N": [100] should work.

Thanks! Mitzi is developing CmdStanR (or maybe I’m getting the name wrong; it’s the R interface to CmdStan) so maybe that will work here.

Regarding the JSON problem, I used the toJSON() function to convert my R list to that JSON file. I wonder if Stan’s method of reading JSON files needs to be fixed in some way to account for this problem.

P.S. Having an interface to R would help because then we could also make graphs etc.

When using the toJSON() or write_json() functions from the R package jsonlite to write a list to a JSON file, pass auto_unbox = TRUE to those functions. That way, component N from that list will be written to the JSON file as "N": 100 instead of "N": [100].
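The same fix can be applied on the receiving side before the data reaches Stan. The sketch below assumes the caller knows which variables the Stan program declares as scalars (here `N`, from `int N;`); `unbox_scalars` is a hypothetical helper. Unboxing only the named variables avoids the opposite problem that a blanket `auto_unbox = TRUE` can cause, where a genuine length-1 vector is also written as a bare scalar.

```python
import json


def unbox_scalars(data: dict, scalar_names: set) -> dict:
    """Replace length-1 lists with bare values, but only for variables
    the Stan program declares as scalars (e.g. `int N;`)."""
    fixed = {}
    for name, value in data.items():
        if name in scalar_names and isinstance(value, list) and len(value) == 1:
            fixed[name] = value[0]
        else:
            fixed[name] = value
    return fixed


# First three observations from the data above, as R's toJSON() would box them.
data = {"N": [3], "x": [1, 2, 3], "y": [9.1269, -47.4086, 57.868]}
print(json.dumps(unbox_scalars(data, {"N"})))
# {"N": 3, "x": [1, 2, 3], "y": [9.1269, -47.4086, 57.868]}
```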


Is this using the same JSON data schema as CmdStanPy? I understand there are issues with parsing given differences between R and Python standard package output.

Everyone else continued to respond here, so I’m piling on. It may be premature to start filing fine-grained issues. GitHub’s a terrible place for discussion.

I think having an R interface (or Python interface or shell interface or anything else lower level) makes this fundamentally different. It then becomes a tool for people who know those languages.

Is there a reason RStudio server doesn’t do what you want, @andrewgelman? If so, we might want to address that rather than trying to build two interfaces that do the same thing (provide R access to Stan).

@rok_cesnovar — who’s the imagined user for the thing you’re building?

@andrewgelman and @breckbaldwin — same question.

Yikes. That’s pretty restrictive if it doesn’t. CmdStanPy lets you do this, but I don’t know how easy that or CmdStan would be to use in a web interface. It’s all tricky with syncing up returns from asynchronous processes.

It does support multiple chains; see the parallel test. This uses one process with multiple threads.
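The "one process, multiple threads" pattern can be sketched with a thread pool: each chain gets its own seed, and results are collected in chain order regardless of which finishes first. In this sketch `run_chain` is a hypothetical stand-in (a toy random-walk Metropolis sampler for a standard normal), not httpstan's sampler. Pure-Python threads will not actually run in parallel under CPython's GIL, so the point is the orchestration, not the speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def run_chain(chain_id: int, num_draws: int = 1000, seed: int = 1234) -> list:
    """Toy random-walk Metropolis chain targeting N(0, 1).

    Hypothetical stand-in for a real sampler call; each chain has its own RNG.
    """
    rng = random.Random(seed + chain_id)
    x = rng.uniform(-2, 2)  # dispersed initial value, like Stan's default inits
    draws = []
    for _ in range(num_draws):
        proposal = x + rng.gauss(0, 1)
        # Accept with probability min(1, p(proposal) / p(x)) for p = N(0, 1).
        log_ratio = (x * x - proposal * proposal) / 2
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        draws.append(x)
    return draws


def run_chains(num_chains: int = 4, **kwargs) -> list:
    """Run chains on a thread pool and return their draws in chain order."""
    with ThreadPoolExecutor(max_workers=num_chains) as pool:
        futures = [pool.submit(run_chain, c, **kwargs) for c in range(num_chains)]
        return [f.result() for f in futures]
```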

CmdStanPy/CmdStan do not run multiple chains in the same process. Or rather, they do support multiple chains, just in separate processes.

Bob:

I’d like at least a rudimentary R interface because otherwise I don’t know how to call Stan or pass in data. Cutting and pasting a JSON file is kind of a mess. I guess if there is some clean way of getting the data in, it could work.

Regarding the RStudio-on-cloud thing: That was great, but one thing I don’t like about it is that it is a dependence on RStudio: What they give, they can take away. And indeed I don’t think it really works anymore, at least I recall seeing that on one of the discussions.

If we want cloudstan to not use R or Python, that makes sense; then we just need a way for R or Python users to be able to send data to cloudstan and to take the output from cloudstan and send it back to R or Python.
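A round trip of that kind could be as thin as "serialize the data to JSON, POST it, turn the returned draws back into columns". The sketch below is purely illustrative: cloudstan has no documented API, so the base URL, the `/models/{id}/fits` path and the `{"draws": [...]}` response shape are all hypothetical assumptions. It only builds the request; sending it would be a call to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Hypothetical endpoint: not a real cloudstan API.
BASE_URL = "https://cloudstan.example.org/api"


def build_fit_request(model_id: str, data: dict) -> urllib.request.Request:
    """Package a data dict as a JSON POST for a (hypothetical) fit endpoint."""
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/models/{model_id}/fits",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def draws_to_columns(response_text: str) -> dict:
    """Turn a hypothetical {"draws": [{"a": ..., "b": ...}, ...]} response
    into one list per parameter, ready for plotting on the client side."""
    draws = json.loads(response_text)["draws"]
    columns = {}
    for draw in draws:
        for name, value in draw.items():
            columns.setdefault(name, []).append(value)
    return columns
```

An R client could follow the same shape with jsonlite on both ends.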

The starting point is to target novice users and workshop attendees. If it proves to be useful there, we can build on that.

I was only in charge of building the toy prototype; since then, Erik has taken over, completely revised the frontend and got some frontend people to work on it. The plan is to have a more polished version in late September. In the meantime, I am mostly thinking about and researching how to solve the cloud orchestration/Stan backend side.

Great, thank you for the link. I couldn't find docs or an example of how to specify this in the HTTP request data, so I did not use it for the time being. I obviously wasn't thorough enough.

And indeed I don’t think it really works anymore, at least I recall seeing that on one of the discussions.

I don’t know the current status, but long term the problem’s going to be paying for cycles.

If you want to pay for the RStudio thing, it hasn’t gone anywhere.

What they give, they can take away.

Not if we host it ourselves.

I’m still very confused about what you want. If not RStudio, then just an R terminal on the web with graphics? I don’t think we want to be in the business of building our own RStudio.

Ben Goodrich would know the current status, but my impression was that the RStudio Cloud thing wasn’t currently working, so we can’t really use it for demonstrations. This new thing is working, which is good, but I’d like to see an easier way to load in data than JSON.

It still works for me:

https://rstudio.cloud/project/56157
