Docker & Stan ?!

There have also been people on the PyStan side who would like to use an official Docker image.

Can we create a base image that contains minimal tools and another with user tools?

This base image could be used as a parent if a user wants to add their own tools.
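To make the parent-image idea concrete, here is a minimal sketch that renders the Dockerfile a user might write on top of such a base. The image name `example/stan-base:latest` is purely hypothetical (no such official image exists yet), and the sketch assumes the base image ships `pip`.

```python
def render_dockerfile(parent_image, run_commands):
    """Return Dockerfile text that builds on `parent_image`.

    Each entry in `run_commands` becomes one RUN instruction, so the
    user's own tools end up in layers on top of the shared base.
    """
    lines = [f"FROM {parent_image}"]
    lines += [f"RUN {cmd}" for cmd in run_commands]
    return "\n".join(lines) + "\n"


# Hypothetical parent image; assumes pip is available inside it.
user_dockerfile = render_dockerfile(
    "example/stan-base:latest",
    ["pip install --no-cache-dir arviz"],
)
print(user_dockerfile)
```

The point of the split is that the expensive part (toolchain, Stan itself) lives in the parent and is pulled once, while the user's layer stays small and quick to rebuild.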

@syclik would have that login if it’s owned by Generable

I am not a Docker expert, but I am happy to come up with a wish list of what I would find helpful.

What I would like to have is fast compiles and some persistence.

That may translate to the Clang compiler, ccache, some persistence considerations, easily switchable compiler configs…
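As a rough sketch of what "easily switchable compiler configs" could look like for a CmdStan-based image: CmdStan reads overrides from a `make/local` file, and `CXX` is the usual make variable for the C++ compiler. Prefixing the compiler with `ccache` is an assumption about how the cache would be wired in, not a tested recommendation.

```python
def render_make_local(compiler="clang++", use_ccache=True):
    """Return the text of a make/local file selecting the C++ compiler.

    Assumption: wrapping the compiler as `ccache <compiler>` is enough
    to route compilation through ccache.
    """
    cxx = f"ccache {compiler}" if use_ccache else compiler
    return f"CXX = {cxx}\n"


# One config per compiler; switching means writing a different file.
configs = {name: render_make_local(name) for name in ("clang++", "g++")}
for name, text in configs.items():
    print(f"# make/local for {name}")
    print(text)
```

For the persistence part, ccache takes its cache location from the `CCACHE_DIR` environment variable, so pointing that at a mounted volume is one way the cache could survive container restarts.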

Let’s start a wiki page to collect this in a structured way?!


To use ccache with Docker, would one need to compile a simple model in the Dockerfile? And would every part of the libraries need to be touched, or does the compilation follow the same steps for every model (I think it does)?

Yes, a wiki sounds great.

Where should the wiki page for this topic go? Under the main Stan repo seems to make the most sense as it will likely cover many different interfaces.

I would have put it under CmdStan, but I see your point. I started it here:

Please go ahead and add whatever you think is sensible. We need to weed this out at some point.


I whittled down the use-case list to four broad categories.

But I personally don’t think that use-cases map well to distinct containers given the breadth of software development in this community. Hopefully the newly included questions stimulate some good discussion.

I have a few base images using rocker that also have a bunch of Stan packages that we could use. I include a few more packages in those, but I am happy to help out with the various use cases.

They are R based and built from rocker/verse (though we could probably whittle that down a bit if we needed).

I have been using some containers based on Jupyter notebook images, including one with R. The jupyterhub interactive session is a strong selling point. https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html

Can PyStan(2|3) point to an existing installation of Stan or must it use the one that comes along when you run pip install pystan? From the install page, it doesn’t seem possible.

From the posts about RStan, my guess is that it cannot leverage a separate Stan install either.

PyStan 2 has Stan in its ‘basedir’.

PyStan3 uses httpstan and it also uses Stan in its basedir.

To use another Stan library, one would need to edit multiple locations (mainly model.py).

To update Stan, it is easier to update the git reference (a few lines) and reinstall (loop: fix broken parts).

Bummer. Was hoping that we could have a base Stan install image for an OS and then build the various interfaces on top of that. Sigh.

OK.

That’s a bit annoying.

Additional sections & comments on the Wiki are appreciated as I do not wish to map this out based on only the thoughts in my own head.

I see that there are stanc3 images in that DockerHub registry. What would be the baseline Stan installation that we could in principle build on top of?

We can use those stanc3 images if they turn out to be fit for purpose… I don’t quite know what they were built for. If they were built to actually compile stanc3 (which is hell, as I understood), then I am not sure they are the right ones, since the primary objective here is not stanc3. I mean, I would like the image to be useful for Stan development, so the ability to build stanc3 would be nice, but given the huge burden of enabling that, I am not sure.

The Jupyter stuff you mention does really sound nice. I need to look at that for a moment.

We really need someone driving this… so if you, @mtwest, want to do that, it would be great. I am more than happy to give you feedback, which I will do on the wiki page in a bit.

My proposal in diagram form.
stan_docker_proposal.pdf (59.7 KB)

  • Build on top of the Jupyter Docker Stack
  • Images for RStan, PyStan and CmdStan
  • Building CmdStanR and CmdStanPy on top of CmdStan
  • Try to harmonize the R and Python packages between the relevant images with version-controlled build scripts
  • Knowledge of doing this with Ubuntu & Jupyter can be ported to other OSes if there is sufficient interest.

A bunch of the technical details of which compiler flags to use and the like need to be hashed out by people more knowledgeable. What are the recommended defaults for configuring Stan in a new environment? I will make sure to bring this up in the next Thursday meeting.


I would also add PyStan3 (pystan-next on GitHub).

I wasn’t sure what the status of PyStan development was, nor that of its downstream packages. It’s also unclear to me what the status is for other interfaces both in the Stan repo and elsewhere: Julia, Matlab, Mathematica, Stata, HTTP, …

I feel it’s important to focus on production interfaces, at least at the start.

For in-development projects, it might be worth exploring the repo2docker package from the Jupyter folks.

Hi!

Thanks for your progress and energy so far. I must confess that I am not that familiar with Jupyter, but looking at it, this appears to be a very attractive option to take up. For R we would miss out on RStudio web, which is a big minus, but we can ignore that for now; though that should be revisited eventually for sure. RStudio is a must-have for many R users.

A clarifying question: Is the minimal Jupyter image basically a web-based command line? Would that suffice to get Stan development and debugging done?

The shared volumes as shown here:
https://www.rocker-project.org/use/shared_volumes/

seem to correspond to what we want. That is, users will want to modify their local files with these containers instead of having things disappear once the container’s life comes to an end.
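A minimal sketch of how such a shared volume could be wired up, written as a helper that assembles the `docker run` arguments. The image name is hypothetical, and the default container path `/home/jovyan/work` follows the Jupyter Docker Stacks convention, which is an assumption about the final image.

```python
from pathlib import Path


def docker_run_args(image, host_dir,
                    container_dir="/home/jovyan/work", port=8888):
    """Build a `docker run` argument list that bind-mounts `host_dir`.

    Files written under `container_dir` inside the container land in
    `host_dir` on the host, so they outlive the container.
    """
    host = Path(host_dir).expanduser().resolve()  # -v needs an absolute path
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:{port}",           # expose the notebook port
        "-v", f"{host}:{container_dir}",  # host:container bind mount
        image,
    ]


# Hypothetical image name; the current directory becomes the work folder.
args = docker_run_args("example/stan-jupyter:latest", ".")
print(" ".join(args))
```

The list can be passed directly to `subprocess.run(args)` on a machine with Docker installed; `--rm` discards the container on exit while the mounted directory keeps the files.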

As I understand, Jupyter is super flexible in terms of what kernel it runs, so that is great. I hope/assume that the notebook state is also somehow persistent?

Now, I recall that there was a question about pre-compiled images… of course, this is what we want! I would like to pull my favourite Stan flavour as a Jupyter image and start right away.

Wrt. Windows: I know there is Docker for Windows running vanilla Windows in the container, but do we need that? I thought we could just have Docker run Linux inside the container on the Windows platform. I would not bother with Windows unless there is a good reason.

For the R packages which you list in the PDF, there could potentially be a minimal and a large “tidyverse” edition, I think (see Rocker).

The C++ libraries in Stan are those under Stan Math, so it’s CVODES, Intel TBB, some Boost stuff that is compiled, and maybe some others; I just need to look these up. We should pre-build these libraries for the CmdStan based images.

Could we get a simple version of this up and running quickly - so that we can try it out without too much effort?

Sebastian

We have some discussion here

One idea is that we have a minimal Docker image (only the needed parts) which can then be inherited from.

But from the user’s point of view, one image with Jupyter is probably enough.

Just reading this page https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html

If we go with datascience-notebook, we could install everything (RStan, CmdStanR, PyStan, CmdStanPy) in the same Docker image. Then users could just select which interface / language to use.

Using the JupyterLab interface also gives a lot of other tools (a command line, for example) which users might want to use.