Docker & Stan ?!

I whittled down the use-case list to four broad categories.

But I personally don’t think that use-cases map well to distinct containers, given the breadth of software development in this community. Hopefully the newly included questions will stimulate some good discussion.

I have a few base images built on Rocker that also include a bunch of Stan packages we could use. They include a few extra packages as well, but I am happy to help out with the various use cases.

It is R-based and built from rocker/verse (though we could probably whittle that down a bit if we needed to).

I have been using some containers based on Jupyter notebook images, including one with R. The JupyterHub interactive session is a strong selling point. https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html

Can PyStan (2 or 3) point to an existing installation of Stan, or must it use the one that comes bundled when you run pip install pystan? From the install page, it doesn’t seem possible.

From the posts about RStan, my guess is that it cannot use a separate Stan install either.

PyStan 2 has Stan in its ‘basedir’.

PyStan 3 uses httpstan, which also keeps Stan in its basedir.

To use a different Stan library, one would need to edit multiple locations (mainly model.py).

To update Stan, it is easier to update the git reference (a few lines) and reinstall, fixing whatever breaks along the way.

Bummer. Was hoping that we could have a base Stan install image for an OS and then build the various interfaces on top of that. Sigh.

OK.

That’s a bit annoying.

Additional sections and comments on the Wiki are appreciated, as I do not wish to map this out based only on the thoughts in my own head.

I see that there are stanc3 images in that DockerHub registry. What would be the baseline Stan installation that we could in principle build on top of?

We can use those stanc3 images if they turn out to be fit for purpose… I don’t quite know what they were built for. If they have been built to actually compile stanc3 (which, as I understand it, is hell), then I am not sure they are the right ones, since the primary objective is not stanc3. I would like the image to be useful for Stan development, so the ability to build stanc3 would be nice, but given the huge burden of enabling that, I am not sure.

The Jupyter stuff you mention really does sound nice. I need to take a look at that.

We really need someone driving this… so if you, @mtwest, want to do that, it would be great. I am more than happy to give you feedback, which I will do on the wiki page in a bit.

My proposal in diagram form.
stan_docker_proposal.pdf

  • Build on top of the Jupyter Docker Stack
  • Images for RStan, PyStan and CmdStan
  • Building CmdStanR and CmdStanPy on top of CmdStan
  • Try to harmonize the R and Python packages between the relevant images with version controlled build scripts
  • What we learn doing this with Ubuntu & Jupyter can be ported to other OSs if there is sufficient interest.
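A minimal sketch of how the layering in the proposal could be turned into a build order. The image names (`stan/cmdstan`, `stan/cmdstanr`, and so on), the Jupyter stack image each one starts FROM, and the `images/<name>/` directory layout are all hypothetical. It uses Python's standard `graphlib` (3.9+) so that, for example, CmdStan is always built before CmdStanR and CmdStanPy.

```python
from graphlib import TopologicalSorter

# Hypothetical image names, each mapped to the image it is built FROM.
# Parents not listed as keys (the Jupyter Docker Stack images) are assumed
# to be pulled from upstream rather than built here.
IMAGE_PARENTS = {
    "stan/cmdstan": "jupyter/datascience-notebook",
    "stan/cmdstanr": "stan/cmdstan",
    "stan/cmdstanpy": "stan/cmdstan",
    "stan/rstan": "jupyter/r-notebook",
    "stan/pystan": "jupyter/scipy-notebook",
}


def build_order(parents):
    """Return the images to build, with every parent before its children."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {image: {parent} for image, parent in parents.items()}
    ordered = TopologicalSorter(graph).static_order()
    # Skip the upstream images that we only pull.
    return [image for image in ordered if image in parents]


def build_commands(parents):
    """One `docker build` per image, assuming a hypothetical images/<name>/ layout."""
    return [
        ["docker", "build", "-t", image, f"images/{image.split('/')[-1]}"]
        for image in build_order(parents)
    ]


if __name__ == "__main__":
    for cmd in build_commands(IMAGE_PARENTS):
        print(" ".join(cmd))
```

Sibling images such as CmdStanR and CmdStanPy have no fixed relative order, which is fine because neither depends on the other.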

A bunch of the technical details, such as which compiler flags to use, need to be hashed out by more knowledgeable people. What are the recommended defaults for configuring Stan in a new environment? I will make sure to bring this up in the next Thursday meeting.
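Whatever defaults end up being agreed on, CmdStan reads user build options from a `make/local` file, so an image build script could generate that file. A minimal sketch follows; `CXX` and `STAN_THREADS` are CmdStan make variables, but the values shown are placeholders and not recommended defaults, and the helper names are hypothetical.

```python
from pathlib import Path

# Placeholder settings only; the actual defaults still need to be decided.
EXAMPLE_OPTIONS = [
    ("CXX", "=", "g++"),            # C++ compiler to use
    ("STAN_THREADS", "=", "true"),  # enable threading support in compiled models
]


def render_make_local(options):
    """Turn (variable, operator, value) triples into make/local lines."""
    return "".join(f"{name} {op} {value}\n" for name, op, value in options)


def write_make_local(cmdstan_dir, options):
    """Write <cmdstan_dir>/make/local and return its path."""
    path = Path(cmdstan_dir) / "make" / "local"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_make_local(options))
    return path
```

Keeping the options as data makes it easy to version them alongside the Dockerfiles and share them between the CmdStan-based images.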


I would also add PyStan3 (pystan-next in github).

I wasn’t sure what the status of PyStan development was, nor that of its downstream packages. It’s also unclear to me what the status is of other interfaces, both in the Stan repo and elsewhere: Julia, Matlab, Mathematica, Stata, HTTP, …

I feel it’s important to focus on production interfaces, at least at the start.

For in-development projects, it might be worth exploring the repo2docker package from the Jupyter folks.

Hi!

Thanks for your progress and energy so far. I must confess that I am not that familiar with Jupyter, but looking at it, it appears to be a very attractive option. For R we would miss out on the RStudio web interface, which is a big minus, but we can ignore that for now; it should definitely be revisited eventually. RStudio is a must-have for many R users.

A clarifying question: is the minimal Jupyter image basically a web-based command line? Would that suffice to get Stan development and debugging done?

The shared volumes as shown here:
https://www.rocker-project.org/use/shared_volumes/

seem to correspond to what we want. That is, users will want to modify their local files from within these containers instead of having things disappear once the container’s life comes to an end.
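For the Jupyter Docker Stacks, the equivalent of the Rocker shared-volume setup is a bind mount onto `/home/jovyan/work`, the working directory their documentation uses. A small sketch that assembles such a `docker run` call; the function name is hypothetical, and the project path and host port are whatever the user chooses.

```python
import shlex
from pathlib import Path


def jupyter_run_command(image, host_dir, port=8888):
    """Build a `docker run` argument list that mounts host_dir into the container."""
    host = Path(host_dir).resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8888",               # Jupyter listens on 8888 inside the container
        "-v", f"{host}:/home/jovyan/work",  # files written here persist on the host
        image,
    ]


if __name__ == "__main__":
    cmd = jupyter_run_command("jupyter/datascience-notebook", ".")
    # Print the shell form; pass `cmd` to subprocess.run to actually start it.
    print(shlex.join(cmd))
```

The Rocker images follow the same pattern, with `/home/rstudio` as the mount target and port 8787 for RStudio Server.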

As I understand, Jupyter is super flexible in terms of what kernel it runs, so that is great. I hope/assume that the notebook state is also somehow persistent?

Now, I recall that there was a question about pre-compiled images… of course, this is what we want! I would like to pull my favourite Stan flavour as a Jupyter image and start right away.

Regarding Windows: I know there is Docker for Windows running vanilla Windows in the container, but do we need that? I thought Docker on the Windows platform can just run Linux containers. I would not bother with Windows containers unless there is a good reason.

For the R packages you list in the PDF, there could potentially be a minimal and a large “tidyverse” edition, I think (see Rocker).

The C++ libraries in Stan are those under Stan Math: CVODES, Intel TBB, some Boost components that need compiling, and maybe some others; we just need to look these up. We should pre-build these libraries for the CmdStan-based images.
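Pre-building could be a single step in the CmdStan image build. As far as I know, running `make build` in a CmdStan checkout compiles the command-line tools together with the bundled libraries, and compiling the bundled Bernoulli example once exercises the model-build path. A sketch under those assumptions; the function names and the `jobs` and `dry_run` parameters are illustrative, not CmdStan options.

```python
import subprocess
from pathlib import Path


def prebuild_steps(jobs=4):
    """Commands to run inside a CmdStan checkout during the image build."""
    return [
        ["make", f"-j{jobs}", "build"],            # tools plus bundled libraries
        ["make", "examples/bernoulli/bernoulli"],  # compile one model as a smoke test
    ]


def prebuild_cmdstan(cmdstan_dir, jobs=4, dry_run=True):
    """Run the pre-build commands, or with dry_run=True just return them."""
    steps = prebuild_steps(jobs)
    if not dry_run:
        for cmd in steps:
            # check=True makes the image build fail on the first failing step.
            subprocess.run(cmd, cwd=Path(cmdstan_dir), check=True)
    return steps
```

Doing this at image-build time means users pay the compile cost once, when the image is built, instead of on every fresh container.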

Could we get a simple version of this up and running quickly - so that we can try it out without too much effort?

Sebastian

We have some discussion here

One idea is to have a minimal Docker image (only the needed parts) which can then be inherited from.

But from a user’s point of view, one image with Jupyter is probably enough.

Just reading this page https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html

If we go with datascience-notebook, we could install all of them (RStan, CmdStanR, PyStan, CmdStanPy) in the same Docker image. Then users could just select which interface / language to use.
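If everything goes into one datascience-notebook image, a quick way to see which interfaces a given image actually ships is to probe for them. A minimal sketch, assuming the Python module names `stan` (PyStan 3), `pystan` (PyStan 2) and `cmdstanpy`, and using `Rscript` with R’s `requireNamespace()` for the R packages; the helper names are hypothetical.

```python
import importlib.util
import shutil
import subprocess

PYTHON_MODULES = ["stan", "pystan", "cmdstanpy"]  # PyStan 3, PyStan 2, CmdStanPy
R_PACKAGES = ["rstan", "cmdstanr"]


def python_interfaces(modules=PYTHON_MODULES):
    """Map each module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def r_package_available(pkg):
    """True if Rscript is on PATH and can load `pkg`."""
    if shutil.which("Rscript") is None:
        return False
    result = subprocess.run(
        ["Rscript", "-e", f'cat(requireNamespace("{pkg}", quietly = TRUE))'],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "TRUE"


if __name__ == "__main__":
    print(python_interfaces())
    print({pkg: r_package_available(pkg) for pkg in R_PACKAGES})
```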

Using the JupyterLab interface also gives access to a lot of other tools (a command line, for example) which users might want to use.

I would really prefer to have lightweight Docker images for fast download times… and no user on earth will switch Stan interfaces.

But, lightweight is important.

@ahartikainen & @wds15, I will be putting this on the Thursday general Stan meeting agenda with the intent of getting a wider circle of feedback. Hopefully you and anyone else can attend.

Regarding IDEs

  • I will reach out to the guys at the rocker-project to get their perspective on RStudio
  • VS Code and PyCharm are able to use Docker containers

I am not sure if much has happened with this over the last few weeks, but I use Stan and RStudio with mapped volumes etc. as part of my standard workflow these days, and it works very well. I have a few repos online with both the Dockerfile and a Makefile to help use the container for the code; see for example my current project at

and I am happy to help out with this as much as you might need.

I mainly use R though (with images based on rocker/verse)


I completely missed this thread! @stevebronder, thanks for tagging me. I’ll have to tag @jackinovik about docker.

@mtwest, did anything come of this meeting or from communications with the Rocker project?