Docker & Stan ?!

Hi!

I was wondering what the state of Docker is with respect to Stan. I know that there is a cmdstan Docker image on Docker Hub, but I wonder if there is more.

In all honesty, I really think we should have Docker images for about everything - for users and for developers.

For users we should have interface-specific Docker images. It takes a really long time to get an image with RStan going… but it should be a matter of docker pull ....

The same goes for developer images for math/stan/cmdstan. I recently started debugging in Docker images and it is so much more efficient.

Now, this is some work to be done… and I think the project should strongly consider putting some $$$ here. The benefits are huge:

  • super easy installs for users
  • easy to reproduce errors => a lot easier debugging

I hope others think the same so that we can make some progress here. The install burden would be decreased a lot.

Sebastian

I’ve used it for research papers and for courses I give. Works great.

I know. It’s just amazing how easily things become reproducible.

And for courses it would be super useful to just pull some Docker image from Docker Hub and off you go. These Dockerfiles are so universal that you can set up cloud machines with them.

The Rocker project is really amazing. Want a versioned R (R itself and all its packages) with RStudio? No problem. It’s a matter of minutes if you have a fast internet connection.

Adding Stan in there directly would be super convenient (we can actually just build on their work).
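
A minimal sketch of what building on Rocker could look like. It assumes the rocker/r-ver image with an R version tag picked only for illustration; the file name Dockerfile.rstan and the image tag my-rstan are hypothetical, and extra system libraries may be needed depending on the rstan version. The script only writes the Dockerfile; the build and run commands are shown as comments.

```shell
#!/bin/sh
# Write a Dockerfile that layers rstan on top of a versioned Rocker image.
# Assumption: rocker/r-ver:4.3.1 is used as the base purely for illustration.
cat > Dockerfile.rstan <<'EOF'
FROM rocker/r-ver:4.3.1

# Install rstan from CRAN; its R dependencies are pulled in automatically.
RUN R -e "install.packages('rstan', repos = 'https://cloud.r-project.org')"

# Start an interactive R session by default.
CMD ["R"]
EOF

echo "Wrote Dockerfile.rstan"
# To build and run (requires Docker):
#   docker build -t my-rstan -f Dockerfile.rstan .
#   docker run --rm -it my-rstan
```

Pinning the base tag is what keeps the R version fixed between builds.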

If you need help, tell me. Here are examples from research


and for a course

Thanks.

Great examples.

I would like people to be able to write Dockerfiles that start from an official Stan image, if possible.

All the quirks with compile-time flags and similar tricks would be put in there. Then everyone else can spend their time doing better things than thinking about installs.

I agree wholeheartedly! :) Some thought should go into which packages should be included by default, i.e., rstan, rstanarm, brms, bayesplot, etc. But also, should we include Julia, Python, etc., or should we have many different flavors of Dockerfiles instead?

I don’t know. What’s best needs to be worked out. I would like images that are as slim as possible per interface (still including useful utilities per interface). We should take advantage of the fact that you can derive one image from another, etc.
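
Deriving one image from another could be sketched as follows. The base image name example/rstan-base:latest is hypothetical (no such official image is confirmed to exist), as are the file name Dockerfile.brms and the package selection. The script only writes the Dockerfile; building it needs Docker and an existing base image.

```shell
#!/bin/sh
# Hypothetical base image name; no such official image is confirmed to exist.
BASE_IMAGE="example/rstan-base:latest"

# The derived image reuses everything in the base and only adds packages.
cat > Dockerfile.brms <<EOF
FROM ${BASE_IMAGE}

# Add higher-level packages on top of the shared rstan layer.
RUN R -e "install.packages(c('brms', 'bayesplot'), repos = 'https://cloud.r-project.org')"
EOF

echo "Wrote Dockerfile.brms (derived from ${BASE_IMAGE})"
# To build (requires Docker and an existing base image):
#   docker build -t example/brms:latest -f Dockerfile.brms .
```

Because the rstan layer lives in the base, each interface-specific image stays small and the compile-time setup is maintained in one place.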

Amen!

Love this idea! TensorFlow has these on Docker Hub and the Dockerfiles here. I think we could make a standocker repo that holds these. What are all the flavors we would want? It would be nice to have ones with the OpenCL stuff pre-installed. One each for rstan, cmdstan, etc.?

I recently made a Docker image with rstan v. 2.22, brms, rstanarm, and various other related packages. It’s built on an Ubuntu base and also has the tidyverse R packages.
The repo is crpeters/docker-stan:apt-0.1.

Docker has security implications which make it pretty unusable on clusters (the docker group essentially has superuser access). Also, as I found out to my own detriment, running Docker containers on laptops quickly becomes a bloated mess. Some containers could pretty easily be derived from the build system, but maintaining them (w.r.t. base images) might be complicated. Also, most package-level Dockerfiles are based on Alpine Linux, which will need to be tested for the dependencies.

Finally, Docker containers might be lighter than virtual machines, but they still operate out of a pretty limited resource pool, so performance is going to be much worse.

Another (more elegant, in my very biased opinion) solution is to create a nixpkgs derivation for Nix. These are reproducible and have no overhead in terms of performance. They do not, however, run well on Windows.

I recently made Docker images for RStan, rjags, and R2OpenBUGS. You can check them out on our GitHub repo if you’re interested:

https://github.com/GoHypernet/Galileo-examples/archive/R_Bayesian.zip

You can also run all of them on Galileo, which was created to provide an alternative to a cluster and is free:

https://app.galileoapp.io

My university’s HPC systems use Singularity, which is compatible with Docker images and resolves the superuser issues. Performance-wise, it doesn’t seem to be substantially different from running code natively, and installation is quite a bit easier.
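
As a rough sketch of that workflow, the script below pulls a Docker image into a Singularity image file and runs R inside it. The image name is the one posted earlier in this thread (whether it is still published is not verified), and the output file name docker-stan.sif is hypothetical. It is a dry run by default and only prints the commands; setting RUN=1 executes them and requires Singularity to be installed.

```shell
#!/bin/sh
# Image name taken from earlier in this thread; availability is not verified.
IMAGE="docker://crpeters/docker-stan:apt-0.1"
SIF="docker-stan.sif"   # hypothetical output file name

PULL_CMD="singularity pull ${SIF} ${IMAGE}"
EXEC_CMD="singularity exec ${SIF} Rscript -e 'library(rstan)'"

# Dry run by default: print the commands. Set RUN=1 to execute them.
echo "${PULL_CMD}"
echo "${EXEC_CMD}"
if [ "${RUN:-0}" = "1" ]; then
    # Convert the Docker image into a local .sif file (no root needed).
    singularity pull "${SIF}" "${IMAGE}"
    # Run a command inside the container as the calling user.
    singularity exec "${SIF}" Rscript -e 'library(rstan)'
fi
```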

cmdstan and rstan are both available in Nix. For example, I can completely recreate my environment with

{ pkgs ? import <nixpkgs> {} }:

let
  # R wrapped together with the packages it should see
  R-with-my-packages = pkgs.rWrapper.override {
    packages = with pkgs.rPackages; [
      ggplot2 dplyr xts rstan shinystan tidyr ggfan devtools
      knitr tidyverse scales pscl forcats
    ];
  };
in
pkgs.mkShell {
  buildInputs = with pkgs; [
    emacs
    (python37.withPackages (ps: with ps; [ numpy matplotlib seaborn ]))
    R-with-my-packages libintl libiconv
  ];
}

and

nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/20.03.tar.gz R-shell.nix

I can even create a docker image from it using:

I know about the admin rights issues of Docker. It’s just that Docker is a de facto standard, as I perceive it. Alternatives such as Singularity can directly use Docker images, as can other alternatives.

A package manager like Nix sounds great - it’s just not a real solution if Windows is not on the list. It would still be useful to document these things on our wiki, as it is surely helpful for an (admittedly large) subset of Stan users.

I really think we need a stanhub on Docker Hub. It will make installation, first experiments with Stan, or even Stan model development a lot more streamlined. The performance concern does not weigh as heavily for me, since this is still super valuable for getting things going. For high performance on clusters there is either Singularity, or you should spend the effort to deal with the install anyway, or - even better - you have a cluster admin around who can help you.

Fantastic. I have been messing around with running Stan but haven’t tried the Nix + Docker generation before. This would make a great starting point for generating the Dockerfiles if Stan does go with a stanhub approach on Docker Hub.

This is exciting news.

I’ve always relied on @jrnold’s rstan image; it’s a reliable base to build on 👌🏼🙏🏼

I see that Generable currently holds the “stangroup” login on Docker Hub. There are a number of images for Stan variants, but many of them are pretty old. If there are images that people use regularly, it would make sense to pool those best practices and put up some images on Docker Hub so they aren’t hidden in someone’s personal GitHub repo.

There is even more… we even have a Docker image serving thing in the cloud; @serban-nicusor knows more.

We should really have a central, up-to-date collection of Stan Docker images.

Hey, the repository has been up for a while but is not used much, and I would love to put it to more use. I think it’s very convenient for any developer to just pull a Docker image and work with it.
We need to come up with a list of all projects that can be dockerized so I can integrate that process into Jenkins CI/CD and keep the images automated and synchronized with the GitHub state/releases. A bit of order and some naming conventions should keep it easy.
I can help dockerize some of our projects, but I have never tried Nix.
I’m up for this if I can get a bit of help with planning the projects and with any restrictions or conventions that need to be followed.
