Call for testing of upcoming 2.21 - including faster map_rect and new compiler

Hi everyone!

For the upcoming Stan 2.21 release we are going to feature-freeze our codebase tomorrow, October 11th, at midnight, so that we have one week to sort out any issues. The release will include the Intel Threading Building Blocks (TBB), which brings a few exciting features. It will enable many more threading capabilities in the future, and as of now the threading-based map_rect uses the TBB as its backend.

Now, the TBB requires that we link against it dynamically on all systems, which turned out to be relatively hard, on Windows in particular. I would therefore appreciate it if people interested in helping could run their favorite model using the current develop snapshot of cmdstan, ideally models which use map_rect with threading. What I am looking for are any problems with the installation, the documentation, or running the models. There are some extra things to watch out for on Windows, and I intentionally refer anyone willing to try to the documentation, which I attach below.

To get the cmdstan develop version up and running you can follow these commands:

git clone https://github.com/stan-dev/cmdstan
cd cmdstan
git submodule update --init --recursive
echo CXXFLAGS+=-DSTAN_THREADS > make/local

On MacOS and Linux, now run make -j4 build; on Windows, run mingw32-make -j4 build. There is an additional step on Windows, but the build command will tell you what to do. Once that is done, just try out your favorite model.
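One caveat with the commands above: echo with > replaces an existing make/local, so any settings you already had there are lost. A minimal sketch of a more careful variant, assuming you run it from the top of the cmdstan checkout, appends the threading flag only if it is not already present:

```shell
# Assumption: the current directory is the top of the cmdstan checkout.
# Adds the threading flag to make/local without clobbering existing settings.
flag='CXXFLAGS+=-DSTAN_THREADS'
mkdir -p make            # already exists in a real checkout
touch make/local         # create the file if it is missing
# -q: quiet, -x: match the whole line, -F: treat the flag as a fixed string
grep -qxF "$flag" make/local || echo "$flag" >> make/local
cat make/local           # show the resulting settings
```

Running it a second time leaves make/local unchanged, so it is safe to repeat before each rebuild.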

The benefits of the new TBB backend for map_rect are

  • automatic scheduling of jobs => the sharding choice becomes less relevant since the work is chunked according to CPU load automatically
  • nested map_rect calls will respect the STAN_NUM_THREADS and not use more threads than promised
  • ~50% speedup on Linux and MacOS when using map_rect threading
  • ~20% speedup on Windows when using map_rect threading
  • ~25% speedup on MacOS for single-core performance (because on MacOS a better memory allocation is used)

The details of the performance evaluation can be found on the merged PR and the wiki page on threading. These speedups will, of course, vary depending on your model/OS/…
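Since the thread count is controlled through STAN_NUM_THREADS, here is a rough sketch of setting it before running a model built with -DSTAN_THREADS. The core-count lookup via getconf and the model and data file names are my assumptions for illustration, not something prescribed by cmdstan:

```shell
# STAN_NUM_THREADS is read at run time by a model built with -DSTAN_THREADS.
# Assumption: getconf reports the online core count (Linux/MacOS);
# fall back to 1 thread if it is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
export STAN_NUM_THREADS="$cores"
echo "STAN_NUM_THREADS=$STAN_NUM_THREADS"
# Hypothetical model binary and data file; substitute your own:
# ./my_model sample data file=my_model.data.R
```

Timing the same model with different values of STAN_NUM_THREADS is a quick way to see how much of the speedup you actually get on your machine.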

I would appreciate feedback on the documentation / installation ease / problems running it… and if you just like it, then that’s great to know as well. Should you run into trouble it would be good to know your OS, compiler used, stan model, stan data & anything else you find relevant. Thanks!

I would also like to thank @rok_cesnovar for reviewing all of the PRs, which were plentiful, and for the few extra rounds of fun we had once we hit the odd Windows Jenkins machine.

Sebastian

cmdstan userguide with instructions for installation: cmdstan-guide.pdf (544.5 KB)


This might be off topic, but I’m trying to get a map_rect model working cause I thought the gains looked cool and I wanted to try it out. My attempt is here: Getting map_rect error at C++ compile (Stan to C++ compile seems to work)

Is that error I’m seeing common? Have I messed something up?

Is it possible that this is related to https://github.com/stan-dev/stanc3/pull/333 ?

Maybe download the latest stanc, or just delete bin/stanc and run make build again.

Ooo ooo excellent thank you.

I did a clean build and the model is running now (gotta debug it still, but it’s moving).

I have a model that is segfaulting: state4.data.R (40 Bytes) state4.stan (530 Bytes)

You can get this model to run by replacing the 2nd argument to map_rect, “tmp”, with “param” (which is defined as a vector parameter).

Lemme add @rok_cesnovar, @wds15 so you get notifications

(Edit: my code is running – just thought this might be important for the release)

and this works with 2.20?

Good Q. Didn’t try. Hang tight.

Yup it runs in an older Stan without error.

I just ran it with 2.20 and current develop. It works in both cases for me… what does your make/local look like?

To be sure…the model which you claim has issues is this variant:

functions {
  vector binomialf(vector phi, vector theta, data real[] x_r, data int[] x_i) {
    vector[1] lpmf;
    lpmf[1] = 0.0;
    return lpmf;
  }
}
data {
  int<lower=1> N;
  int<lower=1> K;
  int F;
  int prior_only;
}
transformed data {
  int x_i[F, 2 * N / F];
  real x_r[F, 0];
}
parameters {
  vector[1] param;
}
model {
  vector[N / F] mu[F];
  real base = 0.0;
  param ~ normal(0, 1);
  if (!prior_only) {
    vector[2] tmp;
    vector[F] tmp2 = map_rect(binomialf, param, mu, x_r, x_i);
    target += sum(tmp2);
  }
}

My bad, that model works.

Use:

vector[F] tmp2 = map_rect(binomialf, tmp, mu, x_r, x_i);

To get errors

argh…too tired…I did the fix you suggested… wait…

I didn’t enable -DSTAN_THREADS in the old version. Rebuilding now to double check if it still works in the old version with that flag.

It seems that there is an issue with stanc3. What does run ok for me is a .hpp generated with the stan2 compiler from 2.20 and then I build that with develop cmdstan…this gives me a working binary. Could you check that?

Sure gimme a second.

Yup the old .hpp compiles with the new cmdstan fine and runs.

Could you please file that as an issue for stanc3? Also… please include in that issue that with the new compiler there is no call to

STAN_REGISTER_MAP_RECT(1, state4_model_namespace::binomialf_functor__)

as is required by map_rect in order for MPI to work (this is unrelated to this problem, but this has to be created at the end of the file just as in the old compiler). Thanks.
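For anyone who wants to check their own model for this, a small sketch: grep the generated C++ header for the macro. The helper name check_map_rect_registration is made up here, and I am assuming the generated header sits next to the .stan file (for example state4.hpp):

```shell
# Report whether a generated model header contains the MPI registration macro.
# Hypothetical helper; usage: check_map_rect_registration path/to/model.hpp
check_map_rect_registration() {
  if grep -q 'STAN_REGISTER_MAP_RECT' "$1"; then
    echo "registered"
  else
    echo "missing"
  fi
}
```

Calling check_map_rect_registration state4.hpp should print "registered" for a header produced by the old compiler and "missing" if the macro was not generated.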

Made em’, segfault: https://github.com/stan-dev/stanc3/issues/339, and MPI: https://github.com/stan-dev/stanc3/issues/340


Thank you all! I’d like to just add on that 2.21 will also have a completely new compiler and it could definitely use a lot of testing. I’m working overtime this week to try to iron out all of these bugs as they come up, so keep 'em coming! Thanks everyone :)


Sorry for being so late with testing stanc3… you did ask for it earlier, I do recall… In my defense, I did do some testing, but I should have done more, as always.
