Map_rect & data for the case of MPI

Continuing the discussion from Parallel autodiff v3 for the bits and pieces about map_rect:

This statement will just work fine whenever MPI is NOT used.

We should fix that, sure; but it is not straightforward.

I don’t think that we can do anything at the parser level, because it depends on how you compile the Stan program.

Ok, so what happens? Well, with MPI on we want to be clever about the data. The goal is to distribute the data only once and then always reuse it. So whenever MPI is turned on, the following happens:

  1. The data passed into map_rect is essentially baked into a C++ type (see the sketch after this list). Thus, the data is assumed to be immutable throughout the entire duration of program execution. That's different from the "data" keyword in Stan programs, which is used to say "this is not an autodiff variable".
  2. Only the outermost map_rect call will use the MPI backend. Any nested map_rect call will not use the MPI backend (and as such the special MPI data handling does not apply).
  3. If you have two map_rect calls one after the other (and they are not nested inside another map_rect), then both calls will use the MPI backend.
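
To make point 1 concrete, here is a minimal C++ sketch of the idea. It is purely illustrative and not the actual Stan Math implementation: the names cached_data and map_rect_mpi are made up, and I am assuming a compile-time call_id assigned to each textual map_rect call, so that the data behind that ID is distributed exactly once and silently reused afterwards.

#include <iostream>
#include <vector>

// Hypothetical per-call-site cache: each distinct call_id gets its own
// instantiation, mimicking data that has been pinned to the MPI workers.
template <int call_id>
struct cached_data {
  static std::vector<double> x_r;  // the "distributed" real data
  static bool distributed;         // already shipped to the workers?
};
template <int call_id>
std::vector<double> cached_data<call_id>::x_r;
template <int call_id>
bool cached_data<call_id>::distributed = false;

// Hypothetical stand-in for the MPI-backed map: distribute the data once,
// then always reuse the cache, ignoring whatever is passed in later.
template <int call_id, typename F>
double map_rect_mpi(F f, const std::vector<double>& x_r) {
  if (!cached_data<call_id>::distributed) {
    cached_data<call_id>::x_r = x_r;  // first call: ship data to workers
    cached_data<call_id>::distributed = true;
  }
  return f(cached_data<call_id>::x_r);  // later calls: the cache wins
}

int main() {
  auto total = [](const std::vector<double>& v) {
    double s = 0;
    for (double d : v) s += d;
    return s;
  };
  // One textual call site inside a loop: both iterations share call_id 1,
  // so the i == 2 data never replaces the cached i == 1 data.
  for (int i = 1; i <= 2; ++i)
    std::cout << map_rect_mpi<1>(total, {double(i), double(i)}) << "\n";
  // prints 2 twice, not 2 and then 4
}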

Specifically, here is what happens with this:

for (i in 1:2)
  y = map_rect(f, phi, thetas, { { 1.0 }, { 2.0 } }, { { i }, { i } });

If MPI is not turned on (so we run serially or with threads), then we get

  y = map_rect(f, phi, thetas, { { 1.0 }, { 2.0 } }, { { 1 }, { 1 } });
  y = map_rect(f, phi, thetas, { { 1.0 }, { 2.0 } }, { { 2 }, { 2 } });

If MPI is turned on, then we get the following (it does not matter if threading is also on - MPI takes precedence):

  y = map_rect(f, phi, thetas, { { 1.0 }, { 2.0 } }, { { 1 }, { 1 } });
  // the data is not checked again after the first invocation
  y = map_rect(f, phi, thetas, { { 1.0 }, { 2.0 } }, { { 1 }, { 1 } });

Now, to make it even more fun, if the user writes in the Stan program

  y = map_rect(f, phi, thetas, { { 1.0 }, { 2.0 } }, { { 1 }, { 1 } });
  // this is executed correctly, since the Stan parser implicitly generates
  // a new C++ type for each distinct map_rect call => the change in data is
  // handled correctly
  y = map_rect(f, phi, thetas, { { 1.0 }, { 2.0 } }, { { 2 }, { 2 } });

You see, whenever MPI is turned on, the data is taken to be immutable. Whenever MPI is not used, these special considerations do not apply.
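
Under the same assumption as in the sketch above (one compile-time call_id per textual map_rect call), here is a condensed sketch of why the two written-out calls behave correctly while the loop version does not. Again, sum_cached is a made-up name and this is not the real generated code.

#include <iostream>
#include <vector>

// One independent static cache per call_id instantiation.
template <int call_id>
double sum_cached(const std::vector<double>& x_r) {
  static std::vector<double> cache;  // per-call-site "distributed" data
  if (cache.empty()) cache = x_r;    // only the first invocation ships data
  double s = 0;
  for (double d : cache) s += d;
  return s;
}

int main() {
  // Two textual call sites => two instantiations => two independent caches,
  // so the change in data between the two calls is handled correctly.
  std::cout << sum_cached<1>({1.0, 1.0}) << "\n";  // distributes {1, 1}: 2
  std::cout << sum_cached<2>({2.0, 2.0}) << "\n";  // distributes {2, 2}: 4
  // Re-running the first call site reuses its cache, even with new data:
  std::cout << sum_cached<1>({9.0, 9.0}) << "\n";  // still 2
}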

Does that make sense? Any more points of confusion?

I can think about a good doc for this - should I file a PR against the doc repo, or would you prefer something else?


I understand how it works without MPI. But what happens to this program when we do have MPI turned on? Will it fail with a warning message? Give the wrong answer? If it gives the wrong answer, that’s really really bad in my opinion and fixing the bug should be a high priority.

We absolutely need to fix the doc. Stan isn't something that should only work for someone with insider knowledge. I don't think MPI should've been merged without doc on edge conditions.

I’m truly worried that we haven’t even thought through the consequences of the MPI and multi-threading functionality we have added and are going to run off and add more.

I don’t think it’s fair to say it gives the “wrong” answer. For MPI, the map_rect facility was always meant to treat the data as immutable so that it can be cached everywhere. That probably needs better doc, sure.

Sure, I can give it a try. It’s easy to say now that this edge case should have been documented before…but at the time we never had the examples you brought up in mind. The use of map_rect was always intended for constant, immutable data… which is not the same as “things we cannot autodiff against”. I always said that I only want to let data into map_rect which comes from the data or transformed data block (or is passed through functions). That, to me, is the requirement.

EDIT: We actually have documented this correctly. In section 23.2, Map Function, we write:

The last two arguments are two dimensional arrays of real and integer data values. These argument types are marked with the data qualifier to indicate that they must only contain variables originating in the data or transformed data blocks. This will allow such data to be pinned to a processor on which it is being processed to reduce communication overhead.

So this is very clear to me, and it basically disallows the examples which you have come up with. I know that the examples are correct as far as Stan language syntax goes, but the above spells out clearly why they are not admissible.

I brought them up, which is why we went to a data-only approach.

Thanks. That’s a start. But I’m afraid this program still parses:

functions {
  vector f(vector beta, vector theta, data real[] xr, data int[] xi) {
    return [1.0]';
  }
}
transformed data {
  vector[2] beta;
  vector[2] theta[2];
  real xr[2, 2];
  int xi[3, 2, 2];
  real s = 0;
  for (i in 1:2)
    s += sum(map_rect(f, beta, theta, xr, xi[i]));
}

and it still works if we move the map_rect itself to transformed parameters:

transformed parameters {
  real s = 0;
  for (i in 1:2)
    s += sum(map_rect(f, beta, theta, xr, xi[i]));
}

I created an issue for stanc3.