Stanc3 parallel reduce_sum

rok_cesnovar · January 30, 2020, 7:24pm

I am opening another thread as the parallel autodiff v3 one seems long enough already. This one is oriented towards trying to add stanc3 support for reduce_sum. I am far from a stanc3 guru, but love to learn new stuff, especially with a real task like this.

I took the model from https://github.com/stan-dev/math/wiki/Parallel-reduce_sum-project-page and trimmed it down to a basic example https://gist.github.com/rok-cesnovar/832f9752fc2a8ba7ae7b3da3eb5f2743 (EDIT: see revision 1)

So the first request would be to someone to double check that I trimmed it correctly. Is this how this was intended for use?

For now I managed to get stanc3 to output the following hpp file https://gist.github.com/rok-cesnovar/3fd4f2326a667cd9e31c7e14a6ee32f5 (EDIT: see revision 1)

I havent setup cmdstan with the reduce_sum branch yet to try if it compiles and all (will do that tomorrow).

If someone skimms through the hpp to let me know if this is close to what was expected or if I completely missed the point.

Thanks.

cc @wds15, @bbbales2, @stevebronder

bbbales2 · January 30, 2020, 7:34pm

Right now the arguments go:

shared_array, grainsize, stream, ...

Like:

template <typename ReduceFunction, typename Vec, require_vector_like_t<Vec>...,
          typename... Args>
auto reduce_sum(const Vec& vmapped, std::size_t grainsize, std::ostream* msgs,
                const Args&... args)

The code you generated looks fine, just the arguments look like:

grainsize, y, ..., stream

I don’t mind if grainsize and y get swapped around, but stream needs to go before the variadic bit for the C++ type deduction to work properly.

bbbales2 · January 30, 2020, 7:34pm

Also this looks awesome I am so excited to try this thanks!

rok_cesnovar · January 30, 2020, 7:38pm

Thanks. Good call on the stream and variadic stuff.
I am almost sure this is missing one of the functors from https://discourse.mc-stan.org/uploads/short-url/5hKzxFfjOZG9cyX3Jz09olvCPjO.hpp. My first goal here is to see what is missing in this initial generated code to get it to compile and run.

rok_cesnovar · January 30, 2020, 7:41pm

Also is the example Stan code here correct then?

that example goes grainsize, y ...

bbbales2 · January 30, 2020, 7:41pm

So this Stan code:

  real lpmf = reduce_sum(hierarchical_reduce, grainsize, y, log_lambda_group, gidx);

Compiles to:

        lpmf = reduce_sum<hierarchical_reduce_functor__>(grainsize, y,
                 log_lambda_group, gidx, pstream__);

And this:

real hierarchical_reduce(int start, int end, int[] y_slice, real[] log_lambda_group, int[] gidx) {
    return poisson_log_lpmf(y_slice| log_lambda_group[gidx[start:end]]);   
  }

Compiles to:

template <typename T3__>
typename boost::math::tools::promote_args<T3__>::type
hierarchical_reduce(const int& start, const int& end,
                    const std::vector<int>& y_slice,
                    const std::vector<T3__>& log_lambda_group,
                    const std::vector<int>& gidx, std::ostream* pstream__) {
  using local_scalar_t__ = typename boost::math::tools::promote_args<T3__>::type;
  const static bool propto__ = true;
  (void) propto__;
  
  try {
    current_statement__ = 43;
    return poisson_log_lpmf<false>(y_slice,
             rvalue(log_lambda_group,
               cons_list(
                 index_multi(rvalue(gidx,
                               cons_list(index_min_max(start, end),
                                 nil_index_list()), "gidx")),
                 nil_index_list()), "log_lambda_group"));
  } catch (const std::exception& e) {
    stan::lang::rethrow_located(e, locations_array__[current_statement__]);
      // Next line prevents compiler griping about no return
      throw std::runtime_error("*** IF YOU SEE THIS, PLEASE REPORT A BUG ***"); 
  }
  
}

struct hierarchical_reduce_functor__ {
template <typename T3__>
typename boost::math::tools::promote_args<T3__>::type
operator()(const int& start, const int& end, const std::vector<int>& y_slice,
           const std::vector<T3__>& log_lambda_group,
           const std::vector<int>& gidx, std::ostream* pstream__)  const 
{
return hierarchical_reduce(start, end, y_slice, log_lambda_group, gidx,
         pstream__);
}
};

I think that’s all you should need.

bbbales2 · January 30, 2020, 7:42pm

I think that Stan code is from earlier prototypes. I’ve only been working with the C++.

rok_cesnovar · January 30, 2020, 7:47pm

But other than the order of arguments the Stan model looks good?

bbbales2 · January 30, 2020, 7:49pm

I assume you’ll get a bajillion syntax errors cause that’s what compilers do, but if I was measuring this code in inches, I’d say yes, it’s likely all there.

rok_cesnovar · January 30, 2020, 8:00pm

Ok, take 2:

model: https://gist.github.com/rok-cesnovar/832f9752fc2a8ba7ae7b3da3eb5f2743
hpp: https://gist.github.com/rok-cesnovar/3fd4f2326a667cd9e31c7e14a6ee32f5

changes:

order of shared_array, grainsize
fixed stream positioning

bbbales2 · January 30, 2020, 8:13pm

Something like this should also compile:

functions {
  real my_func(int start, int end, real[] y_slice, real mu, real sigma) {
    return normal_lpdf(y_slice | mu, sigma);
  }
}

parameters {
  real a[5];
}

model {
  target += reduce_sum(my_func, a, 1, 0.0, 1.0));
}

And be directly comparable to:

parameters {
  real a[5];
}

model {
  a ~ normal(0, 1);
}

rok_cesnovar · January 30, 2020, 8:26pm

Thanks for the simpler model.

That produces https://gist.github.com/rok-cesnovar/6588718f07d63c0ecba25ad7f3fe900d

I will get this up on a branch tomorrow, sorry for these awful gist links.

bbbales2 · January 30, 2020, 8:29pm

Move the pstream argument inside the functor arguments as well (but don’t move in the actual call to my_func!). Change:

struct my_func_functor__ {
template <typename T2__, typename T3__, typename T4__>
typename boost::math::tools::promote_args<T2__, T3__,
T4__>::type
operator()(const int& start, const int& end,
           const std::vector<T2__>& y_slice, const T3__& mu,
           const T4__& sigma, std::ostream* pstream__)  const 
{
return my_func(start, end, y_slice, mu, sigma, pstream__);
}
};

To:

struct my_func_functor__ {
template <typename T2__, typename T3__, typename T4__>
typename boost::math::tools::promote_args<T2__, T3__,
T4__>::type
operator()(const int& start, const int& end,
           const std::vector<T2__>& y_slice, std::ostream* pstream__, const T3__& mu,
           const T4__& sigma)  const 
{
return my_func(start, end, y_slice, mu, sigma, pstream__);
}
};

bbbales2 · January 30, 2020, 8:29pm

But yeah just start compiling this tomorrow and sorting through the errors. Looks all there to me but I could be wrong.

rok_cesnovar · January 30, 2020, 8:54pm

I manually fixed the arguments in the functor like you proposed above and it compiled on first try :) might as well go to the lottery today haha

Great start to this.

rok_cesnovar · January 31, 2020, 10:08am

I have this running now for up to 3 additional arguments (+ container) of any type. Will make a WIP PR for stanc3 later in the afternoon. Will also post a linux binary so you can test this out in cmdstan.

Will probably need some assistance to make this signature applicable to any number of argument. But its a start I guess.

wds15 · January 31, 2020, 12:18pm

Did you even manage to make the 3 arguments optional or do we have to use 3 arguments?

Awesome!

Thanks so much… this will make it a lot easier to benchmark this with a few more models.

rok_cesnovar · January 31, 2020, 1:21pm

It supports 0 (not sure its a use case), 1, 2 or 3 arguments. I am not supporting all types currently (like vector[, , , , ,]) but it should cover most.

rok_cesnovar · January 31, 2020, 7:24pm

Update, WIP PR open https://github.com/stan-dev/stanc3/pull/451 with the linux stanc3 binary posted in a comment.

The normal model compiles, the cleaned up example doesnt yet and I am not sure exactly why. Any comments welcome.

Also, if anyone has some additional example models I could test this that would be fantastic.

wds15 · February 1, 2020, 11:04am

Whow! Thanks so much for this! I was able to make the poisson benchmark model use the new compiler. poisson-hierarchical-scale-v3-stanc3-A.stan (4.0 KB)

It is so cool not having to fiddle around with C++, but just compile the stan model with the magic new stanc3 compiler.

I used docker on my Mac to do the stanc3 compilation and then everything was run on my MacOS Catalina.

We should start thinking about how we merge this into the repositories. So finalize the interface, possibly split up the math PRs into two stages (tuple tools + reduce_sum), and then the stanc3 parser.

The results are very nice:

grainsize = 0 speedup

grainsize=0 walltime

grainsize >0 speedup

grainsize > 0 walltime

Topic		Replies	Views
Parallel reduce in the Stan language Developers	12	1235	April 11, 2019
Long message when compiling reduce_sum model Interfaces cmdstanr	0	236	December 14, 2023
Reduce_sum_static fails to compile with SIGSEGV error - PyStan3 Modeling	25	833	August 2, 2020
CmdStan, Stan 2.23.0, Math 3.2.0 released! General	12	1614	May 13, 2020
Parallel autodiff v3 Developers	61	2861	January 23, 2020

Stanc3 parallel reduce_sum

Related topics