Stanc3 optimisations... what is available?

Hi!

Maybe @rybern or @rok_cesnovar know this… what is the state of the optimisations in stanc3? I mean, what is being optimised at the moment, and do we have documentation about this somewhere?

I am asking because I was wondering whether the over-propagation in user-defined functions is already being taken care of, which would make it a lot easier for @paul.buerkner to get reduce_sum working in brms.

So in essence, declaring things like

real foo(matrix X, vector paramv) {
  matrix[2,2] Xs = X[1:2];
  return sum(Xs * paramv);
}

will usually propagate Xs to be a var, which can be very costly… can the optimisations already catch this and keep Xs as data whenever X is only data? Or would I have to declare X as a data matrix to make this work?

Thanks!

Cheers,
Sebastian

See discussion here: https://github.com/paul-buerkner/brms/issues/892#issuecomment-673331661

These are the docs on optimizations that will eventually make it into the docs repo: https://github.com/rybern/optimization-docs/blob/master/optimization-docs.md

This doc mentions optimization levels, but stanc3 will currently run all optimizations if you specify the --O flag.

O1 is what will help here, and it definitely works as it should; it is also the optimization that has been tested the most (I played around with it quite a lot).

Thanks… this raises a question: is the following legal,

real foo(data matrix X, vector paramv) {
  data matrix[2,2] Xs = X[1:2];
  return sum(Xs * paramv);
}

and does it make Xs a data quantity inside the function foo?

This example (without the data qualifier at the local declaration, which is illegal)

functions {
  real foo(data matrix X, vector paramv) {
    matrix[2,2] Xs = X[1:2];
    return sum(Xs * paramv);
  }
}

without optimization:

template <typename T1__>
stan::promote_args_t<T1__>
foo(const Eigen::Matrix<double, -1, -1>& X,
    const Eigen::Matrix<T1__, -1, 1>& paramv, std::ostream* pstream__) {
  using local_scalar_t__ = stan::promote_args_t<T1__>;
  const static bool propto__ = true;
  (void) propto__;
  local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
  (void) DUMMY_VAR__;  // suppress unused var warning
  
  try {
    Eigen::Matrix<local_scalar_t__, -1, -1> Xs;
    Xs = Eigen::Matrix<local_scalar_t__, -1, -1>(2, 2);
    stan::math::fill(Xs, DUMMY_VAR__);
    
    current_statement__ = 1;
    assign(Xs, nil_index_list(),
      rvalue(X, cons_list(index_min_max(1, 2), nil_index_list()), "X"),
      "assigning variable Xs");
    current_statement__ = 2;
    return sum(multiply(Xs, paramv));
  } catch (const std::exception& e) {
    stan::lang::rethrow_located(e, locations_array__[current_statement__]);
      // Next line prevents compiler griping about no return
      throw std::runtime_error("*** IF YOU SEE THIS, PLEASE REPORT A BUG ***"); 
  }
  
}

with optimization:
template <typename T1__>
stan::promote_args_t<T1__>
foo(const Eigen::Matrix<double, -1, -1>& X,
    const Eigen::Matrix<T1__, -1, 1>& paramv, std::ostream* pstream__) {
  using local_scalar_t__ = stan::promote_args_t<T1__>;
  const static bool propto__ = true;
  (void) propto__;
  local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
  (void) DUMMY_VAR__;  // suppress unused var warning
  
  try {
    Eigen::Matrix<double, -1, -1> lcm_sym2__;
    double lcm_sym1__;
    {
      Eigen::Matrix<double, -1, -1> Xs;
      Xs = Eigen::Matrix<double, -1, -1>(2, 2);
      stan::math::fill(Xs, std::numeric_limits<double>::quiet_NaN());
      
      assign(lcm_sym2__, nil_index_list(),
        rvalue(X, cons_list(index_min_max(1, 2), nil_index_list()), "X"),
        "assigning variable lcm_sym2__");
      current_statement__ = 2;
      return sum(multiply(lcm_sym2__, paramv));
    }
  } catch (const std::exception& e) {
    stan::lang::rethrow_located(e, locations_array__[current_statement__]);
      // Next line prevents compiler griping about no return
      throw std::runtime_error("*** IF YOU SEE THIS, PLEASE REPORT A BUG ***"); 
  }
  
}

So yeah, the optimization for this case works really nicely.

Without optimization, Xs is defined as
Eigen::Matrix<local_scalar_t__, -1, -1> Xs;
and with optimization it's
Eigen::Matrix<double, -1, -1> Xs;

If you want to see the diff: https://www.diffchecker.com/6utjyyLB


Just want to point out, so that I don't give the wrong impression that this is my work: this is all the work of @rybern and the other stanc3 devs. I merely reviewed the last PR.

Can we move the doc you linked to the stanc3 wiki and maybe even link it in the README.md? This info seems really hidden. If it's not hidden on purpose, we should get this out… this optimisation can make a HUGE difference.

It is not purposely hidden. This was intended to be a PR to the docs repo and, with that, its own section in the online manual. The intention was to merge the docs for the release of 2.24, but Ryan has been really busy these last few weeks.

A link on the stanc3 wiki is a good temporary solution.

With a note that the levels are not implemented in 2.24 (a PR is open though).

Sorry @wds15, merging the optimization docs has been on my to-do list for a while, I’ll see if I can finish that in the next couple days. I agree with everything @rok_cesnovar said, thanks Rok!

I think it should be. But why use data when the context it’s used in doesn’t require it? I’d just write the body of this as a one-liner:

  return sum(X[1:2] * paramv);

or to get rid of the need for sum,

  return X[1] * paramv + X[2] * paramv;

Pulling out rows of matrices is not efficient, though: it doesn't have memory locality, and it requires a new allocation and a copy. If we can get template expressions rolling, either of these should be efficient (up to the lack of memory locality), because neither X[1:2] nor X[1] should require a new allocation or copy. It'd be better if X were transposed and we took two columns.