Using a lot of functions in Stan program

jtimonen · August 1, 2020, 8:51am

If I have a Stan program that takes a large data array x and passes it to a function defined in the functions block, which passes it on to another function, and so on several levels deep, will this be any slower than writing all operations on x at the top level of the program? In other words, will the corresponding C++ program pass x by reference or by value?

rok_cesnovar · August 1, 2020, 9:40am

Great question. Arguments in user-defined functions will get passed by constant reference, so the input part is not problematic at all.
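A minimal C++ sketch of what passing by constant reference means for nested calls. The function names (inner, middle, outer, by_value_sees_same_buffer) are hypothetical and are not what stanc generates; the only point is that each level receives a reference, so the buffer address stays the same however deep the chain goes.

```cpp
#include <vector>

// Hypothetical chain of functions, mirroring nested Stan user-defined functions.
// Every level takes its argument by const reference, so nothing is copied.

// Innermost level: reports the address of the buffer it was handed.
const double* inner(const std::vector<double>& x) { return x.data(); }

// Middle level: simply forwards the reference.
const double* middle(const std::vector<double>& x) { return inner(x); }

// Outermost level: forwards the reference again.
const double* outer(const std::vector<double>& x) { return middle(x); }

// For contrast: pass by value. When called with a named (lvalue) vector the
// argument is copied, so the callee works on a different buffer.
bool by_value_sees_same_buffer(std::vector<double> x, const double* original) {
  return x.data() == original;
}
```

For a non-empty vector x, outer(x) should return x.data(), while by_value_sees_same_buffer(x, x.data()) should be false because the by-value parameter owns its own copy.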

Where it gets more “interesting” is the return value. Let's say:

This function in Stan

functions {
    real[] fun1(real[] a) {
        real b[10]; // some calculation with a that fills b
        return b;
    }
}

Is translated to (stripped of boilerplate code that does not contribute to this example):

template <typename T0__> std::vector<stan::promote_args_t<T0__>>
fun1(const std::vector<T0__>& a, std::ostream* pstream__) {
    std::vector<T0__> b;
    // some calculation with a
    return b;
}

So in theory this could create a copy on return. In practice, C++ compilers have a lot of optimizations that prevent such wasteful copies. This one in particular is called Return Value Optimization (https://shaharmike.com/cpp/rvo/).

So to answer your question: use multiple functions. Just don't turn off C++ compiler optimizations.
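To make the return-value point concrete, here is a minimal C++ sketch. Tracked is a hypothetical instrumented type (not part of Stan or the generated code) that counts copy constructions; fun1 has the same shape as the generated function above, and fun2 adds one level of nesting. It assumes a C++11 or later compiler, where a returned local variable is either elided or moved.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instrumented type: counts how often it is copy-constructed.
struct Tracked {
  static int copies;
  std::vector<double> values;

  explicit Tracked(std::size_t n) : values(n, 0.0) {}
  Tracked(const Tracked& other) : values(other.values) { ++copies; }
  Tracked(Tracked&& other) noexcept : values(std::move(other.values)) {}
};
int Tracked::copies = 0;

// Same shape as the generated fun1: const-reference input, local result
// returned by value.
Tracked fun1(const Tracked& a) {
  Tracked b(a.values.size());
  for (std::size_t i = 0; i < a.values.size(); ++i) {
    b.values[i] = a.values[i] + 5.0;
  }
  return b;  // elided or moved, not copied
}

// One more level of nesting: the result of fun1 is passed straight through.
Tracked fun2(const Tracked& a) { return fun1(a); }
```

After Tracked a(10); Tracked r = fun2(a); the counter Tracked::copies should still be 0, whether you call fun1 directly or go through fun2.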

wds15 · August 1, 2020, 9:57am

From my understanding, there is only a performance issue if you pass data into user functions and then assign it to local variables. This will cast the data to a parameter type and slow down your program.

rok_cesnovar · August 1, 2020, 10:09am

Partially true. Only if you use the function in transformed parameters.

But that does not change if you use 1 function or N nested functions.

wds15 · August 1, 2020, 11:35am

Transformed parameters and the model block are critical, of course.

jtimonen · August 1, 2020, 12:09pm

Wow, this raises more questions that are not really related to the original one, but I am interested: if the argument were vector a instead of real[] a, would it also become a std::vector or something else? I don’t see how the real data type or the value 10 is transferred to the C++ version. And why does the function become a template instead of just working on the double type? Is it because it has to be used for both normal doubles and some autodiff variable types?

Does there exist some document that describes the transpiling process?

rok_cesnovar · August 1, 2020, 3:36pm

What the generated C++ type is depends on whether the variable is a parameter or data. See table below for the basic types:

Stan type	C++ type (data)	C++ type (parameter)
int	int	N/A
real	double	var
vector	Eigen::Matrix<double, -1, 1>	Eigen::Matrix<var, -1, 1>
row_vector	Eigen::Matrix<double, 1, -1>	Eigen::Matrix<var, 1, -1>
matrix	Eigen::Matrix<double, -1, -1>	Eigen::Matrix<var, -1, -1>
T[] (array of T)	std::vector<C++ type of T>	std::vector<C++ type of T>

var is the autodiff variable, which is a special class that basically allows storing values as well as adjoints. Stan Math uses operator overloading for AD, and the var class enables reverse-mode AD in our case. We want to avoid using vars when they are not necessary, because operations on vars are slower and consume more memory. But obviously, if something is a parameter of our model, there is no way around using it.
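To give a feel for why an autodiff variable costs more than a double, here is a toy reverse-mode AD sketch based on operator overloading. Everything in it (Node, Var, tape, make_var, grad) is hypothetical and heavily simplified; it is not how Stan Math implements var, only an illustration of "value plus adjoint plus bookkeeping".

```cpp
#include <vector>

// One tape entry: a value, its adjoint, and how it depends on its operands.
struct Node {
  double value;
  double adjoint;
  int lhs, rhs;         // tape indices of the operands, -1 when absent
  double d_lhs, d_rhs;  // local partial derivatives w.r.t. the operands
};

// Global tape that records every operation in evaluation order.
std::vector<Node>& tape() {
  static std::vector<Node> nodes;
  return nodes;
}

// Toy autodiff variable: just a handle to its node on the tape.
struct Var {
  int index;
  double value() const { return tape()[index].value; }
  double adjoint() const { return tape()[index].adjoint; }
};

// Push a new node on the tape and return a handle to it.
Var make_var(double value, int lhs = -1, double d_lhs = 0.0,
             int rhs = -1, double d_rhs = 0.0) {
  tape().push_back(Node{value, 0.0, lhs, rhs, d_lhs, d_rhs});
  return Var{static_cast<int>(tape().size()) - 1};
}

// Overloaded operators compute the value and record the partials.
Var operator+(Var a, Var b) {
  return make_var(a.value() + b.value(), a.index, 1.0, b.index, 1.0);
}
Var operator*(Var a, Var b) {
  return make_var(a.value() * b.value(), a.index, b.value(), b.index, a.value());
}

// Reverse sweep: propagate adjoints from the output back to the inputs.
void grad(Var output) {
  for (Node& n : tape()) n.adjoint = 0.0;
  tape()[output.index].adjoint = 1.0;
  for (int i = output.index; i >= 0; --i) {
    const Node& n = tape()[i];
    if (n.lhs >= 0) tape()[n.lhs].adjoint += n.d_lhs * n.adjoint;
    if (n.rhs >= 0) tape()[n.rhs].adjoint += n.d_rhs * n.adjoint;
  }
}
```

For f = x * y + x with x = 3 and y = 4, calling grad(f) should leave x.adjoint() at 5 and y.adjoint() at 3. Every arithmetic operation also appends a Node to the tape, which is the extra time and memory that plain doubles avoid.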

Eigen::Matrix is a class from the Eigen C++ linear algebra library. The template parameters in Eigen::Matrix<double, -1, 1> mean that the elements of the matrix are doubles, the -1 denotes that the number of rows is dynamic, and the 1 means that there is one column. So a Stan vector is an Eigen column vector, while a row_vector is an Eigen row vector.

For arrays the lookup is recursive. real[,] is translated to std::vector<std::vector<double>> or std::vector<std::vector<var>>. An array of matrices is std::vector<Eigen::Matrix<double, -1, -1>> and so on.
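The recursive lookup for arrays can be sketched as a small C++ metafunction. stan_array and stan_array_t are hypothetical names made up for this illustration (they are not part of stanc or Stan Math): each array dimension wraps the element type in one more std::vector layer.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical metafunction: wrap Element in Dims nested std::vector layers.
template <typename Element, int Dims>
struct stan_array {
  using type = std::vector<typename stan_array<Element, Dims - 1>::type>;
};

// Base case: zero dimensions left, so the type is the element itself.
template <typename Element>
struct stan_array<Element, 0> {
  using type = Element;
};

template <typename Element, int Dims>
using stan_array_t = typename stan_array<Element, Dims>::type;

// real[] as data: one layer.
static_assert(std::is_same<stan_array_t<double, 1>,
                           std::vector<double>>::value,
              "one dimension gives one std::vector layer");

// real[,] as data: two layers.
static_assert(std::is_same<stan_array_t<double, 2>,
                           std::vector<std::vector<double>>>::value,
              "two dimensions give two std::vector layers");
```

The element type can be anything from the table above, so the same recursion covers arrays of vectors or matrices once the Eigen type is plugged in as Element.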

The function is templated because we want to be able to use the function with doubles as well as vars.

Let me give a more detailed example.

Stan function:

real[] test(real[] a, real c) {
        real b[10];
        for(i in 1:10){
            b[i] = a[i]+5.0;
        }
        return b;
}

c is an unnecessary argument here, but it's there for a purpose we will get to at the end.

This function translates to the following (this time you get all the boilerplate; the comments marked with // RC: are mine):

template <typename T0__, typename T1__>
std::vector<stan::promote_args_t<T0__,T1__>>
test(const std::vector<T0__>& a, const T1__& c, std::ostream* pstream__) {
  using local_scalar_t__ = stan::promote_args_t<T0__, T1__>;
  const static bool propto__ = true; // RC: ignore this, this is used if UDF is a distribution
  (void) propto__;
  local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
  (void) DUMMY_VAR__;  // suppress unused var warning
  
  try {
    std::vector<local_scalar_t__> b;
    b = std::vector<local_scalar_t__>(10, DUMMY_VAR__); // RC: we fill B with 10 dummy variables
    
    current_statement__ = 11; // RC: these values are used for indexing on errors mainly
    for (int i = 1; i <= 10; ++i) {
      current_statement__ = 9;
      assign(b, cons_list(index_uni(i), nil_index_list()),
        (a[(i - 1)] + 5.0), "assigning variable b");} // RC: this is a bit of a  verbose way of saying b[i] = a[i] + 5.
        // RC: The (i-1) is there because Stan indexes from 1, C++ from 0.
    current_statement__ = 12;
    return b;
  } catch (const std::exception& e) {
    stan::lang::rethrow_located(e, locations_array__[current_statement__]);
      // Next line prevents compiler griping about no return
      throw std::runtime_error("*** IF YOU SEE THIS, PLEASE REPORT A BUG ***"); 
  }
  
}

Now if a and c are data, both T0__ and T1__ are double, which means that local_scalar_t__ (defined as stan::promote_args_t<T0__, T1__>) is also double.

But here is the catch that @wds15 mentioned. If c is a parameter, T1__ is var, so local_scalar_t__ (defined as stan::promote_args_t<T0__, T1__>) is also var, and the returned array is of type std::vector<var>, which is slower and wasteful. In this example c is a useless argument, but in a real function we might use it for something else. The Stan compiler now comes with an optimization mode that avoids wasting time and memory in this case (see "Auto-differentiation level optimization" in https://github.com/rybern/optimization-docs/blob/master/optimization-docs.md). The optimizations are an experimental feature at the moment.
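The promotion effect can be reproduced in a few lines of standalone C++. This is a sketch under assumptions: Var is a toy stand-in for the autodiff type, toy_promote_t is a simplified, hypothetical version of the promotion rule (not the real stan::promote_args_t), and add_five mimics the shape of the generated function.

```cpp
#include <type_traits>
#include <vector>

// Toy stand-in for an autodiff scalar (NOT Stan Math's var): it carries an
// extra adjoint slot, so it is larger than a plain double.
struct Var {
  double value;
  double adjoint;
  Var(double v) : value(v), adjoint(0.0) {}  // implicit on purpose
};

Var operator+(const Var& a, const Var& b) { return Var(a.value + b.value); }

// Simplified promotion rule (hypothetical): the result is Var as soon as any
// argument type is Var, otherwise double.
template <typename... Ts>
struct toy_promote;

template <>
struct toy_promote<> {
  using type = double;
};

template <typename T, typename... Rest>
struct toy_promote<T, Rest...> {
  using type = typename std::conditional<
      std::is_same<T, Var>::value, Var,
      typename toy_promote<Rest...>::type>::type;
};

template <typename... Ts>
using toy_promote_t = typename toy_promote<Ts...>::type;

// Same shape as the generated function: c is never used, yet its type still
// decides the element type of the returned vector.
template <typename T0, typename T1>
std::vector<toy_promote_t<T0, T1>> add_five(const std::vector<T0>& a,
                                            const T1& c) {
  using local_scalar_t = toy_promote_t<T0, T1>;
  (void)c;  // unused on purpose
  std::vector<local_scalar_t> b;
  b.reserve(a.size());
  for (const T0& elem : a) {
    b.push_back(local_scalar_t(elem + 5.0));
  }
  return b;
}

static_assert(std::is_same<toy_promote_t<double, double>, double>::value,
              "data and data stay double");
static_assert(std::is_same<toy_promote_t<double, Var>, Var>::value,
              "data and parameter promote to Var");
```

Calling add_five with a std::vector<double> and a double returns std::vector<double>; passing a Var as the second argument changes the return type to std::vector<Var>, even though the values in it are computed from data only.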

If you specifically want doubles in the arguments, you can specify the function as:

real[] test(data real[] a, data real c) {
        real b[10];
        for(i in 1:10){
            b[i] = a[i]+5.0;
        }
        return b;
}

That would generate

std::vector<double>
test(const std::vector<double>& a, const double& c, std::ostream* pstream__) {
  using local_scalar_t__ = double;
  const static bool propto__ = true;
  (void) propto__;
  local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
  (void) DUMMY_VAR__;  // suppress unused var warning
  
  try {
    std::vector<local_scalar_t__> b;
    b = std::vector<local_scalar_t__>(10, DUMMY_VAR__);
    
    current_statement__ = 11;
    for (int i = 1; i <= 10; ++i) {
      current_statement__ = 9;
      assign(b, cons_list(index_uni(i), nil_index_list()),
        (a[(i - 1)] + 5.0), "assigning variable b");}
    current_statement__ = 12;
    return b;
  } catch (const std::exception& e) {
    stan::lang::rethrow_located(e, locations_array__[current_statement__]);
      // Next line prevents compiler griping about no return
      throw std::runtime_error("*** IF YOU SEE THIS, PLEASE REPORT A BUG ***"); 
  }
  
}

rok_cesnovar · August 1, 2020, 3:40pm

I don't think there is any such doc. The Stan Math paper explains the AD stuff: https://arxiv.org/pdf/1509.07164.pdf
The Stan paper does not go into the details of the code generation.

I am writing some of this up in my thesis, because I need to give some background to then explain our OpenCL-specific optimizations and additions (that's why I had the table above ready :) ). If you are interested, I can share it once I finish this section.

jtimonen · August 1, 2020, 5:11pm

Yea sure, pm me here in discourse if you remember. Thanks for the info!
