If I have a Stan program which takes large data array x
which is given as input to a function defined in the functions
block, which uses it as an input to another function and so on several levels deep, will this be any slower than writing all operations on x
on the top level of the program? I guess I am interested in knowing whether the corresponding C++ program will pass x
by reference or by value?
Great question. Arguments in user-defined functions will get passed by constant reference, so the input part is not problematic at all.
Where it gets more “interesting” is the return value. Lets say:
This function in Stan
functions {
real[] fun1(real[] a) {
real b[10] = //some calculation with a
return b;
}
}
Is translated to (stripped of boilerplate code that does not contribute to this example):
template <typename T0__> std::vector<stan::promote_args_t<T0__>>
fun1(const std::vector<T0__>& a, std::ostream* pstream__) {
std::vector<T0__> b;
// some calculation with a
return b;
}
So in theory this could create a copy on return. In practice, C++ compilers have a lot of optimizations that prevent such wasteful copies. This one in particular is called Return Value Optimization (https://shaharmike.com/cpp/rvo/).
So to answer your question: use multiple functions. Just dont turn off C++ compiler optimizations.
From my understanding there is only a performance issue if you pass data into the user functions which you then assign to local variables. This will cast the data to a parameter and slow down your program.
Partially true. Only if you use the function in transformed parameters.
But that does not change if you use 1 function or N nested functions.
Transformed parameters and the model block are critical, of course.
Wow, this raises more questions which are not really related to the original but I am interested: If there was vector a
instead of real[] a
would it also become std::vector
or something else? I don’t see how the real
data type or the value 10 is transferred to the C++ version. And what is the reason for the function becoming a template and not just work on the double type? Is it because it has to be used for both normal doubles and some autodiff variable types?
Does there exist some document that describes the transpiling process?
What the generated C++ type is depends on whether the variable is a parameter or data. See table below for the basic types:
Stan data type | C++ type - data | C++ type - parameter |
---|---|---|
int | int | N/A |
real | double | var |
vector | Eigen::Matrix<double, -1, 1> | Eigen::Matrix<var, -1, 1> |
row_vector | Eigen::Matrix<double, 1, -1> | Eigen::Matrix<var, 1, -1> |
matrix | Eigen::Matrix<double, -1, -1> | Eigen::Matrix<var, -1, -1> |
T | std::vector< C++ type of T > | std::vector< C++ type of T > |
var
is the autodiff variable, which a special class that basically allows storing values as well as adjoints. Stan Math uses operator overloading for AD and the var
class enables the reverse mode AD in our case. We wont to avoid using var
’s if not necessary, because operations on vars are going to be slower and consume more memory. But obviously if its a parameter of our model, there is no way around using it.
Eigen::Matrix
is a class from the Eigen C++ linear algebra library. The template parameters for the Eigen matrix: <double, -1, 1>
mean that the elements in the matrix are doubles, the -1 denotes that the number of rows are dynamic and the 1 means that there is one column. So a Stan vector is a column vector of an Eigen Matrix, while a row_vector is a row vector of an Eigen Matrix.
For arrays the lookup is recursive. real [,,]
is translated to std::vector<std::vector<double>>
or std::vector<std::vector<var>>
. An array of matrices is std::vector<Eigen::Matrix<double, -1, -1>>
and so on.
The function is templated because we want to be able to use the function with doubles as well as vars.
Let me give a more detailed example.
Stan function:
real[] test(real[] a, real c) {
real b[10];
for(i in 1:10){
b[i] = a[i]+5.0;
}
return b;
}
c
here is an unnecessary arg here, but its here for a purpose we get at the end.
This function translates to (this time you get all the boiler plate. The comments denoted with // RC:
are mine.
template <typename T0__, typename T1__>
std::vector<stan::promote_args_t<T0__,T1__>>
test(const std::vector<T0__>& a, const T1__& c, std::ostream* pstream__) {
using local_scalar_t__ = stan::promote_args_t<T0__, T1__>;
const static bool propto__ = true; // RC: ignore this, this is used if UDF is a distribution
(void) propto__;
local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
(void) DUMMY_VAR__; // suppress unused var warning
try {
std::vector<local_scalar_t__> b;
b = std::vector<local_scalar_t__>(10, DUMMY_VAR__); // RC: we fill B with 10 dummy variables
current_statement__ = 11; // RC: these values are used for indexing on errors mainly
for (int i = 1; i <= 10; ++i) {
current_statement__ = 9;
assign(b, cons_list(index_uni(i), nil_index_list()),
(a[(i - 1)] + 5.0), "assigning variable b");} // RC: this is a bit of a verbose way of saying b[i] = a[i] + 5.
// RC: The (i-1) is there because Stan indexes from 1, C++ from 0.
current_statement__ = 12;
return b;
} catch (const std::exception& e) {
stan::lang::rethrow_located(e, locations_array__[current_statement__]);
// Next line prevents compiler griping about no return
throw std::runtime_error("*** IF YOU SEE THIS, PLEASE REPORT A BUG ***");
}
}
Now if a
and c
are data, that means that T0__
is double as well as T1__. Which then means that
using local_scalar_t__ = stan::promote_args_t<T0__, T1__>
is also a double.
But here is the catch that @wds15 mentioned. If c
in this case is a parameter, that would mean that T1__ is var and it would then also mean that using local_scalar_t__ = stan::promote_args_t<T0__, T1__>
is var. Meaning that the returned array would be of type std::vector<var>
. Which is slower and wasteful. In this case c is a useless argument, but we might use it for something else in the function. The Stan compiler now comes with an optimization mode which avoids wasting time and memory space in this case (see Auto-differentiation level optimization in https://github.com/rybern/optimization-docs/blob/master/optimization-docs.md). The optimizations are an experimental feature atm.
If you specifically want doubles in the arguments, you can specify the function as:
real[] test(data real[] a, data real c) {
real b[10];
for(i in 1:10){
b[i] = a[i]+5.0;
}
return b;
}
That would generate
std::vector<double>
test(const std::vector<double>& a, const double& c, std::ostream* pstream__) {
using local_scalar_t__ = double;
const static bool propto__ = true;
(void) propto__;
local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
(void) DUMMY_VAR__; // suppress unused var warning
try {
std::vector<local_scalar_t__> b;
b = std::vector<local_scalar_t__>(10, DUMMY_VAR__);
current_statement__ = 11;
for (int i = 1; i <= 10; ++i) {
current_statement__ = 9;
assign(b, cons_list(index_uni(i), nil_index_list()),
(a[(i - 1)] + 5.0), "assigning variable b");}
current_statement__ = 12;
return b;
} catch (const std::exception& e) {
stan::lang::rethrow_located(e, locations_array__[current_statement__]);
// Next line prevents compiler griping about no return
throw std::runtime_error("*** IF YOU SEE THIS, PLEASE REPORT A BUG ***");
}
}
I dont think there is any such doc. The Stan Math paper explains the AD stuff: https://arxiv.org/pdf/1509.07164.pdf
The Stan paper does not go into detail of the code generation.
I am writing some of this in my thesis, because I need to give some background to then explain our OpenCL specific optimizations and additions (thats why I had the table above ready :) ). If you are interested I can share it once I finish this section.
Yea sure, pm me here in discourse if you remember. Thanks for the info!