User defined functions: data qualifier

Niko · November 14, 2022, 7:41am

I was wondering whether adding/omitting the data qualifier in UDFs changes the (performance of the) generated C++ code.

The documentation says: "Declaring an argument data only allows type inference to proceed in the body of the function so that, for example, the variable may be used as a data-only argument to a built-in function."

But once the Stan model is written and compiles, the types of all parameters are known, aren’t they? Does omitting the data qualifier then (potentially) cost performance?

andrjohns · November 14, 2022, 8:00am

Yep, the data qualifier determines whether the parameter is transpiled to a double or a var (stan::math::var), which then determines whether gradients need to be calculated during the function call. So a data real will be more performant than a real if you don’t need the gradients.

nhuurre · November 14, 2022, 9:13am

I don’t think the data qualifier makes a difference to performance.

real fn(real v, data real d, real v2) {
    real x = v + d;
    return x / v2;
}

generates the following C++:

template <typename T0__, typename T2__,
          stan::require_all_t<stan::is_stan_scalar<T0__>,
                              stan::is_stan_scalar<T2__>>* = nullptr>
  stan::promote_args_t<T0__, T2__>
  fn(const T0__& v, const double& d, const T2__& v2, std::ostream* pstream__) {
    using local_scalar_t__ = stan::promote_args_t<T0__, T2__>;
    int current_statement__ = 0; 
    static constexpr bool propto__ = true;
    (void) propto__;
    local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
    (void) DUMMY_VAR__;  // suppress unused var warning
    try {
      local_scalar_t__ x = DUMMY_VAR__;
      current_statement__ = 3;
      x = (v + d);
      current_statement__ = 4;
      return (x / v2);
    } catch (const std::exception& e) {
      stan::lang::rethrow_located(e, locations_array__[current_statement__]);
    }
  }

It doesn’t use var explicitly, but every argument that is not marked data gets a template parameter that can be deduced to be either double or var at the call site. Only the data argument d is fixed to const double&.
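The call-site deduction described above can be mimicked in a small self-contained sketch. This is only an illustration, not what stanc emits: toy_var is a hypothetical stand-in for Stan's autodiff scalar, and promote2 is a simplified, made-up analogue of stan::promote_args_t for exactly two types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an autodiff scalar. This is NOT stan::math::var;
// it only carries a value so the type deduction can be observed.
struct toy_var {
  double val;
  toy_var(double v) : val(v) {}  // implicit on purpose: double promotes to toy_var
};

inline toy_var operator+(const toy_var& a, const toy_var& b) {
  return toy_var(a.val + b.val);
}
inline toy_var operator/(const toy_var& a, const toy_var& b) {
  return toy_var(a.val / b.val);
}

// Simplified analogue (assumption) of stan::promote_args_t for two types:
// double only if both inputs are double, otherwise toy_var.
template <typename A, typename B>
struct promote2 {
  typedef typename std::conditional<
      std::is_same<A, double>::value && std::is_same<B, double>::value,
      double, toy_var>::type type;
};

// Same shape as the generated signature: v and v2 are templated,
// while the data argument d is always a plain double.
template <typename T0, typename T2>
typename promote2<T0, T2>::type fn(const T0& v, const double& d, const T2& v2) {
  typedef typename promote2<T0, T2>::type local_scalar_t;
  local_scalar_t x = v + d;  // local takes the promoted type of v and v2
  return x / v2;
}
```

In this sketch, fn(1.0, 2.0, 4.0) is instantiated with doubles throughout and returns a double, while fn(toy_var(1.0), 2.0, 4.0) returns a toy_var. The same source function serves both cases, which is why the qualifier on v or v2 would not change which instantiation a given call picks.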

andrjohns · November 14, 2022, 9:14am

Ah, I see. So performance will be the same; it just enables stricter type-checking.

rok_cesnovar · November 14, 2022, 9:25am

There will be a performance difference in the case where you generate local variables from expressions that use only data inputs AND you use the experimental optimization level. In that case the AD optimization is triggered (this used to be part of O1 but was later moved to experimental).

real fn(data real a, data real b) {
    real c = a + b;
    ...
}

Without any optimization, c is treated as a “parameter”: the gradient will be computed with respect to this intermediate variable. That said, this case can probably always be avoided by precomputing the data-only expressions, which is probably more performant anyway.
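To make the cost of a promoted data-only local concrete, here is a toy model of the situation. Everything in it is an assumption for illustration: toy_var and the tape_nodes counter are hypothetical and much simpler than Stan Math's actual tape, and the function names are made up. The sketch only shows that giving a data-only intermediate the autodiff type records one extra node.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter standing in for the number of nodes on an autodiff tape.
static std::size_t tape_nodes = 0;

// Hypothetical autodiff scalar (NOT stan::math::var): constructing one from
// a double records a node. Copies do not record anything.
struct toy_var {
  double val;
  toy_var(double v) : val(v) { ++tape_nodes; }
};

// var * var: records one node for the result.
inline toy_var operator*(const toy_var& a, const toy_var& b) {
  return toy_var(a.val * b.val);
}
// double * var: the double operand is a constant, so only the result is recorded.
inline toy_var operator*(double a, const toy_var& b) {
  return toy_var(a * b.val);
}

// The data-only local c is given the autodiff type, as in the unoptimized case.
inline toy_var promoted_local(double a, double b, const toy_var& theta) {
  toy_var c(a + b);  // one node for c
  return c * theta;  // one node for the product
}

// The data-only local c stays a plain double.
inline toy_var double_local(double a, double b, const toy_var& theta) {
  double c = a + b;  // no node
  return c * theta;  // one node for the product
}

// Helper: how many nodes one call to f records.
inline std::size_t nodes_used_by(toy_var (*f)(double, double, const toy_var&)) {
  toy_var theta(2.0);
  const std::size_t before = tape_nodes;
  f(1.0, 3.0, theta);
  return tape_nodes - before;
}
```

In this toy model, nodes_used_by(promoted_local) is 2 and nodes_used_by(double_local) is 1, while both functions return the same value. How much this matters in a real model depends on how often the function is called and how large the data-only expressions are.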

Niko · November 14, 2022, 10:52am

If I could mark two posts as the joint solution, I would. Thanks to all of you!
