Hi all,
It’s my first post here, but I hope you will find it somewhat valuable. I’ve recently been trying to optimize a model that performs a lot of dot_product operations:
```stan
transformed parameters {
  ...
  for (n in 1:N)
    beta_x[n] = dot_product(beta[conversion_type[n], vertical[n]], x[n]);
}
model {
  y ~ student_t(nu, beta_x, sigma);
}
```
and noticed how expensive the dot_product operation is in Stan. Profiling it with perf and CmdStan, I got the following results:
```
+ 22.33%  R  file26bf7e2571d5fa.so  [.] stan::math::dot_product<stan::math::var, -1, 1, double, -1, 1>
+ 11.76%  R  libjemalloc.so.2       [.] malloc
+ 11.43%  R  file26bf7e2571d5fa.so  [.] stan::math::check_range
```
Being quite surprised that dot_product calls malloc, I dug into the code and found that what is actually being multiplied is an Eigen::Matrix<stan::math::var, -1, 1> times an Eigen::Matrix<double, -1, 1> (i.e. Eigen::VectorXd), and that the former is converted into a temporary vector of doubles on every dot_product call.
It seems that a huge performance improvement can be achieved by storing vectors of stan::math::var as two Eigen::VectorXds, one holding the values and one holding the derivatives. This should help especially in RStan, since TLS access is slower in dynamically linked binaries, which makes the per-call repacking even more expensive there than in CmdStan.
Has anyone ever looked into it? I’ll be happy to look into it more deeply, but if you already know that it’s impossible to implement, I won’t need to waste my time trying.