I am not sure if this is the right category to ask in; apologies if not. In the Stan Functions Reference it is written that vectorizations for exp and log are defined elementwise and thus provide only minimal efficiency speedups. I understand from this that such vectorizations are only for the convenience of not having to write a loop, but they do not involve SIMD instructions, which is how I usually understand the term vectorization. Is my understanding correct? For example, in Matlab, writing y = exp(x) is dramatically faster than looping over each element of x. Is there a reason for not vectorizing these functions in Stan, and are there any plans to do it?
This is because Matlab has much higher loop overhead than Stan. I haven't been following Matlab for a few years, but at that time SIMD was not yet used, and the speedup came purely from the extra overhead (memory use, type checks, etc.) that happens in every iteration of a for loop but only once in the vectorized form. The term vectorization is much older than the use of SIMD instructions. In Stan, using the array/vector/matrix argument is not faster unless some computations can be done only once, e.g. the common terms in lpdfs and lpmfs. There is a thread somewhere discussing potential additional speed-ups from SIMD.
As of version 2.22, SIMD vectorisation is used for exp and log (among others) for vectors/row-vectors/matrices of doubles (i.e., variables declared in the transformed data and generated quantities blocks, or passed in as data). These vectorised versions aren't (directly) called when autodiff variables (i.e., variables declared in the parameters block) are used, since both the value and the derivative need to be calculated for each variable, but they are used under the hood as part of computing those.
Note that this vectorisation isn't yet used for arrays, e.g. when calling y = exp(x) where x is declared as an array. But that is being implemented at the moment, and should be available in the next release.
My bad for introducing that term in the first place. I wasn't aware of its tighter usage in SIMD contexts; I should've used "elementwise" for what I meant by "vectorize".
Aside from semantics, as @bgoodri outlined, the problem we have is that any SIMD speedups from SSE, AVX, etc. aren't going to be the game changers they'd be if our data were stored in contiguous memory to begin with. Instead, we have to unpack our structures into sequences of values from pointers into potentially non-contiguous memory. After that, we can call Eigen's SIMD versions, as @andrjohns pointed out. I'm not sure how much of a speedup that produces end to end when applying exp() to a sequence.
The other thing we can do, but aren’t doing yet, is cut down on virtual function calls. If vectorization were implemented with adjoint-jacobian-apply, we could reduce everything to a single chain() call for each autodiff variable.
Thank you all for your replies and for clarifying. The reason I asked about the vectorization of log() and exp() is that I have a model where I define a density that involves looping over a vector and calling pow(x, y) on each element. The model spends quite a bit of time there, so I wondered whether it would benefit from removing the loop by writing exp(y * log(x)). From what you are saying, this would clearly not improve performance, and I suspect it could in fact be slower than the loop. Thanks again.
That, or at least the version doing elementwise multiplication with .* instead of *, requires three loops in the C++ code. It is essentially equivalent to
Eigen::Matrix<var, -1, 1> log_x(N), ylog_x(N), elog_x(N);
for (int n = 0; n < N; n++) log_x[n] = log(x[n]);
for (int n = 0; n < N; n++) ylog_x[n] = y[n] * log_x[n];
for (int n = 0; n < N; n++) elog_x[n] = exp(ylog_x[n]);