In section 9.1, the manual shows an example of vectorization for a single variable linear regression
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
} model {
y ~ normal(alpha + beta * x, sigma);
}
where y can also be written in unvectorized form
for (n in 1:N)
y[n] ~ normal(alpha + beta * x[n], sigma);
The advantage is the vectorized form is apparently faster, which is what I am trying to do.
for my case, its multivariate and I use a custom log likelihood
data {
int N; // number of users
int T; // number of days in campaign
int K; // number of predictors
int y[N, T]; // outcome
row_vector[K] z[N, T];
real alpha_tilda;
}
parameters {
vector<lower=0>[K] gamma;
}
model {
gamma ~ normal(0, 0.1);
for(n in 1:N) {
y[n] ~ iLogLikelihood(T, alpha_tilda, gamma, z[n]);
}
}
Is this also vectorizable?
I don’t think it’s automatic for custom likelihoods.
You’ll have to provide a custom implementation for vector types.
By the way loops in Stan are fast (they convert straight over to C++ loops). Where vectorization is handy is for autodiff. If your likelihood has repeated expensive operations that can be shared between likelihood evaluations, then you’ll probably benefit from writing a big custom likelihood. If not, then the benefit is mostly cosmetic.
I thought it would make a significant difference in speed due to the manual text
“In addition to being more concise, the vectorized form is much faster”
There are a couple things.
If there’s shared computation, vector versions of functions can be faster (this happens with multivariate normals and how the matrix solves are handled).
If you can replace loops and such with matrix/matrix or matrix/vector ops you’ll be better off because the autodiff has been optimized for those (less autodiff variables are created, and the necessary derivatives are hard coded in – check https://arxiv.org/abs/1509.07164 for more details).
Otherwise it’s probly cosmetic. Go ahead and try it if it’s not too hard, just saying that basic Stan loops are pretty fast as is.
That is correct as written. The chapter on efficiency tries to explain why, but I can summarize:

vectorized builtin operations cut down on the expression graph size and at the same time replace virtual function calls (pointer chasing, require branch prediction) with regular function calls

you can avoid recomputing shared operations
These won’t hold for your own code unless you’re careful to compute and reuse shared expressions. And even so, you can’t write custom derivatives to really unfold everything here.