How to vectorize a multivariate regression

In section 9.1, the manual shows an example of vectorization for a single-variable linear regression:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}

where the sampling statement for y can also be written in unvectorized form:

for (n in 1:N)
  y[n] ~ normal(alpha + beta * x[n], sigma);

The advantage is that the vectorized form is apparently faster, and that speedup is what I am after.

For my case, it's multivariate and I use a custom log likelihood:

data {
  int N;                  // number of users
  int T;                  // number of days in campaign
  int K;                  // number of predictors

  int y[N, T];            // outcomes
  row_vector[K] z[N, T];  // predictors for each user and day

  real alpha_tilda;       // fixed constant passed in as data
}
parameters {
  vector<lower=0>[K] gamma;
}
model {
  gamma ~ normal(0, 0.1);
  for (n in 1:N) {
    // iLogLikelihood is a user-defined density (functions block not shown)
    y[n] ~ iLogLikelihood(T, alpha_tilda, gamma, z[n]);
  }
}

Is this also vectorizable?

I don’t think it’s automatic for custom likelihoods.

You’ll have to provide a custom implementation for vector types.
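
One way to set that up is to wrap the per-user density in a whole-dataset version, so the model has a single sampling statement. A rough, untested sketch, assuming your existing function is declared as iLogLikelihood_lpmf in the functions block:

functions {
  // assumed: your existing per-user density, defined elsewhere in this block
  // real iLogLikelihood_lpmf(int[] y, int T, real alpha_tilda,
  //                          vector gamma, row_vector[] z) { ... }

  // hypothetical whole-dataset version: one statement for all users
  real iLogLikelihood_all_lpmf(int[,] y, int T, real alpha_tilda,
                               vector gamma, row_vector[,] z) {
    real lp = 0;
    // anything that depends only on alpha_tilda and gamma can be
    // computed once here and reused across all N users
    for (n in 1:dims(y)[1])
      lp += iLogLikelihood_lpmf(y[n] | T, alpha_tilda, gamma, z[n]);
    return lp;
  }
}
model {
  gamma ~ normal(0, 0.1);
  y ~ iLogLikelihood_all(T, alpha_tilda, gamma, z);
}

As written, this just moves the loop inside a function, so by itself it's cosmetic; the payoff comes from hoisting shared work out of that loop.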

By the way, loops in Stan are fast (they compile straight down to C++ loops). Where vectorization helps is with autodiff. If your likelihood has repeated expensive operations that can be shared between likelihood evaluations, then you'll probably benefit from writing one big custom likelihood, as in the sketch above. If not, the benefit is mostly cosmetic.

I thought it would make a significant difference in speed, based on this line in the manual:

“In addition to being more concise, the vectorized form is much faster”

There are a couple things.

If there’s shared computation, vector versions of functions can be faster (this happens with multivariate normals and how the matrix solves are handled).
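
A minimal sketch of the multivariate-normal case: the vectorized call shares the work on Sigma across all N observations instead of redoing it in a loop:

data {
  int<lower=0> N;
  int<lower=1> K;
  vector[K] y[N];
}
parameters {
  vector[K] mu;
  cov_matrix[K] Sigma;
}
model {
  // slow: the matrix solve involving Sigma is repeated for every n
  // for (n in 1:N)
  //   y[n] ~ multi_normal(mu, Sigma);

  // faster: one vectorized call shares the Sigma computation
  y ~ multi_normal(mu, Sigma);
}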

If you can replace loops and such with matrix-matrix or matrix-vector ops you'll be better off, because the autodiff has been optimized for those: fewer autodiff variables are created, and the necessary derivatives are hard-coded (see https://arxiv.org/abs/1509.07164 for details).
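
For instance, replacing a loop of dot products with a single matrix-vector product (a sketch; X, beta, and mu are illustrative names, not from the model above):

data {
  int<lower=0> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // loop form: N dot products, each adding nodes to the autodiff graph
  // vector[N] mu;
  // for (n in 1:N)
  //   mu[n] = X[n] * beta;   // X[n] is row n of X

  // matrix form: one matrix-vector node with hard-coded derivatives
  vector[N] mu = X * beta;

  y ~ normal(mu, sigma);
}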

Otherwise it's probably cosmetic. Go ahead and try it if it's not too hard; just know that basic Stan loops are pretty fast as is.

That is correct as written. The chapter on efficiency tries to explain why, but I can summarize:

  1. vectorized built-in operations cut down on the size of the expression graph and at the same time replace virtual function calls (which involve pointer chasing and branch prediction) with regular function calls

  2. you can avoid recomputing shared operations

These won't hold for your own code unless you're careful to compute and reuse shared expressions. And even then, you can't supply custom derivatives from within Stan code to really collapse everything here.
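
Concretely, computing and reusing a shared expression means hoisting it into a local variable. A minimal sketch, with a hypothetical second outcome y2 added just to show the reuse:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y1;  // hypothetical: two outcomes sharing one predictor
  vector[N] y2;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}
model {
  vector[N] mu = alpha + beta * x;  // shared expression built once
  y1 ~ normal(mu, sigma1);          // both statements reuse the same
  y2 ~ normal(mu, sigma2);          // autodiff subgraph for mu
}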