How to vectorize for multi-variate regression

bhomass · August 15, 2017, 5:48am

In section 9.1, the manual shows an example of vectorization for a single variable linear regression

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
} model {
  y ~ normal(alpha + beta * x, sigma);
}

where y can also be written in unvectorized form

for (n in 1:N)
        y[n] ~ normal(alpha + beta * x[n], sigma);

The advantage is the vectorized form is apparently faster, which is what I am trying to do.

for my case, its multi-variate and I use a custom log likelihood

data {
	int N; // number of users
	int T; // number of days in campaign
	int K; // number of predictors
	
	int y[N, T]; // outcome
	row_vector[K] z[N, T]; 

	real alpha_tilda;
}
parameters {
	vector<lower=0>[K] gamma;
}

model {
	gamma ~ normal(0, 0.1);
	for(n in 1:N) {
		y[n] ~ iLogLikelihood(T, alpha_tilda, gamma, z[n]);
	}
}

Is this also vectorizable?

bbbales2 · August 15, 2017, 1:29pm

I don’t think it’s automatic for custom likelihoods.

You’ll have to provide a custom implementation for vector types.

By the way loops in Stan are fast (they convert straight over to C++ loops). Where vectorization is handy is for autodiff. If your likelihood has repeated expensive operations that can be shared between likelihood evaluations, then you’ll probably benefit from writing a big custom likelihood. If not, then the benefit is mostly cosmetic.

bhomass · August 15, 2017, 5:28pm

I thought it would make a significant difference in speed due to the manual text

“In addition to being more concise, the vectorized form is much faster”

bbbales2 · August 15, 2017, 6:48pm

There are a couple things.

If there’s shared computation, vector versions of functions can be faster (this happens with multivariate normals and how the matrix solves are handled).

If you can replace loops and such with matrix/matrix or matrix/vector ops you’ll be better off because the autodiff has been optimized for those (less autodiff variables are created, and the necessary derivatives are hard coded in – check https://arxiv.org/abs/1509.07164 for more details).

Otherwise it’s probly cosmetic. Go ahead and try it if it’s not too hard, just saying that basic Stan loops are pretty fast as is.

Bob_Carpenter · August 17, 2017, 1:38am

That is correct as written. The chapter on efficiency tries to explain why, but I can summarize:

vectorized built-in operations cut down on the expression graph size and at the same time replace virtual function calls (pointer chasing, require branch prediction) with regular function calls
you can avoid recomputing shared operations

These won’t hold for your own code unless you’re careful to compute and reuse shared expressions. And even so, you can’t write custom derivatives to really unfold everything here.

Topic		Replies	Views
How to vectorize this code? Modeling	3	483	September 3, 2020
Question about custom function (vectorization possible?) Modeling techniques	3	412	September 15, 2021
Vectorization of exp and log Algorithms	6	1349	March 20, 2020
Vectorizing a list Modeling	8	804	October 10, 2017
Rstan vectorized modelling efficiency and its components setup Modeling	2	554	March 22, 2018

How to vectorize for multi-variate regression

Related topics