Vectorizing a list


#1

Hello community,

I would like to speed up a model. I have a data frame that is not square, which I can easily represent as a list (the numbers of columns and rows are not fixed, but decided as input).


Now, the way I am sampling

s1 ~ normal(...)
s2 ~ normal(...)
...
sN ~ normal(...)

is by building a linear array and a map array, where I map every element to a parameter (it could be a vector of size N).

This is really slow because it is not vectorized. Is there a solution/trick to vectorize over columns (in my example)? My understanding is that lists/tuples haven’t been implemented yet.
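For illustration, the slow per-element version I have in mind looks something like this (a sketch with made-up names; `map` is the array mapping each flattened element to its parameter):

```stan
model {
  // one sampling statement per flattened element: the expression
  // graph for the normal density is rebuilt for every single term
  for (n in 1:N)
    s[n] ~ normal(mu[map[n]], sigma[map[n]]);
}
```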

Thanks a lot.


#2

Probably a long shot, or I might be misunderstanding your problem, but could you represent your data as

g  s  value
1  1  0.416
1  2  0.298
...
2  1  0.276
2  2  0.791
...

and then in Stan you could have for instance:

value ~ normal(mu_s[s] + mu_g[g], sigma);

It probably depends on the other components of your model whether this will work or not.
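To make that concrete, a full model along these lines might look like the following (a sketch; the declarations and the names `mu_g`, `mu_s` are my own assumptions):

```stan
data {
  int<lower=1> N;                  // total number of observations
  int<lower=1> G;                  // number of g-groups
  int<lower=1> S;                  // number of s-groups
  int<lower=1, upper=G> g[N];      // g-group index of each observation
  int<lower=1, upper=S> s[N];      // s-group index of each observation
  vector[N] value;
}
parameters {
  vector[G] mu_g;
  vector[S] mu_s;
  real<lower=0> sigma;
}
model {
  // one vectorized sampling statement over all N observations
  value ~ normal(mu_g[g] + mu_s[s], sigma);
}
```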


#3

Thanks!

I think you got it, more or less.

I can have a mapping s = int[…] and then

s  value
1  0.416
2  0.298
...
1  0.276
2  0.791
...

value ~ normal(mu[s], sigma[s]);

Would this be fully vectorized?

Thanks


#4

I think that is vectorized, yes. Hopefully, the Stan team will correct me if I am wrong.


#5

There is something about the vectorization process dropping a constant (?) across the different distributions

normal_1
normal_2
...

That sounds wrong to me. That said, I don’t know much about the fine details of the vectorization process.

Bw.


#6

The details are in the Stan Math paper and I do understand some of the words that are being used in that paper.

https://arxiv.org/abs/1509.07164

As far as I understand it, the most time-consuming part of running Stan is calculating the derivatives of the probability functions. The calculations are done by traversing an expression graph which represents the function and its derivative. When you have normal_1, normal_2, …, normal_n in your model, the expression graph needs to be constructed n times. When you have normal(vector_mu, vector_sd), the expression graph only needs to be constructed once.
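In Stan terms, the two cases look like this (a sketch):

```stan
// n separate statements (here written as a loop): the normal
// density's expression graph is built once per element
for (i in 1:n)
  y[i] ~ normal(mu[i], sigma[i]);

// one vectorized statement: a single graph node covers all n terms
y ~ normal(mu, sigma);
```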


#7

This is essentially right.

What happens is that we

  • share repeated computations from a scalar being broadcast; so if you have just a scalar sigma, you only need to compute log(sigma) once, and

  • reduce the size of the expression graph, which has an even bigger speedup on derivatives, which are the true bottleneck; this also has the pleasant property of converting a lot of virtual operations into statically locatable operations, which is also a big win.

We also drop any additive constants in the log density (they’re not needed for sampling).
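The constant-dropping is also visible in the language itself: a sampling statement drops additive constants, while an explicit `target +=` with the full log density keeps them (a sketch):

```stan
// drops additive constants such as -0.5 * log(2 * pi())
y ~ normal(mu, sigma);

// equivalent for sampling purposes, but keeps the full normalization
target += normal_lpdf(y | mu, sigma);
```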


#8

Following this, I deduce that:

replacing

~ normal(mu_1, sigma_1);
~ normal(mu_2, sigma_2);

with a mapping array (e.g., [1,1,2,2,1,1,2,1,2,1]) such that I can do

~ normal(mu[mapping_array], sigma[mapping_array]);

gives no speedup.

Is this correct? (I hope I have explained myself clearly.)

Thanks


#9

It will give you a speedup even when both the location and scale vary. The speedup comes from reducing the number of virtual function calls during autodiff and from the reduced memory allocation. It’s not as big a speedup as when the scale is a single scalar (rather than a container), because the logarithm is the most compute-intensive part of the normal log density.
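The two cases contrasted here might look like this (a sketch; `idx` and `sigma_vec` are hypothetical names):

```stan
// scalar scale: log(sigma) is computed once and shared by all terms
y ~ normal(mu[idx], sigma);

// element-wise scale: a logarithm is needed per distinct scale term
y ~ normal(mu[idx], sigma_vec[idx]);
```

Both are vectorized and both beat an explicit loop; the first simply shares more of the computation.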