Applying Vectorization in generated quantities section

Hi Everyone,

I am trying to convert all the for loops into vectorized form so that the calculations are done faster.

But, when I try to change the for loops to vectorized form in generated quantities section, Im getting a dimension mismatch in assignment error.

My question is, vectorization is not applicable to generated quantities section?

Thanks in advance,

2 Likes

Could you please post some code so we can take a look at how you’re currently attempting to do this?

1 Like

Yes, it is possible to vectorize rng functions, but as the error tells you,

you need to ensure you have matching dimensions on both sides of the rng function.

For example, given the data

data {
int N;
vector[N] y;
}

You can create the following vector of ones which you can then use on the generated quantities block to ensure your functions have the correct dimensions.

transformed data {
vector[N] ones_N = rep_vector(1, N);
}
parameters {
real <lower = 0> lambda;
}

model {
lambda ~ exponential(1);
target += exponential_lpdf(y | lambda);
}

generated quantities {
real y_rep[N]  = exponential_rng(ones_N * lambda);
}

Does, avoiding looping over every data point.

Hope it helps!

Hi@tiagocc

I will try to implement this and update tomorrow If I’m able to make my model work. Thanks for the tip

Hi @tiagocc,

Can you show an example of using the ones for normal_rng function?

1 Like

Sorry it took so long to respond.

I would do something like:


transformed data {
  int N = 10;
  vector[N] ones_N = rep_vector(1, N);
}

generated quantities {
  real yrep [N] = normal_rng(ones_N * 1, ones_N * 10);  
}
1 Like

a little clarification on the above example - which is entirely correct -
as this issue just bubbled up again - Minor typos in Stan user guide · Issue #208 · stan-dev/docs · GitHub

intuitively, one would want yrep to be of type vector[N] just as y is.
however, this doesn’t compile.

the definition of normal_rng in the Stan Functions Reference Manual - Stan Functions Reference
says:

R normal_rng (reals mu, reals sigma)
Generate a normal variate with location mu and scale sigma; may only be used in transformed data and generated quantities blocks. For a description of argument and return types, see section vectorized PRNG functions.

that definition uses type R - in the prng-vectorization section, 10.8.3.3 Return type - it says:

The result of a vectorized PRNG function depends on the size of the arguments and the distribution’s support. If all arguments are scalars, then the return type is a scalar. For a continuous distribution, if there are any non-scalar arguments, the return type is a real array ( real[] ) matching the size of any of the non-scalar arguments, as all non-scalar arguments must have matching size. Discrete distributions return ints and continuous distributions return reals , each of appropriate size. The symbol R denotes such a return type.

in short, the vectorized RNG function normal_rng return scalars or int or real arrays. not matrix, vector, or row_vector.

1 Like

That only helps if there’s autodiff inside the loop that is more efficient outside the loop. Loops themselves are very fast in Stan. That means this

real y[N];
for (n in 1:N)
  y[n] ~ normal(mu, sigma);

is a lot slower than the vectorized form

y ~ normal(mu, sigma);

tghat doesnt’ carry over everywhere. For example,

real x[N];
real y[N] = exp(x);

isn’t any faster than

real x[N];
real y[N];
for (n in 1:N) 
  y[n] = exp(x[n]);

because all the vectorized exp function does is a loop down in C++ just like the loop code.

Edit: There’s no autodiff in transformed parameters or generated quantities, so there won’t be a speed difference between loops or vectorized operations for those functions. There still will be a big difference between matrix operations and loops. You definitely want to call matrix operations where possible because they have lots of optimizations built in.