Hi Everyone,
I am trying to convert all the for loops into vectorized form so that the calculations are done faster.
But, when I try to change the for loops to vectorized form in generated quantities section, Im getting a dimension mismatch in assignment error.
My question is, vectorization is not applicable to generated quantities section?
Thanks in advance,
2 Likes
Could you please post some code so we can take a look at how you’re currently attempting to do this?
1 Like
Yes, it is possible to vectorize rng functions, but as the error tells you,
you need to ensure you have matching dimensions on both sides of the rng function.
For example, given the data
data {
int N;
vector[N] y;
}
You can create the following vector of ones which you can then use on the generated quantities
block to ensure your functions have the correct dimensions.
transformed data {
vector[N] ones_N = rep_vector(1, N);
}
parameters {
real <lower = 0> lambda;
}
model {
lambda ~ exponential(1);
target += exponential_lpdf(y | lambda);
}
generated quantities {
real y_rep[N] = exponential_rng(ones_N * lambda);
}
Does, avoiding looping over every data point.
Hope it helps!
Hi@tiagocc
I will try to implement this and update tomorrow If I’m able to make my model work. Thanks for the tip
Hi @tiagocc,
Can you show an example of using the ones for normal_rng function?
1 Like
Sorry it took so long to respond.
I would do something like:
transformed data {
int N = 10;
vector[N] ones_N = rep_vector(1, N);
}
generated quantities {
real yrep [N] = normal_rng(ones_N * 1, ones_N * 10);
}
1 Like
a little clarification on the above example - which is entirely correct -
as this issue just bubbled up again - Minor typos in Stan user guide · Issue #208 · stan-dev/docs · GitHub
intuitively, one would want yrep
to be of type vector[N]
just as y
is.
however, this doesn’t compile.
the definition of normal_rng
in the Stan Functions Reference Manual - Stan Functions Reference
says:
R
normal_rng
(reals mu, reals sigma)
Generate a normal variate with location mu and scale sigma; may only be used in transformed data and generated quantities blocks. For a description of argument and return types, see section vectorized PRNG functions.
that definition uses type R
- in the prng-vectorization section, 10.8.3.3 Return type - it says:
The result of a vectorized PRNG function depends on the size of the arguments and the distribution’s support. If all arguments are scalars, then the return type is a scalar. For a continuous distribution, if there are any non-scalar arguments, the return type is a real array ( real[]
) matching the size of any of the non-scalar arguments, as all non-scalar arguments must have matching size. Discrete distributions return ints
and continuous distributions return reals
, each of appropriate size. The symbol R
denotes such a return type.
in short, the vectorized RNG function normal_rng
return scalars or int
or real
arrays. not matrix
, vector
, or row_vector
.
1 Like
That only helps if there’s autodiff inside the loop that is more efficient outside the loop. Loops themselves are very fast in Stan. That means this
real y[N];
for (n in 1:N)
y[n] ~ normal(mu, sigma);
is a lot slower than the vectorized form
y ~ normal(mu, sigma);
tghat doesnt’ carry over everywhere. For example,
real x[N];
real y[N] = exp(x);
isn’t any faster than
real x[N];
real y[N];
for (n in 1:N)
y[n] = exp(x[n]);
because all the vectorized exp
function does is a loop down in C++ just like the loop code.
Edit: There’s no autodiff in transformed parameters or generated quantities, so there won’t be a speed difference between loops or vectorized operations for those functions. There still will be a big difference between matrix operations and loops. You definitely want to call matrix operations where possible because they have lots of optimizations built in.