Basic vectorization question

I have a model that is not vectorized; that is, I loop over data points in the model block, compute some datapoint-specific values and parameter combinations, and lastly add each datapoint’s contribution to the likelihood. This works fine for the moment, but it won’t when I increase the size of the data 10- or 100-fold, which is my endgame.

I am able to “weakly” vectorize this model in the sense that I can still do the relevant pre-computation of datapoint-specific quantities in a loop in the model block, but then bring my target += statement outside of the loop, so that it is only executed once per block execution. The question is: is this worth it? Such a rewrite carries its own costs.

A toy version of what I’m doing looks like this:

data {
   int<lower=1> n_points;
   int<lower=1> n_entities;
   int outcome[n_points];
   int<lower=1, upper=n_entities> entity[n_points];
}

parameters {
   real ent[n_entities];
}

model {
  ...some priors...

  for (i in 1:n_points)
     target += some_scalar_lpdf(outcome[i] | ent[entity[i]]);
}

and the rewritten model block would then be

model {
   real relevant_ents[n_points];

   ...some priors...

   for (i in 1:n_points)
      relevant_ents[i] = ent[entity[i]];

   target += vectorized_lpdf(outcome | relevant_ents);
}

The question is, again: would we expect the refactored code to be significantly faster?
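Outside of Stan, the two patterns look like this; here is a minimal NumPy analogy, where `normal_logpdf` is a hand-rolled stand-in for the scalar and vectorized `_lpdf`s (all names are invented for illustration):

```python
import numpy as np

def normal_logpdf(x, mu):
    # Unit-variance normal log density; works on scalars and arrays alike.
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

rng = np.random.default_rng(0)
n_points, n_entities = 1000, 10
entity = rng.integers(0, n_entities, size=n_points)   # entity index per data point
ent = rng.normal(size=n_entities)                     # one parameter per entity
outcome = rng.normal(size=n_points)

# Loop version: one scalar log-density evaluation per data point.
loop_lp = sum(normal_logpdf(outcome[i], ent[entity[i]]) for i in range(n_points))

# "Weakly" vectorized version: gather the relevant entities, then one call.
relevant_ents = ent[entity]
vec_lp = normal_logpdf(outcome, relevant_ents).sum()

assert np.isclose(loop_lp, vec_lp)
```

The two versions compute the same total log density; the question above is whether the single vectorized call buys anything in Stan, where the gradient computation dominates.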

Yes, if you are using one of the _lpdfs in Stan that has analytic derivatives.


Excellent, thank you.

I’d suggest rewriting as

outcome ~ some_scalar(ent[entity]);

If it’s a built-in lpdf, the sampling-statement form will drop normalizing constants that aren’t needed, and it avoids the explicit intermediate loop, so the code’s easier to follow.

The trick is that ent[entity] is just your relevant_ents, because ent[entity][i] = ent[entity[i]] by definition.
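A quick NumPy demonstration of that multi-indexing identity (NumPy is 0-based where Stan is 1-based, but the gather works the same way):

```python
import numpy as np

ent = np.array([10.0, 20.0, 30.0])   # one value per entity
entity = np.array([2, 0, 1, 0])      # entity index for each data point (0-based)

gathered = ent[entity]               # gather: array([30., 10., 20., 10.])
for i in range(len(entity)):
    assert gathered[i] == ent[entity[i]]
```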

That’s cute, I wasn’t aware of the possibility of indexing with a list like that. Thanks.