Basic vectorization question

I have a model that is not vectorized; that is, I loop over data points in the model block, compute some datapoint-specific values and parameter combinations, and lastly add each datapoint’s contribution to the likelihood. This works fine for the moment, but it won’t when I increase the size of the data 10- or 100-fold, which is my endgame.

I am able to “weakly” vectorize this model in the sense that I can still do the relevant pre-computation of datapoint-specific quantities in a loop in the model block, but then bring my target += statement outside of the loop, so that it is only executed once per block execution. The question is: is this worth it? Such a rewrite carries its own costs.

A toy version of what I’m doing looks like this:

data {
   int<lower=1> n_points;
   int<lower=1> n_entities;
   int outcome[n_points];
   int<lower=1, upper=n_entities> entity[n_points];
}

parameters {
   real ent[n_entities];
}

model {
  ...some priors...

  for (i in 1:n_points)
     target += some_scalar_lpdf(outcome[i] | ent[entity[i]]);
}

and the rewritten model block would then be

model {
   real relevant_ents[n_points];

   ...some priors...

   for (i in 1:n_points)
      relevant_ents[i] = ent[entity[i]];

   target += vectorized_lpdf(outcome | relevant_ents);
}

The question is, again: would we expect the refactored code to be significantly faster?
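Outside of Stan, the two patterns look like this; here is a minimal NumPy analogy, where `normal_logpdf` is a hand-rolled stand-in for the scalar and vectorized `_lpdf`s (all names are invented for illustration):

```python
import numpy as np

def normal_logpdf(x, mu):
    # Unit-variance normal log density; works on scalars and arrays alike.
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

rng = np.random.default_rng(0)
n_points, n_entities = 1000, 10
entity = rng.integers(0, n_entities, size=n_points)   # entity index per data point
ent = rng.normal(size=n_entities)                     # one parameter per entity
outcome = rng.normal(size=n_points)

# Loop version: one scalar log-density evaluation per data point.
loop_lp = sum(normal_logpdf(outcome[i], ent[entity[i]]) for i in range(n_points))

# "Weakly" vectorized version: gather the relevant entities, then one call.
relevant_ents = ent[entity]
vec_lp = normal_logpdf(outcome, relevant_ents).sum()

assert np.isclose(loop_lp, vec_lp)
```

The two versions compute the same total log density; the question above is whether the single vectorized call buys anything in Stan, where the gradient computation dominates.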

Yes, if you are using one of the _lpdfs in Stan that has analytic derivatives.


Excellent, thank you.

I’d suggest rewriting as

outcome ~ some_scalar(ent[entity]);

If it’s a built-in lpdf, the sampling-statement form will drop normalizing constants that aren’t needed, and it avoids the explicit intermediate loop, so the code’s easier to follow.

The trick is that ent[entity] is just your relevant_ents, because ent[entity][i] = ent[entity[i]] by definition.
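A quick NumPy demonstration of that multi-indexing identity (NumPy is 0-based where Stan is 1-based, but the gather works the same way):

```python
import numpy as np

ent = np.array([10.0, 20.0, 30.0])   # one value per entity
entity = np.array([2, 0, 1, 0])      # entity index for each data point (0-based)

gathered = ent[entity]               # gather: array([30., 10., 20., 10.])
for i in range(len(entity)):
    assert gathered[i] == ent[entity[i]]
```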

That’s cute, I wasn’t aware of the possibility of indexing with a list like that. Thanks.