Basic vectorization question

I have a model that is not vectorized; that is, I loop over data points in the model block, compute some datapoint-specific values and parameter combinations, and finally add each datapoint's contribution to the likelihood. This works fine for the moment, but it won't when I increase the size of the data 10- or 100-fold, which is my endgame.

I am able to “weakly” vectorize this model in the sense that I can still do the relevant pre-computation of datapoint-specific quantities in a loop in the model block, but then bring my target += statement outside of the loop, so that it is only executed once per block execution. The question is: is this worth it? Such a rewrite carries its own costs.

A toy version of what I’m doing looks like this:

data {
   int<lower=1> n_points;
   int<lower=1> n_entities;
   array[n_points] int outcome;
   array[n_points] int<lower=1, upper=n_entities> entity;
}

parameters{
   array[n_entities] real ent;
}

model {
  ...some priors...

  for (i in 1:n_points)
     target += some_scalar_lpdf(outcome[i] | ent[entity[i]]);
}

and the rewritten model block would then be

model {
   array[n_points] real relevant_ents;

   ...some priors...

   for (i in 1:n_points) {
      relevant_ents[i] = ent[entity[i]];
   }
   target += vectorized_lpdf(outcome | relevant_ents);
}

The question is, again: would we expect the refactored code to be significantly faster?

Yes, if you are using one of the _lpdfs in Stan that has analytic derivatives.
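
For concreteness, here is a minimal sketch of that refactor with a built-in distribution; poisson_log is only a stand-in for your actual likelihood, and the normal(0, 1) prior is likewise just a placeholder:

model {
   array[n_points] real relevant_ents;

   ent ~ normal(0, 1);   // placeholder prior

   for (i in 1:n_points) {
      relevant_ents[i] = ent[entity[i]];
   }
   // built-in lpmfs/lpdfs like poisson_log_lpmf have analytic gradients,
   // so one vectorized call is much cheaper than n_points scalar calls
   target += poisson_log_lpmf(outcome | relevant_ents);
}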


Excellent, thank you.

I’d suggest rewriting as

outcome ~ some_scalar(ent[entity]);

If it's a built-in lpdf, the ~ statement will drop normalizing constants that aren't needed, and it avoids the explicit intermediate loop, so the code is easier to follow.

The trick is that ent[entity] is just your relevant_ents, because ent[entity][i] = ent[entity[i]] by definition.
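
Putting it together, the whole model block collapses to something like the sketch below, with poisson_log again only standing in for the real likelihood:

model {
   ent ~ normal(0, 1);   // placeholder prior

   // ent[entity] is built by multi-indexing: if entity = {3, 1, 1, 2}, then
   // ent[entity] = {ent[3], ent[1], ent[1], ent[2]}, i.e. exactly the
   // relevant_ents array from the explicit loop above
   outcome ~ poisson_log(ent[entity]);
}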

That’s cute, I wasn’t aware of the possibility of indexing with a list like that. Thanks.