Tip for speeding up models with repeated observations

Just thought I’d share a performance tip that can yield 2-5x speedups when your model has repeated observations for each unique combination of predictors.

Attached is a demo of a hierarchical model with a binomial outcome and two 2-level predictors, one “within-subjects” and one “between-subjects”; the code would work for more predictors or levels therein, but I thought I’d keep it simple for this example. The speedup should also work for non-hierarchical models; I just happened to have a demo of the hierarchical version mostly coded already.

The “slower” model is pretty much identical to the one used in the manual for the 8-schools data in section 9.13 (aside from variable names and the binomial outcome), while the “faster” model uses slightly different data variables and replaces the line:

y ~ bernoulli_logit( rows_dot_product( s_vals[s] , w ) ) ;

with:

  // mu: condition-by-subject values for the latent mean
  matrix[rows_w,n_s] mu ;
  for(i in 1:n_s){
    for(j in 1:rows_w){
      mu[j,i] = dot_product(s_vals[i],w[j]) ;
    }
  }
  y ~ bernoulli_logit( to_vector(mu)[obs_index_in_flat_mu] ) ;

So, the trick is to pre-compute in R the unique entries in the within-subject design matrix, so that you can then have Stan compute the unique entries in the subject-by-condition matrix (mu). You also pre-compute (again, in R) the index into a flattened version of mu of each observation so you can have a final vectorized call to bernoulli_logit().
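The attached demo does this pre-computation in R; here is a minimal sketch of the same idea in Python, just to make the indexing concrete (all names and data here are hypothetical, chosen to match the Stan variables):

```python
# Sketch: reduce the full design matrix to its unique rows, then build the
# 1-based index of each observation into the flattened mu matrix.
import numpy as np

# Full within-subject design matrix: one row per observation.
X = np.array([
    [1, -1],
    [1,  1],
    [1, -1],
    [1,  1],
    [1, -1],
])
subject = np.array([0, 0, 0, 1, 1])  # 0-based subject id per observation

# Unique design rows, in order of first occurrence (like R's unique()),
# plus which unique row each observation maps to.
seen = {}
row_of_obs = np.array([seen.setdefault(tuple(r), len(seen)) for r in X])
w = np.array(list(seen))      # the reduced design matrix passed to Stan
rows_w = w.shape[0]

# mu is a rows_w x n_s matrix; Stan's to_vector() flattens it
# column-major, so the 1-based flat index of observation i is:
obs_index_in_flat_mu = subject * rows_w + row_of_obs + 1
```

Stan then only computes `rows_w * n_s` dot products instead of one per observation, and the index vector turns the likelihood into a single vectorized `bernoulli_logit()` call.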

Hopefully the R code is well-commented enough to be clear, but let me know if you have any questions.

p.s. There’s actually still some redundant computation in the between-subjects design matrix that you could also apply this trick to, but I usually have much less redundancy here than in the within-subject matrix, so I haven’t gotten around to coding that one as more efficient yet.

binomial_mixed_demo.R (6.1 KB)
generate_data.R (2.0 KB)
binomial_mixed_slower.stan (1.9 KB)
binomial_mixed_faster.stan (2.1 KB)


Thanks for sharing. I discuss some simple cases in the chapter on efficiency under the heading “sufficient statistics” (here, just the repeated observation count).

You can also do this in transformed data within Stan. Not as convenient, but it leaves the interface to the Stan program cleaner.

As you have it, you could just compute a vector directly to avoid the copying, like this:

int pos = 1;
for (i in ...) {
  for (j in ...) {
    foo[pos] = ... ;
    pos += 1;
  }
}

It’ll be even faster if you reformulate this as matrix multiplication.

You can then also collect up the sufficient statistics on the y side, that is, the number of entries of obs_index_in_flat_mu that share a given index. You only need counts of those (and of the successes in y); then it can be binomial_logit. That’s the speedup I talked about, as I wasn’t assuming there’d be much overlap in the predictors.
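The collapse to sufficient statistics can be sketched like this (Python for illustration, hypothetical data; the counts would be computed once in the data-prep step):

```python
# Sketch: per unique flat index into mu, count trials and successes, so the
# likelihood becomes one binomial_logit over unique cells instead of a
# bernoulli_logit over every observation.
import numpy as np

y = np.array([1, 0, 1, 1, 0, 1])                     # Bernoulli outcomes
obs_index_in_flat_mu = np.array([1, 1, 1, 2, 2, 3])  # 1-based cell index

cells = np.unique(obs_index_in_flat_mu)  # unique cells actually observed
trials = np.array([(obs_index_in_flat_mu == c).sum() for c in cells])
successes = np.array([y[obs_index_in_flat_mu == c].sum() for c in cells])

# The Stan model block would then use something like:
#   successes ~ binomial_logit(trials, to_vector(mu)[cells]);
```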

Ah, yes, that’s an additional speedup that can be added for a binomial case like this (and one I keep forgetting about!). The speedup from avoiding redundant computations amidst repeated observations applies more generally to any outcome type.

Neat! I’ll try these additional tweaks and report back on the likely further speedups.

Can you elaborate what you mean here?

Yes. This operation

matrix[rows_w,n_s] mu ;
for(i in 1:n_s){
  for(j in 1:rows_w){
    mu[j,i] = dot_product(s_vals[i],w[j]) ;
  }
}

should be formulated as a matrix multiply w * s_vals, which will require some rearranging of the data types for s_vals or w, or a copy into two matrices.
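In numpy terms (hypothetical sizes, just to show the two formulations agree when s_vals is stored as an n_w x n_s matrix):

```python
# Sketch: the double loop of dot products equals a single matrix multiply.
import numpy as np

rng = np.random.default_rng(0)
rows_w, n_w, n_s = 4, 2, 3
w = rng.normal(size=(rows_w, n_w))    # unique within-subject design rows
s_vals = rng.normal(size=(n_w, n_s))  # per-subject coefficient values

# Loop version: mu[j, i] = dot(s_vals[:, i], w[j])
mu_loop = np.empty((rows_w, n_s))
for i in range(n_s):
    for j in range(rows_w):
        mu_loop[j, i] = w[j] @ s_vals[:, i]

# Single matrix multiply
mu_mat = w @ s_vals
assert np.allclose(mu_loop, mu_mat)
```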

Ah, yes, defining/computing s_vals as:

// s_vals: subject-by-subject values
matrix[n_w, n_s] s_vals ;
s_vals = transpose(b * coef + transpose(diag_pre_multiply(sds,cor_helper) * multi_normal_helper) );

one can then simply do:

y ~ bernoulli_logit( to_vector(w*s_vals)[obs_index_in_flat_mu] ) ;

in the model block. This achieves about a 10% speedup over the version without the matrix multiplication.