Just thought I’d share a performance tip that could yield 2-5x speedups if you have a model where there are repeated observations for each of the unique combinations of predictors.

Attached is a demo of a hierarchical model with a binomial outcome and two 2-level predictors, one “within-subjects” and one “between-subjects”; the code would work for more predictors or levels therein, but I thought I’d keep it simple for this example. The speedup should also work for non-hierarchical models; I just happened to have a demo of the hierarchical version mostly coded already.

The “slower” model is pretty much identical to that used in the manual for the 8-schools data in section 9.13 (with the exception of variable names & binomial outcome), while the “faster” model is differentiated by slightly different data variables and replacement of the line:

```
y ~ bernoulli_logit( rows_dot_product( s_vals[s] , w ) ) ;
```

with:

```
{
// mu: condition-by-subject values for the latent mean
matrix[rows_w,n_s] mu ;
for(i in 1:n_s){
for(j in 1:rows_w){
mu[j,i] = dot_product(s_vals[i],w[j]) ;
}
}
y ~ bernoulli_logit( to_vector(mu)[obs_index_in_flat_mu] ) ;
}
```

So, the trick is to pre-compute in R the unique entries in the within-subject design matrix, so that you can then have Stan compute the unique entries in the subject-by-condition matrix (`mu`

). You also pre-compute (again, in R) the index into a flattened version of mu of each observation so you can have a final vectorized call to bernoulli_logit().

Hopefully the R code is well-commented enough to be clear, but let me know if you have any questions.

p.s. There’s actually still some redundant computation in the between-subjects design matrix that you could also apply this trick to, but I usually have much less redundancy here than in the within-subject matrix, so I haven’t gotten around to coding that one as more efficient yet.

binomial_mixed_demo.R (6.1 KB)

generate_data.R (2.0 KB)

binomial_mixed_slower.stan (1.9 KB)

binomial_mixed_faster.stan (2.1 KB)