Efficiency of design matrix multiplication vs. range indexing in the framework of a hierarchical model

In the framework of a hierarchical model I defined a transformed parameter in the form
p = p_0 + D * \theta * \sigma_{\theta},
where D is a ‘design matrix’ of dimension (n,d), for number of data points n and number of groups d, encoding the group association, and where \theta is a vector of length d for the normalised deviations of p from p_0 per group and where \sigma_{\theta} is the standard deviation of the deviation of p from p_0 (i.e. p \sim \mathcal{N}(p_0, \sigma_{\theta})).

Alternatively, I can write
p = p_0 + \theta[I] * \sigma_{\theta} ,
where I is an array of indices of length n encoding the group association.
I checked that stan correctly broadcasts the vector of \theta's.

Is the alternative more efficient, or can stan deal quickly which the sparse matrix in the first option?

The indexing thing is easier if you are writing your own Stan code, and possibly faster, particularly if all the non-zero elements of the design matrix would be 1. There is a csr_matrix_times_vector function in Stan for this sort of situation, but it isn’t as fast as what one might hope.

1 Like

Great. Thank you very much for this very fast answer! Then, I will change this to optimise the code.