I just speeded up a model by a factor of 10x by replacing a matrix-vector product with a loop which only processes the non-zero elements of the matrix. The matrix was a design matrix (thus the matrix was data for Stan) for the random effects part of a model. I assume that this is the same situation for rstanarm.
So it seems to be a major problem as the AD stack gets inflated by all those zero times whatever product.
Do we have a good solution for the moment or on the horizon? looping will do, but writing matrix expressions is a lot more expressive for the model.
Sorry if this was discussed already and I missed it.