[Case-study preview] Speeding up Stan by reducing redundant computation

Just had a thought while procrastinating on the prose: design matrices tend to be highly regular within columns (e.g. an intercept column of all 1s, or effect columns whose entries are simply negatives of each other), so I wonder whether that regularity could be used to reduce computation further, or whether dot_product() is already so heavily optimized that breaking it apart and doing the sums explicitly would just slow things down.
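
To make the idea concrete, here is a rough sketch of the arithmetic in plain Python (not Stan). It assumes every entry of the design matrix is +1, -1, or 0, so each row's linear predictor reduces to a signed sum of coefficients. The function names, the 2x2 design, and the coefficient values are all made up for illustration, and the sketch says nothing about whether this would actually beat dot_product() inside Stan.

```python
def dot_rows(X, beta):
    """Baseline: one full dot product per row of the design matrix."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def signed_sum_rows(X, beta):
    """Use the column regularity: with entries restricted to +1, -1, or 0,
    each row's linear predictor is a signed sum with no multiplications."""
    out = []
    for row in X:
        total = 0.0
        for x, b in zip(row, beta):
            if x == 1:
                total += b
            elif x == -1:
                total -= b
            elif x != 0:
                # The shortcut only holds under the +1/-1/0 assumption.
                raise ValueError("entry is not -1, 0, or +1")
        out.append(total)
    return out


# Hypothetical effect-coded 2x2 design: intercept, factor A, factor B, A:B.
X = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1,  1, -1],
    [1, -1, -1,  1],
]
beta = [0.5, 2.0, -1.0, 0.25]  # illustrative values only

assert dot_rows(X, beta) == signed_sum_rows(X, beta)
```

Both functions give the same linear predictor; the open question is whether trading multiplications for branches and explicit sums would pay off once autodiff and vectorization are involved.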