In hierarchical modelling scenarios we often have an expression taking the dot product of each row (or column) of a data matrix and a parameter vector:

```
data{
  int r ;
  int c ;
  matrix[r,c] X ;
  ... // other data
}
parameters{
  vector[c] beta ;
  ... // other parameters
}
model{
  ... // priors
  vector[r] X_dot_beta = rows_dot_product(X,beta) ;
  ... // more model, using X_dot_beta in likelihood
}
```

Often X consists of columns of so-called “contrasts”, typically taking the values -1, 0, or 1. With such matrices, the multiplication in the dot-product merely serves to “keep” the corresponding beta in the sum when the entry is 1, drop it when the entry is 0, and “keep but flip the sign” when the entry is -1. While the dot-product is a useful shortcut for expressing things, computationally it’s very expensive, so it should be possible to achieve a speedup by instead using either loops-and-logic or pre-computed indices to achieve the same thing the multiplies achieve in the dot-product.
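As a minimal sketch of the multiply-avoidance idea on its own (base R; the toy X and beta here are just made up for illustration): precompute, for each row, which betas get added and which get subtracted, so the per-row sum never multiplies.

```r
# Minimal sketch of the no-multiply idea (base R; toy X and beta for
# illustration). For a contrast matrix with entries in {-1, 0, 1},
# precompute per-row index sets of the +1 and -1 entries; each row's
# "dot product" is then just additions and subtractions of betas.
X <- matrix(
	c(-1, 0,  1,  1
	,  0, 1, -1,  0
	,  1, 1,  0, -1)
	, nrow = 3, byrow = TRUE
)
beta <- c(0.5, -1.2, 2.0, 0.3)

plus_idx  <- lapply(seq_len(nrow(X)), function(i) which(X[i, ] ==  1))
minus_idx <- lapply(seq_len(nrow(X)), function(i) which(X[i, ] == -1))

X_dot_beta <- vapply(
	seq_len(nrow(X))
	, function(i) sum(beta[plus_idx[[i]]]) - sum(beta[minus_idx[[i]]])
	, numeric(1)
)

all.equal(X_dot_beta, as.vector(X %*% beta)) # TRUE
```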

Furthermore, since there’s high redundancy in the columns of the X matrix, it should be possible to build up sums in a tree-like process to reduce the total number of addition operations.
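Here’s one way the branched-sums idea could look in isolation (base R sketch; `tree_dot` is a hypothetical helper, not part of any package, and it’s not verbatim what the demo below does): walk the columns left to right, keeping one running partial sum per *distinct* prefix of contrast values, so a shared prefix is only ever summed once.

```r
# Sketch of the branched-sums idea (base R; `tree_dot` is a hypothetical
# helper). Walk columns left to right, keeping one partial sum per
# distinct prefix of contrast values; shared prefixes are summed only
# once. The multiply on the prefix-extension line could itself be
# replaced by the sign-branching trick.
tree_dot <- function(X, beta) {
	prefix <- 0                  # partial sums, one per distinct prefix
	group  <- rep(1L, nrow(X))   # which prefix each row currently maps to
	for (j in seq_len(ncol(X))) {
		key       <- paste(group, X[, j])    # (old prefix, next value) pairs
		new_group <- match(key, unique(key)) # ids in order of first appearance
		first     <- !duplicated(new_group)  # one representative row per new prefix
		prefix    <- prefix[group[first]] + X[first, j] * beta[j]
		group     <- new_group
	}
	prefix[group]                # fan the prefix sums back out to rows
}

X <- as.matrix(expand.grid(a = -1:1, b = -1:1, c = -1:1, d = -1:1))
beta <- c(0.3, -1.1, 0.7, 2.0)
all.equal(tree_dot(X, beta), as.vector(X %*% beta)) # TRUE
```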

Below is an example (in R code) of a very inelegant implementation of both the multiply-avoidance and branched-sums ideas. The standard rows_dot_product would compute r × c multiplications then r × c additions, which works out to **648** operations in the 81 × 4 case of the demo below, while the branched-sums approach achieves an identical result with far fewer operations (**80**; I haven’t worked out the generalized formula yet for the expected number of ops in terms of r and c, but there surely is one).
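At least for the full 3^c factorial X of the demo, a closed form does seem to fall out (if I’m counting right): at column i the branched approach touches the 3^i distinct prefixes of length i, of which 2/3 have a nonzero entry, giving 2 × 3^(i−1) additions per column and 3^c − 1 in total, which is 80 for c = 4. A quick arithmetic sanity check:

```r
# Sanity check of the conjectured operation count for a full 3^c
# factorial: column i contributes 2 * 3^(i-1) additions, totalling
# 3^c - 1.
n_col <- 4
per_column <- 2 * 3^(0:(n_col - 1)) # additions performed at each column
sum(per_column)                     # 80
3^n_col - 1                         # 80
```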

Obviously there’s a lot of overhead in terms of loops and caching of intermediate results, so I haven’t bothered trying to implement this in Stan for a performance comparison, but I wanted to post here in case the more CS-savvy might see a more elegant solution that achieves the same effect. Thoughts?

```
library(tidyverse)
# make data
(
	expand_grid(
		a = c(-1, 0, 1)
		, b = c(-1, 0, 1)
		, c = c(-1, 0, 1)
		, d = c(-1, 0, 1)
	)
	%>% as.matrix()
) ->
	X
# example parameter vector
beta = rnorm(ncol(X))
# compute the standard row-wise dot-product
expected_dot_out = as.vector(X %*% beta)
# precompute some indices
X_tmp_xrow_list = list()
for(i in 1:ncol(X)){
	(
		X[, 1:i, drop = FALSE] # drop=FALSE so a single column stays a matrix
		%>% as_tibble()
		%>% mutate(xrow = 1:n())
	) ->
		X_with_xrow
	# one row per distinct prefix of the first i columns
	(
		X_with_xrow
		%>% group_by_at(.vars = vars(-xrow))
		%>% summarise(
			first_xrow = xrow[1]
			, .groups = 'drop'
		)
		%>% mutate(
			xrow_in_X_tmp = 1:n()
		)
	) ->
		tmp
	# map each original row back to its prefix's entry in tmp
	(
		X_with_xrow
		%>% select(-xrow)
		%>% left_join(tmp)
		%>% nest(xrow_in_X_tmp = xrow_in_X_tmp)
	) ->
		X_tmp_xrow_list[[i]]
}
# helper function for the no-multiply dot product:
# adds or subtracts beta according to the sign of each entry of X
dot_like2 = function(X, Z, beta){
	out = Z
	for(i in 1:length(X)){
		if(X[i] != 0){
			if(X[i] > 0){
				out[i] = out[i] + beta
			}else{
				out[i] = out[i] - beta
			}
			add_count <<- add_count + 1
		}
	}
	return(out)
}
# initialize a count of additions performed
add_count = 0
Z = rep(0, nrow(X))
# loop over columns to do the branched sum
for(i in 1:ncol(X)){
	# operate on one representative row per distinct prefix
	X_tmp = X[X_tmp_xrow_list[[i]]$first_xrow, 1]
	if(ncol(X) > 1){
		X = as.matrix(X[, 2:ncol(X)]) # peel off the column just consumed
	}
	Z_tmp = Z[X_tmp_xrow_list[[i]]$first_xrow]
	Z_new = dot_like2(X_tmp, Z_tmp, beta[i])
	# fan the prefix sums back out to all rows
	Z = Z_new[unlist(X_tmp_xrow_list[[i]]$xrow_in_X_tmp)]
}
print(add_count) # 80
all(near(Z, expected_dot_out)) # TRUE
```