The final optimized model is:

```
...
parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0, upper=pi()/2>[K] tau_unif;
  matrix[L, K] gamma;        // group coeffs
  real<lower=0> sigma;       // prediction error scale
}
transformed parameters {
  matrix[J, K] beta;
  vector<lower=0>[K] tau;    // prior scale
  for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
  beta = u * gamma + (diag_pre_multiply(tau, L_Omega) * z)';
}
model {
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(rows_dot_product(beta[jj], x), sigma);
}
```
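To make the construction concrete: ignoring the `u * gamma` mean term, each row of `(diag_pre_multiply(tau, L_Omega) * z)'` is a multivariate normal draw with covariance `diag(tau) * Omega * diag(tau)`, where `Omega = L_Omega * L_Omega'`. Here's a quick numpy simulation (dimensions, scales, and the correlation matrix are all made up for illustration) checking that the sample covariance of those rows matches the target:

```python
import numpy as np

rng = np.random.default_rng(0)
K, J, n_draws = 3, 5, 200_000

# Made-up scales and a valid correlation matrix
tau = np.array([0.5, 1.0, 2.0])
Omega = 0.5 * np.eye(K) + 0.5        # compound-symmetry correlation matrix
L_Omega = np.linalg.cholesky(Omega)  # lower-triangular, Omega = L @ L.T

# Non-centered construction: beta = (diag(tau) @ L_Omega @ z)' with z iid std normal
z = rng.standard_normal((n_draws, K, J))
beta = np.swapaxes(np.diag(tau) @ L_Omega @ z, 1, 2)  # draws of a J x K beta

# Each of the J rows should have covariance diag(tau) @ Omega @ diag(tau)
rows = beta.reshape(-1, K)           # pool the (independent) rows across draws
target = np.diag(tau) @ Omega @ np.diag(tau)
print(np.allclose(np.cov(rows.T), target, atol=0.05))  # True
```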

That transpose in the creation of `beta` has always annoyed me. Partly because it's easy to miss (which would be solved if the docs used `transpose()` instead), but also because it makes me lose track of the shape of things: it's weird that `z` is a K × J matrix while `beta` is J × K.

Isn’t it more straightforward to do:

```
parameters {
  ...
  matrix[J, K] z;
  ...
}
transformed parameters {
  ...
  beta = u * gamma + z * diag_post_multiply(transpose(L_Omega), tau);
  ...
}
...
```

Here’s some R code that (I think) demonstrates the equivalence:

```
# R analogues of Stan's matrix helpers:
diag_pre_multiply = function(v, m) {
  diag(v) %*% m
}
diag_post_multiply = function(m, v) {
  m %*% diag(v)
}
K = 3
J = 5
tau = 1:K                                # K arbitrary scale values
Omega = 0.5 * diag(K) + 0.5              # an arbitrary valid correlation matrix
L_Omega = t(chol(Omega))                 # its genuine lower-triangular Cholesky factor
z = matrix(rnorm(K * J), nrow = K, ncol = J)  # just some random data
tz = t(z)                                # transpose of z
# the current approach:
beta1 = t(diag_pre_multiply(v = tau, m = L_Omega) %*% z)
# mike's suggested alternate (note the transpose on L_Omega):
beta2 = tz %*% diag_post_multiply(m = t(L_Omega), v = tau)
# and they're equivalent (up to floating-point error):
isTRUE(all.equal(beta1, beta2))
```
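One subtlety worth flagging: a correlation Cholesky factor is lower-triangular, not symmetric, so `z * L` and `z * L'` are not interchangeable, and the safe form keeps the transpose on `L_Omega`. Testing with a symmetric matrix would mask this. Here's a quick numpy check (values made up) comparing the row covariance implied by each form, using the fact that a row `z @ M` of iid standard normals has covariance `M.T @ M`:

```python
import numpy as np

K = 3
tau = np.array([0.5, 1.0, 2.0])              # made-up scales
Omega = 0.5 * np.eye(K) + 0.5                # made-up correlation matrix
L = np.linalg.cholesky(Omega)                # lower-triangular: Omega = L @ L.T
D = np.diag(tau)

target = D @ Omega @ D                       # covariance we want for each row of beta

# Row covariance of z @ M is M.T @ M for iid standard normal z
with_transpose = (L.T @ D).T @ (L.T @ D)     # M = L' diag(tau)
without_transpose = (L @ D).T @ (L @ D)      # M = L  diag(tau)

print(np.allclose(with_transpose, target))     # True: transposed form matches
print(np.allclose(without_transpose, target))  # False: untransposed form does not
```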