Conditions for SoA memory representation

I have a large model where 90% of the parameters is packed into a matrix eta. I’d like it to be represented as struct of arrays, hoping for faster execution. The outline of the program is:

data {
  vector[N] y;
}

parameters {
  matrix[K, M]     eta;
  matrix[U, V]     B;
  real<lower=0.0>  sigma;
}

model {
  matrix[K, K] L = something;
  matrix[K, M] nu = L * eta;
  vector[N] mu;

  for (i in 1:N)  {
     mu[i] = nu[f(i)] + B[g(i)];
  }

  to_vector(eta) ~ std_normal();
  to_vector(B) ~ std_normal();
  sigma ~ std_normal();

  y ~ normal(mu, sigma);
}

I expected eta to be SoA and B to be AoS (since it is indexed), but it is the opposite. Any ideas why? I hope that this is enough detail. The real program is more complicated, but eta is used exactly like above, and not referenced anywhere else.

Edit: cmdstanr 0.9.0.

I think it somewhat depends on the exact construction of L (if it is forced to be AoS, then I believe so will eta so they can be multiplied) but I’m not sure. @stevebronder might have a better idea, though it would help if you are able to share (more of) the full model

1 Like

Thank you, Brian. L is the Cholesky factor of a covariance matrix, built from a cholesky_factor_corr and an array[] real. I’m going try to get it promoted to SoA, although now you got me thinking that nu being indexed, which is unavoidable, may force AoS on eta anyway. I’m going to experiment with simple programs like the one above and report back. Any insights in the meantime are very welcome of course.

Insights obtained by modifying the program above:

  • Multiplication with * indeed seems to yield SoA only if both inputs are SoA. When one is AoS, the other is turned back into AoS as well.
  • When L = diag_pre_multiply(omega, Lc) withomega a SoA vector and Lc an AoS cholesky_factor_corr, the result is SoA. So this specialised product function, and maybe some others, can yield SoA even if some inputs are AoS.
  • Putting an LKJ prior on an cholesky_factor_corr turns it into an AoS.
  • Accessing individual elements of nu in a loop like above makes it an AoS, and this too propagates upstream, to L and eta.
  • However, slicing nu, that is pulling a column or a row at a time, preserves the SoA layout. (This is very good news to me, because I can rewrite my loop in this way.)