Conditions for SoA memory representation

huffyhenry · July 8, 2025, 9:06pm

I have a large model where 90% of the parameters is packed into a matrix eta. I’d like it to be represented as struct of arrays, hoping for faster execution. The outline of the program is:

data {
  vector[N] y;
}

parameters {
  matrix[K, M]     eta;
  matrix[U, V]     B;
  real<lower=0.0>  sigma;
}

model {
  matrix[K, K] L = something;
  matrix[K, M] nu = L * eta;
  vector[N] mu;

  for (i in 1:N)  {
     mu[i] = nu[f(i)] + B[g(i)];
  }

  to_vector(eta) ~ std_normal();
  to_vector(B) ~ std_normal();
  sigma ~ std_normal();

  y ~ normal(mu, sigma);
}

I expected eta to be SoA and B to be AoS (since it is indexed), but it is the opposite. Any ideas why? I hope that this is enough detail. The real program is more complicated, but eta is used exactly like above, and not referenced anywhere else.

Edit: cmdstanr 0.9.0.

WardBrian · July 9, 2025, 1:30pm

I think it somewhat depends on the exact construction of L (if it is forced to be AoS, then I believe so will eta so they can be multiplied) but I’m not sure. @stevebronder might have a better idea, though it would help if you are able to share (more of) the full model

huffyhenry · July 10, 2025, 11:22am

Thank you, Brian. L is the Cholesky factor of a covariance matrix, built from a cholesky_factor_corr and an array[] real. I’m going try to get it promoted to SoA, although now you got me thinking that nu being indexed, which is unavoidable, may force AoS on eta anyway. I’m going to experiment with simple programs like the one above and report back. Any insights in the meantime are very welcome of course.

huffyhenry · July 10, 2025, 1:34pm

Insights obtained by modifying the program above:

Multiplication with * indeed seems to yield SoA only if both inputs are SoA. When one is AoS, the other is turned back into AoS as well.
When L = diag_pre_multiply(omega, Lc) withomega a SoA vector and Lc an AoS cholesky_factor_corr, the result is SoA. So this specialised product function, and maybe some others, can yield SoA even if some inputs are AoS.
Putting an LKJ prior on an cholesky_factor_corr turns it into an AoS.
Accessing individual elements of nu in a loop like above makes it an AoS, and this too propagates upstream, to L and eta.
However, slicing nu, that is pulling a column or a row at a time, preserves the SoA layout. (This is very good news to me, because I can rewrite my loop in this way.)

Topic		Replies	Views
30%-40% drop in sampling time using stanc O1 optimizations General	22	1734	September 1, 2022
Sparse matrix use cases (resolving declaration issues) Developers	22	3424	August 16, 2017
Improving the efficiency of *_dot_product for trinary-and-redundant matrices Modeling	38	1174	May 3, 2023
Should sparse matrices in Stan be row major or column major Developers	47	4502	August 21, 2017
Parallelising matrix multiplications Modeling	10	541	May 4, 2023

Conditions for SoA memory representation

Related topics