Best place to put big matrix during modeling

I would like to ask a question about performance in Stan. Let's say I have a model that uses a big array of vectors, partial_results, with 2*N rows and M columns (M is a large number). The first N rows of partial_results are a transformation of the input data; those first N rows do not depend on the model parameters. The last N rows of partial_results depend on the first N rows and on the current values of the model parameters.

My question: to make Stan run faster, where should I declare this partial_results variable: in the transformed data block or in the model block?

If I put partial_results in the transformed data block, then I fill the first N rows only once. But then I cannot put partial_results on the left-hand side of an assignment in the model block (that would give a compilation error), since variables declared in transformed data cannot be assigned to from the model block.

On the other hand, if I put partial_results in the model block as a local variable, then I would need to fill the first N rows every time the model parameters change during HMC (M, the number of columns of partial_results, is of the order of thousands):

model {
  // local array: both halves have to be refilled at every log-density evaluation
  vector[10] partial_results[2 * N, M];
  // ...
}

The best solution would be to split it into two N x M matrices. If you do e.g. matrix-vector products, you can do them blockwise.

Ah, wait, these are not matrices. Still, splitting them up should be best.
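
A minimal sketch of that split (the names data_part and param_part, the placeholder transformation, and the single parameter theta are made up purely for illustration):

data {
  int<lower=1> N;
  int<lower=1> M;
  vector[10] x[N, M];
}
transformed data {
  // first half: depends only on the data, so it is computed exactly once
  vector[10] data_part[N, M];
  for (n in 1:N)
    for (m in 1:M)
      data_part[n, m] = x[n, m] - mean(x[n, m]);  // placeholder transformation
}
parameters {
  real theta;
}
model {
  // second half: a local variable, recomputed at every log-density evaluation
  vector[10] param_part[N, M];
  for (n in 1:N)
    for (m in 1:M)
      param_part[n, m] = theta * data_part[n, m];  // placeholder
  // ... increment target using data_part and param_part ...
}

This way only the parameter-dependent half is recomputed (and autodiffed) inside the model block, while the data-only half is built once.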


Thanks for the quick reply.

Yes, it is an array, which according to the Stan User's Guide is slower than matrices.

A follow-up question: can I still concatenate the two arrays in the model block? Or declare an array with 2*N rows in the model block and then assign the array declared in transformed data to its first N rows?

Hm, I think in principle yes, you could, but it might be more efficient not to. In any case, the best way to find out if it impacts performance is to compare the two (or more) different implementations.
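
For example, a rough sketch of the copy-into-a-local-array variant (assuming the data-only half is called data_part in transformed data, and that your Stan version supports slice assignment on arrays):

model {
  vector[10] partial_results[2 * N, M];
  // copy the precomputed half; note this copy is repeated at every log-density evaluation
  partial_results[1:N] = data_part;
  // fill partial_results[(N + 1):(2 * N)] from the current parameter values
  // ...
}

Whether that copy costs more than it saves compared with keeping the two halves separate is exactly the kind of thing worth timing.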

I don’t know Stan’s implementation details, so take what follows with a grain of salt or wait for someone else to confirm.

I think the data / transformed data variables are plain old double variables (in the generated C++), while parameters / transformed parameters / model local variables are some extra type that supports automatic differentiation, which is needed to compute the gradient of the log posterior density. As such, anything that you introduce in the latter blocks should introduce some overhead.

Whether this is significant should depend on the problem.
