Help with multi-threading a linear regression model and slicing the design matrix

Thanks, @rok_cesnovar. So where is the speed-up actually coming from? The example I linked strongly implies that it comes from slicing up n_redcards. I can't see any other difference between the two versions of the code that I'd expect to speed it up, but you're far more familiar with Stan than I am. Do you mean that slicing the data itself doesn't give a big speed-up, and the gain instead comes from dividing the normal_lpdf calls across the threads? Or is it something else entirely?
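For reference, here is a minimal sketch outside Stan of the decomposition that, as I understand it, reduce_sum relies on. The total log-likelihood of a simple linear regression is a sum over observations, so it can be split into contiguous slices, evaluated slice by slice, and added back up. The function names (`normal_lpdf`, `partial_sum`, `sliced_log_lik`) and the fixed-size slicing are my own assumptions for illustration. In Stan, reduce_sum decides the actual slice sizes at runtime and uses grainsize only as a guide. This sketch only shows that the sum splits cleanly. It says nothing about where any speed-up comes from.

```python
import math

def normal_lpdf(y, mu, sigma):
    # Log density of a single observation y under Normal(mu, sigma)
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def partial_sum(y_slice, x_slice, alpha, beta, sigma):
    # Log-likelihood contribution of one slice of the data
    # (hypothetical analogue of the partial-sum function passed to reduce_sum)
    return sum(normal_lpdf(y, alpha + beta * x, sigma) for y, x in zip(y_slice, x_slice))

def sliced_log_lik(y, x, alpha, beta, sigma, grainsize):
    # Split the data into contiguous slices of at most `grainsize` observations,
    # evaluate each slice independently, then add the partial sums together
    total = 0.0
    for start in range(0, len(y), grainsize):
        end = start + grainsize
        total += partial_sum(y[start:end], x[start:end], alpha, beta, sigma)
    return total
```

Because each slice's partial sum is independent of the others, the slices can be evaluated in any order (or on different threads) and the total is unchanged, up to floating-point rounding.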

Regarding the point about reduce_sum not copying the data for each thread: I'd formed my earlier impression from this comment and this one. Have I misunderstood or missed something, or are those comments not quite right?

The model works really, really well, and even better when I parallelise further! Is there a multivariate-normal equivalent of this approach (for a slightly more complex model I need to develop)?

Thanks again for all your help so far! Hugely appreciate it! :)