If you have redundant rows in x
within a given group, you can use the sufficient statistics trick to speed things up. Separately, if there is redundancy across groups in the rows of x
I think you’d be able to use the reduced-redundant-computations tricks too. Finally, if you have many more cores available than chains you want to sample, take a look at the new reduce_sum feature for within-chain parallelism.
1 Like