Reduce_sum in a Bayesian PCA

For this particular model, I think you will get more speedup from marginalizing out z, and writing the resulting multivariate normal likelihood as a function of the sample mean and covariance matrix, than from reduce_sum.

If you marginalize over z you get

\mathbf{y}_n \sim \mathcal{N}_{P}(\mathbf{0}, \mathbf{WW}' + \sigma^{2}\mathbf{I})
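
As a quick sanity check on that marginal covariance, here is a numpy sketch (the dimensions, \sigma, and loadings W are made up for illustration): simulating z_n ~ N(0, I) and y_n = W z_n + eps_n, the sample covariance of the y_n should match WW' + \sigma^2 I up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, Q = 200_000, 4, 2          # made-up sizes for the demo
sigma = 0.5
W = rng.normal(size=(P, Q))      # hypothetical loadings

# Simulate the hierarchical model: z_n ~ N(0, I_Q), y_n = W z_n + eps_n
Z = rng.normal(size=(N, Q))
eps = sigma * rng.normal(size=(N, P))
Y = Z @ W.T + eps

# The marginal covariance implied by integrating out z
Sigma = W @ W.T + sigma**2 * np.eye(P)

# Sample covariance of the simulated y_n agrees up to Monte Carlo error
S = np.cov(Y.T)
print(np.max(np.abs(S - Sigma)))
```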

Then the \mathbf{y}_n are iid, and you can evaluate the likelihood using sufficient statistics (the sample mean vector and covariance matrix). This means you don’t need to loop over n: you can compute the sample mean vector and covariance matrix once in the transformed data block (or pass them in as data). The Stan code to evaluate the multivariate normal via the sample mean and covariance matrix is in this post.
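
To see why the loop over n can be avoided: with \mathbf{\mu} = \mathbf{0}, the summed quadratic forms collapse to a trace against S = (1/N) \sum_n \mathbf{y}_n \mathbf{y}_n'. A numpy sketch (sizes and parameters are made up) comparing the per-observation loop with the sufficient-statistic evaluation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, Q = 200, 5, 2              # made-up sizes for the demo
sigma = 0.7
W = rng.normal(size=(P, Q))      # hypothetical loadings
Sigma = W @ W.T + sigma**2 * np.eye(P)   # marginal covariance of y_n

Y = rng.multivariate_normal(np.zeros(P), Sigma, size=N)

sign, logdet = np.linalg.slogdet(Sigma)
const = P * np.log(2.0 * np.pi) + logdet

# Per-observation evaluation (the loop over n that Stan would otherwise run)
quad = np.einsum("np,np->n", Y, np.linalg.solve(Sigma, Y.T).T)
ll_loop = -0.5 * (N * const + quad.sum())

# Sufficient-statistic evaluation: only S = (1/N) sum_n y_n y_n' is needed,
# since sum_n y_n' Sigma^{-1} y_n = N * tr(Sigma^{-1} S)
S = Y.T @ Y / N
ll_suff = -0.5 * N * (const + np.trace(np.linalg.solve(Sigma, S)))

print(np.allclose(ll_loop, ll_suff))
```

S is computed once from the data, so each likelihood evaluation costs one Cholesky-sized solve instead of N of them.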

If you want the \mathbf{z}_n, you can obtain them in generated quantities with the multivariate normal RNG (multi_normal_rng). You have that (\mathbf{y}_n'\ \mathbf{z}_n')' is jointly multivariate normal and, from there, can obtain the distribution of \mathbf{z}_n \mid \mathbf{y}_n as another multivariate normal. I am working from memory, but I believe the result is

\mathbf{z}_n \mid\ \mathbf{y}_n \sim \mathcal{N}_{Q}(\mathbf{W}'(\mathbf{WW}' + \sigma^{2}\mathbf{I})^{-1}\mathbf{y}_n,\ \mathbf{I} - \mathbf{W}'(\mathbf{WW}' + \sigma^{2}\mathbf{I})^{-1}\mathbf{W})
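
A numpy sketch of those conditional moments (W and \sigma are made up for illustration). It also checks the equivalent Q \times Q form via the push-through/Woodbury identities, with M = \mathbf{W}'\mathbf{W} + \sigma^2\mathbf{I}, which is cheaper when Q \ll P:

```python
import numpy as np

rng = np.random.default_rng(1)
P, Q = 5, 2                      # made-up sizes for the demo
sigma = 0.7
W = rng.normal(size=(P, Q))      # hypothetical loadings
Sigma_y = W @ W.T + sigma**2 * np.eye(P)

y = rng.multivariate_normal(np.zeros(P), Sigma_y)

# Conditional moments of z | y from the joint normal of (y', z')'
K = np.linalg.solve(Sigma_y, W).T        # W' (WW' + sigma^2 I)^{-1}
mean_z = K @ y
cov_z = np.eye(Q) - K @ W

# Equivalent Q x Q form: mean = M^{-1} W' y, cov = sigma^2 M^{-1},
# with M = W'W + sigma^2 I
M = W.T @ W + sigma**2 * np.eye(Q)
print(np.allclose(mean_z, np.linalg.solve(M, W.T @ y)))
print(np.allclose(cov_z, sigma**2 * np.linalg.inv(M)))
```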

For what it’s worth, I would call this a factor analysis instead of a PCA because it involves a multivariate normal model, whereas PCA is typically a matrix algorithm without a likelihood.
