Reduce_sum in a Bayesian PCA

Hello Stan community!

This week I’ve been helping a researcher with his Stan implementation of a Bayesian model that includes a PCA.

Regarding the Bayesian PCA, we are assuming that

\mathbf{y}_{n}=\mathbf{W}\mathbf{z}_{n}+\mathbf{\epsilon}_{n},

where \mathbf{W}\in\mathbb{R}^{P \times Q} with Q<P, \mathbf{z}_{n} \sim \mathcal{N}_{Q}(\mathbf{0},\mathbf{I}), and \mathbf{\epsilon}_{n} \sim \mathcal{N}_{P}(\mathbf{0},\sigma^{2}\mathbf{I}) for n=1,2,\ldots,N.

Furthermore, after proposing priors for the latent vectors and the hyperparameters, the Bayesian hierarchical model is as follows:

\mathbf{y}_{n} \sim \mathcal{N}_{P}(\mathbf{W}\mathbf{z}_{n},\sigma^{2}\mathbf{I})\\
\mathbf{w}_{p}\mid\tau,\mathbf{\Omega} \sim \mathcal{N}_{Q}(\mathbf{0},\tau\mathbf{\Omega})\quad p=1,2,\ldots,P\\
\mathbf{z}_{n} \sim \mathcal{N}_{Q}(\mathbf{0},\mathbf{I})\\
\sigma \sim \mathcal{N}(0,100^{2})\\
\tau \sim \mathrm{Cauchy}(0, 2.5)\\
\mathbf{\Omega} \sim \mathrm{LKJ}(2)

where \mathbf{w}_{p} denotes the p-th row of \mathbf{W}, hence the Q-dimensional normal prior.

The ultimate goal is to speed up the Stan code through within-chain parallelization via the reduce_sum function. However, it turned out that computing the log-likelihood of \lbrace\mathbf{y}_{n}\rbrace_{n=1}^{N} with reduce_sum is slower than without it.

functions {
  real partial_sum_lpdf(array[,] real y_slice,
                        int start, int end,
                        array[,] real mu,
                        real sigma) {
    return normal_lupdf(to_array_1d(y_slice) | to_array_1d(mu[start:end, :]), sigma);
  }
}

data {
  int<lower=0> N;   // number of observations
  int<lower=0> P;   // observed dimension
  int<lower=0> Q;   // latent dimension (Q < P)
  matrix[N, P] Y;   // data matrix
}

parameters {
  matrix[P, Q] W;                    // loading matrix
  real<lower=1e-6> sigma;            // observation noise scale
  matrix[N, Q] Z;                    // latent vectors, one row per observation
  cholesky_factor_corr[Q] L_Omega;   // Cholesky factor of the correlation matrix Omega
  vector<lower=1e-6>[Q] tau;         // prior scales for the rows of W
}

transformed parameters { 
  matrix[N, P] mu = (Z * W');
}

model {
  // Log-likelihood alternatives (only one at a time is kept when compiling):
  int grainsize = 1;

  // (1) loop over observations
  for (n in 1:N) {
    Y[n] ~ normal(mu[n], sigma);
  }

  // (2) vectorized version of (1)
  to_vector(Y) ~ normal(to_vector(mu), sigma);

  // (3) within-chain parallelization via reduce_sum
  target += reduce_sum(partial_sum_lpdf, to_array_2d(Y), grainsize,
                       to_array_2d(mu), sigma);
}

Above are the relevant pieces of the Stan code. As you can see, I’m trying three alternatives for computing the log-likelihood: the first uses a for loop, the second is a vectorized version of the first, and the third uses reduce_sum. The last one is really slow compared with the other two. Could someone please help me out by spotting issues or suggesting ways to improve the reduce_sum implementation?

Regards, Román.

With grainsize = 1 you probably have quite high overhead, which would be reduced with bigger grainsizes.

Why don’t you define Wt as matrix[Q, P] to avoid the repeated transpose?

You seem to have the normal model twice, and you should drop the for loop lines.

If N and P are big, see also the blog post for speed optimizations
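
Putting these suggestions together, a sketch of the changed blocks might look like the following. Passing grainsize in as data (so it can be tuned without recompiling) and the name Wt are my own choices, not something from the original code; the functions block stays as above.

data {
  int<lower=0> N;
  int<lower=0> P;
  int<lower=0> Q;
  matrix[N, P] Y;
  int<lower=1> grainsize;   // tune this; 1 is usually far too small
}

parameters {
  matrix[Q, P] Wt;          // W stored transposed: rows of W become columns of Wt
  real<lower=1e-6> sigma;
  matrix[N, Q] Z;
  cholesky_factor_corr[Q] L_Omega;
  vector<lower=1e-6>[Q] tau;
}

transformed parameters {
  matrix[N, P] mu = Z * Wt;   // no per-iteration transpose
}

model {
  // priors omitted; keep exactly one likelihood statement
  target += reduce_sum(partial_sum_lpdf, to_array_2d(Y), grainsize,
                       to_array_2d(mu), sigma);
}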


For this particular model, I think you will get the most speedup by marginalizing out z and writing the resulting multivariate normal as a function of the sample mean and covariance matrix.

If you marginalize over z you get

\mathbf{y}_n \sim \mathcal{N}_{P}(\mathbf{0}, \mathbf{WW}' + \sigma^{2}\mathbf{I})

Then the \mathbf{y}_n are iid and you can evaluate the likelihood using sufficient statistics (the sample mean vector and sample covariance matrix). This means you don’t need to loop over n: you can compute the sample mean vector and covariance matrix once in the transformed data block (or they can be sent in as data). The Stan code to evaluate the multivariate normal via sample mean and covariance matrix is at this post.
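
I don’t have that code in front of me, but a minimal sketch of the idea, using the zero-mean marginal above and my own names (mvn_zero_mean_suffstats, ybar, S), could look like this. The priors on W, tau, Omega, and sigma stay as in the original model; only the likelihood changes.

functions {
  // Log-likelihood of N iid draws y_n ~ multi_normal(0, Sigma), evaluated from
  // sufficient statistics: the sample mean ybar and the (1/N-scaled) sample
  // covariance S, using
  //   sum_n log N(y_n | 0, Sigma)
  //     = -N/2 * ( P log(2 pi) + log|Sigma| + tr(Sigma^{-1} S) + ybar' Sigma^{-1} ybar ).
  real mvn_zero_mean_suffstats(vector ybar, matrix S, matrix Sigma, int N) {
    int P = rows(Sigma);
    matrix[P, P] Sigma_inv = inverse_spd(Sigma);
    return -0.5 * N * (P * log(2 * pi())
                       + log_determinant(Sigma)
                       + trace(Sigma_inv * S)
                       + quad_form(Sigma_inv, ybar));
  }
}

transformed data {
  vector[P] ybar = (Y' * rep_vector(1.0, N)) / N;   // sample mean vector
  matrix[N, P] Yc = Y - rep_matrix(ybar', N);       // centered data
  matrix[P, P] S = (Yc' * Yc) / N;                  // (1/N) sample covariance
}

model {
  matrix[P, P] Sigma = add_diag(W * W', square(sigma));   // W W' + sigma^2 I
  target += mvn_zero_mean_suffstats(ybar, S, Sigma, N);
}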

If you want the \mathbf{z}_n, you can obtain them in generated quantities by using the multivariate normal rng. You have that (\mathbf{y}_n'\ \mathbf{z}_n')' is multivariate normal and, from there, can obtain the distribution of \mathbf{z}_n \mid\ \mathbf{y}_n as another multivariate normal. I am working from memory, but I believe the result is

\mathbf{z}_n \mid\ \mathbf{y}_n \sim \mathcal{N}_{Q}(\mathbf{W}'(\mathbf{WW}' + \sigma^{2}\mathbf{I})^{-1}\mathbf{y}_n,\ \mathbf{I} - \mathbf{W}'(\mathbf{WW}' + \sigma^{2}\mathbf{I})^{-1}\mathbf{W})
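
For concreteness, a sketch of that generated quantities step, assuming the conditional above and using my own names (Z_draw, K, A, V):

generated quantities {
  matrix[N, Q] Z_draw;
  {
    matrix[P, P] K = add_diag(W * W', square(sigma));        // W W' + sigma^2 I
    matrix[Q, P] A = W' / K;                                 // W' K^{-1} via right division
    matrix[Q, Q] V = diag_matrix(rep_vector(1.0, Q)) - A * W;
    matrix[Q, Q] V_sym = 0.5 * (V + V');                     // guard against numerical asymmetry
    for (n in 1:N) {
      Z_draw[n] = multi_normal_rng(A * Y[n]', V_sym)';       // draw z_n | y_n
    }
  }
}

Since these draws happen once per saved iteration rather than per leapfrog step, they should add relatively little to the overall cost.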

For what it’s worth, I would call this a factor analysis instead of a PCA because it involves a multivariate normal model, whereas PCA is typically a matrix algorithm without a likelihood.


You were right… It’s better to test different values of this parameter

I did it and it also helps

I should mention that I was commenting out two of the three alternatives when compiling the Stan code

Thanks a lot, I definitely will read the post thoroughly

You are right: if I marginalize out \mathbf{z}_{n}, I get the marginal distribution you wrote. Nevertheless, if I’m not mistaken, Stan needs the conditional distribution of \mathbf{y}_{n}, not its marginal distribution.

The conditional distribution of \mathbf{z}_{n} given \mathbf{y}_{n} assumes that \mathbf{W} is given. Within the Bayesian approach, \mathbf{W} is also a matrix of random variables.


Thanks a lot @avehtari and @edm for your comments and suggestions - they were really useful for better understanding my model.

I’ve implemented a Bayesian PCA from scratch to test reduce_sum, which has allowed me to make progress on finding a more efficient way to implement within-chain parallelization in the model. I’ll post it today.


I would be willing to bet a beer that you will obtain the same posterior distribution regardless of whether you use the marginal or the conditional likelihood.

About \mathbf{W} being given: in generated quantities, \mathbf{W} has already been sampled, so it is basically given (the “given” just changes at each iteration). Sampling \mathbf{z} in generated quantities is like a Gibbs sampling step.


This is a good exercise… Even better if there is a beer in play :)
I’ll try this out once I finish the implementation of the within-chain parallelization via reduce_sum.

I see your point… I think this approach is worth trying. As I mentioned above, I’ll try it out.