Reduce_sum in a Bayesian PCA

Hello Stan community!

This week I’ve been helping a researcher with his Stan implementation of a Bayesian model that includes a PCA.

Regarding the Bayesian PCA, we are assuming that

\mathbf{y}_{n}=\mathbf{W}\mathbf{z}_{n}+\mathbf{\epsilon}_{n},

where \mathbf{W}\in\mathbb{R}^{P \times Q} with Q<P, \mathbf{z}_{n} \sim \mathcal{N}_{Q}(\mathbf{0},\mathbf{I}), and \mathbf{\epsilon}_{n} \sim \mathcal{N}_{P}(\mathbf{0},\sigma^{2}\mathbf{I}) for n=1,2,\ldots,N.

Furthermore, after proposing priors for the latent vectors and the hyperparameters, the Bayesian hierarchical model is as follows:

\mathbf{y}_{n} \sim \mathcal{N}_{P}(\mathbf{W}\mathbf{z}_{n},\sigma^{2}\mathbf{I})\\
\mathbf{w}_{p}\mid\tau,\mathbf{\Omega} \sim \mathcal{N}_{Q}(\mathbf{0},\tau\mathbf{\Omega})\quad p=1,2,\ldots,P\\
\mathbf{z}_{n} \sim \mathcal{N}_{Q}(\mathbf{0},\mathbf{I})\\
\sigma \sim \mathcal{N}(0,100^{2})\\
\tau \sim \mathrm{Cauchy}(0, 2.5)\\
\mathbf{\Omega} \sim \mathrm{LKJ}(2)

where \mathbf{w}_{p} denotes the p-th row of \mathbf{W}, hence the Q-dimensional normal prior.

The ultimate goal is to speed up the Stan code through within-chain parallelization via the reduce_sum function. However, it turned out that computing the log-likelihood of \lbrace\mathbf{y}_{n}\rbrace_{n=1}^{N} with reduce_sum is slower than without it.

functions {
  real partial_sum_lpdf(array[,] real y_slice,
                        int start, int end,
                        array[,] real mu,
                        real sigma) {
    return normal_lupdf(to_array_1d(y_slice) | to_array_1d(mu[start:end, :]), sigma);
  }
}

data {
  int<lower=0> N;   // number of observations
  int<lower=0> P;   // observed dimension
  int<lower=0> Q;   // latent dimension (Q < P)
  matrix[N, P] Y;   // data matrix
}

parameters {
  matrix[P, Q] W;                    // loading matrix
  real<lower=1e-6> sigma;            // observation noise scale
  matrix[N, Q] Z;                    // latent vectors, one row per observation
  cholesky_factor_corr[Q] L_Omega;   // Cholesky factor of the correlation matrix Omega
  vector<lower=1e-6>[Q] tau;         // prior scales for the rows of W
}

transformed parameters { 
  matrix[N, P] mu = (Z * W');
}

model {
  // Log-likelihood alternatives (only one at a time is kept when compiling):
  int grainsize = 1;

  // (1) loop over observations
  for (n in 1:N) {
    Y[n] ~ normal(mu[n], sigma);
  }

  // (2) vectorized version of (1)
  to_vector(Y) ~ normal(to_vector(mu), sigma);

  // (3) within-chain parallelization via reduce_sum
  target += reduce_sum(partial_sum_lpdf, to_array_2d(Y), grainsize,
                       to_array_2d(mu), sigma);
}

Above are the relevant pieces of the Stan code. As you can see, I’m trying three alternatives for computing the log-likelihood: the first uses a for loop, the second is a vectorized version of the first, and the third uses reduce_sum. The last one is really slow compared with the other two. Could someone please help me out by spotting issues or suggesting ways to improve the reduce_sum implementation?

Regards, Román.

With grainsize = 1 you probably have quite high overhead, which would be reduced with bigger grainsizes.

Why don’t you define Wt as matrix[Q, P] to avoid the repeated transpose?

You seem to have the normal model twice, and you should drop the for loop lines.

If N and P are big, see also the blog post for speed optimizations
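
Putting these suggestions together, a sketch of the changed blocks might look like the following. Passing grainsize in as data (so it can be tuned without recompiling) and the name Wt are my own choices, not something from the original code; the functions block stays as above.

data {
  int<lower=0> N;
  int<lower=0> P;
  int<lower=0> Q;
  matrix[N, P] Y;
  int<lower=1> grainsize;   // tune this; 1 is usually far too small
}

parameters {
  matrix[Q, P] Wt;          // W stored transposed: rows of W become columns of Wt
  real<lower=1e-6> sigma;
  matrix[N, Q] Z;
  cholesky_factor_corr[Q] L_Omega;
  vector<lower=1e-6>[Q] tau;
}

transformed parameters {
  matrix[N, P] mu = Z * Wt;   // no per-iteration transpose
}

model {
  // priors omitted; keep exactly one likelihood statement
  target += reduce_sum(partial_sum_lpdf, to_array_2d(Y), grainsize,
                       to_array_2d(mu), sigma);
}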


For this particular model, I think you will get the most speedup by marginalizing out z and writing the resulting multivariate normal as a function of the sample mean and covariance matrix.

If you marginalize over z you get

\mathbf{y}_n \sim \mathcal{N}_{P}(\mathbf{0}, \mathbf{WW}' + \sigma^{2}\mathbf{I})

Then the \mathbf{y}_n are iid and you can evaluate the likelihood using sufficient statistics (the sample mean vector and sample covariance matrix). This means you don’t need to loop over n: you can compute the sample mean vector and covariance matrix once in the transformed data block (or they can be sent in as data). The Stan code to evaluate the multivariate normal via sample mean and covariance matrix is at this post.
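
I don’t have that code in front of me, but a minimal sketch of the idea, using the zero-mean marginal above and my own names (mvn_zero_mean_suffstats, ybar, S), could look like this. The priors on W, tau, Omega, and sigma stay as in the original model; only the likelihood changes.

functions {
  // Log-likelihood of N iid draws y_n ~ multi_normal(0, Sigma), evaluated from
  // sufficient statistics: the sample mean ybar and the (1/N-scaled) sample
  // covariance S, using
  //   sum_n log N(y_n | 0, Sigma)
  //     = -N/2 * ( P log(2 pi) + log|Sigma| + tr(Sigma^{-1} S) + ybar' Sigma^{-1} ybar ).
  real mvn_zero_mean_suffstats(vector ybar, matrix S, matrix Sigma, int N) {
    int P = rows(Sigma);
    matrix[P, P] Sigma_inv = inverse_spd(Sigma);
    return -0.5 * N * (P * log(2 * pi())
                       + log_determinant(Sigma)
                       + trace(Sigma_inv * S)
                       + quad_form(Sigma_inv, ybar));
  }
}

transformed data {
  vector[P] ybar = (Y' * rep_vector(1.0, N)) / N;   // sample mean vector
  matrix[N, P] Yc = Y - rep_matrix(ybar', N);       // centered data
  matrix[P, P] S = (Yc' * Yc) / N;                  // (1/N) sample covariance
}

model {
  matrix[P, P] Sigma = add_diag(W * W', square(sigma));   // W W' + sigma^2 I
  target += mvn_zero_mean_suffstats(ybar, S, Sigma, N);
}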

If you want the \mathbf{z}_n, you can obtain them in generated quantities by using the multivariate normal rng. You have that (\mathbf{y}_n'\ \mathbf{z}_n')' is multivariate normal and, from there, can obtain the distribution of \mathbf{z}_n \mid\ \mathbf{y}_n as another multivariate normal. I am working from memory, but I believe the result is

\mathbf{z}_n \mid\ \mathbf{y}_n \sim \mathcal{N}_{Q}(\mathbf{W}'(\mathbf{WW}' + \sigma^{2}\mathbf{I})^{-1}\mathbf{y}_n,\ \mathbf{I} - \mathbf{W}'(\mathbf{WW}' + \sigma^{2}\mathbf{I})^{-1}\mathbf{W})
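
For concreteness, a sketch of that generated quantities step, assuming the conditional above and using my own names (Z_draw, K, A, V):

generated quantities {
  matrix[N, Q] Z_draw;
  {
    matrix[P, P] K = add_diag(W * W', square(sigma));        // W W' + sigma^2 I
    matrix[Q, P] A = W' / K;                                 // W' K^{-1} via right division
    matrix[Q, Q] V = diag_matrix(rep_vector(1.0, Q)) - A * W;
    matrix[Q, Q] V_sym = 0.5 * (V + V');                     // guard against numerical asymmetry
    for (n in 1:N) {
      Z_draw[n] = multi_normal_rng(A * Y[n]', V_sym)';       // draw z_n | y_n
    }
  }
}

Since these draws happen once per saved iteration rather than per leapfrog step, they should add relatively little to the overall cost.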

For what it’s worth, I would call this a factor analysis instead of a PCA because it involves a multivariate normal model, whereas PCA is typically a matrix algorithm without a likelihood.


You were right… It’s better to test different values of this parameter

I did it and it also helps

I should mention that I was commenting out two of the three alternatives when compiling the Stan code

Thanks a lot, I definitely will read the post thoroughly

You are right: if I marginalize out \mathbf{z}_{n}, I get the marginal distribution you wrote. Nevertheless, if I’m not mistaken, Stan needs the conditional distribution of \mathbf{y}_{n}, not its marginal distribution.

The conditional distribution of \mathbf{z}_{n} given \mathbf{y}_{n} assumes that \mathbf{W} is given. Within the Bayesian approach, \mathbf{W} is also a matrix of random variables.


Thanks a lot @avehtari and @edm for your comments and suggestions - they were really useful for better understanding my model.

I’ve implemented a Bayesian PCA from scratch to test reduce_sum, which has allowed me to make progress on finding a more efficient way to implement within-chain parallelization in the model. I’ll post it today.


I would be willing to bet a beer that you will obtain the same posterior distribution regardless of whether you use the marginal or the conditional likelihood.

About \mathbf{W} being given: in generated quantities, \mathbf{W} has already been sampled, so it is basically given (the “given” just changes at each iteration). Sampling \mathbf{z} in generated quantities is like a Gibbs sampling step.


This is a good exercise… Even better if there is a beer in play :)
I’ll try this out once I finish the implementation of the within-chain parallelization via reduce_sum.

I see your point… I think this approach is worth trying. As I mentioned above, I’ll try it out.