Hello Stan community!

This week I’ve been supporting a researcher with his implementation in Stan of a Bayesian model in which a PCA is included.

Regarding the Bayesian PCA, we are assuming that

where \mathbf{W}\in\mathbb{R}^{P \times Q} with Q<P, \mathbf{z}_{n} \sim \mathcal{N}_{Q}(\mathbf{0},\mathbf{I}), and \mathbf{\epsilon}_{n} \sim \mathcal{N}_{P}(\mathbf{0},\sigma^{2}\mathbf{I}) for n=1,2,\ldots,N.

Furthermore, proposing some priors for the latent vectors and hyperparameters, the Bayesian hierarchical model is as follows:

The ultimate goal is to implement within-chain parallelization aiming at speeding up the Stan code via the function *reduce_sum*, however, it turned out that computing the log-likelihood of \lbrace\mathbf{y}_{n}\rbrace_{n=1}^{N} with *reduce_sum* is slower than without it.

```
functions {
real partial_sum_lpdf(array[,] real y_slice,
int start, int end,
array[,] real mu,
real sigma) {
return normal_lupdf(to_array_1d(y_slice) | to_array_1d(mu[start:end,:]),sigma);
}
}
data {
int<lower=0> N;
int<lower=0> P;
int<lower=0> Q;
matrix[N,P] Y;
}
parameters {
matrix[P, Q] W;
real<lower=1e-6> sigma;
matrix[N, Q] Z;
cholesky_factor_corr[Q] L_Omega;
vector<lower=1e-6>[Q] tau;
}
transformed parameters {
matrix[N, P] mu = (Z * W');
}
model {
// Log-likelihood alternatives
for (n in 1:N) {
Y[n] ~ normal(mu[n], sigma);
}
to_vector(Y) ~ normal(to_vector(mu),sigma);
int grainsize = 1;
target += reduce_sum(partial_sum_lpdf,to_array_2d(Y),grainsize,to_array_2d(mu),sigma);
}
```

Above are the relevant pieces of the Stan code. As you can see, I’m trying three alternatives for computing the log-likelihood: the first uses a *for* loop, the second a vectorized version of the fist one, and the third deploys *reduce_sum*. This last one is really slow compare with the other ones. Could someone please help me out by detecting issues or suggesting ways of improving the implementation of *reduce_sum*?

Regards, Román.