I am trying to reproduce the Stan results in this Jupyter notebook (GitHub link).
Background
This model is a PCA / factor-analysis model where we observe a data matrix Y = XW^T + \epsilon and want to recover the loadings matrix W. The loadings matrix is not rotationally identifiable (XW^T = (XQ)(WQ)^T for any orthogonal Q), so the posterior over each row of W should trace out concentric circles.
The authors ran this model on a synthetic data matrix Y with n=100 observations, d=4 dimensions, and q=2 factors, so the loadings matrix W should be 4x2. They specified how Y was generated but did not post a seed, so I can only reproduce Y in distribution.
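For concreteness, here is roughly how I generate Y. This is a minimal sketch with unit-normal factors and loadings and a noise scale I picked myself (without the seed I can only match the distribution); the last line sanity-checks the rotational invariance:

import numpy as np

rng = np.random.RandomState(0)   # arbitrary seed; the authors' seed is unknown
n, d, q = 100, 4, 2
sigma_noise = 0.5                # assumed noise scale, not the authors' value

X = rng.randn(n, q)                            # latent factors
W = rng.randn(d, q)                            # true loadings, 4x2
Y = X @ W.T + sigma_noise * rng.randn(n, d)    # observed data

# Rotating X and W by the same orthogonal Q leaves the mean of Y unchanged.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(X @ W.T, (X @ Q) @ (W @ Q).T)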
The following figure shows their posterior over the matrix W, with each of its four rows plotted as a vector in R^2, for two models:
- The rotationally nonidentifiable PCA model, which corresponds to the concentric circles
- Their proposed modification of the model, which removes the nonidentifiability; it corresponds to the dense blobs within each concentric ring. I’m not concerned with this part at the moment.
My question
Even using the authors’ code, I cannot reproduce the figure above. The model compiles and samples, but about 90% of the iterations hit the maximum tree depth of 10. (The authors’ output suggests no tree depth issues.) When I plot the pooled posterior, I do see the desired concentric circles:
However, when I break it out by chain, I see that each chain leaves large angular sections unexplored:
Variations of this issue show up no matter what I change: seeds, Stan versions, computing environments. What are possible reasons that I hit this tree depth/mixing problem where the authors did not, and what could I try to fix it?
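For reference, this is how I count the saturated iterations, assuming a PyStan 2.x fit object named fit (created as in the sampling call below the model):

import numpy as np

for i, chain in enumerate(fit.get_sampler_params(inc_warmup=False)):
    saturated = np.mean(np.asarray(chain['treedepth__']) >= 10)
    divergent = np.sum(np.asarray(chain['divergent__']) > 0)
    print('chain %d: %.0f%% at max treedepth, %d divergences'
          % (i, 100 * saturated, divergent))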
Certainly the authors had a specific set of dependencies: C++ toolchain, Eigen build, Stan version, etc. Could the difference be due to this? (All I know is that they used PyStan 2.17.)
Here is the model:
data {
  int<lower=0> N;              // number of observations
  int<lower=1> D;              // observed dimension
  int<lower=1> Q;              // number of latent factors
  vector[D] Y[N];              // observations, one D-vector per row
}
parameters {
  matrix[D, Q] W;              // loadings
  real<lower=0> sigma_noise;   // observation noise scale
  vector[D] mu;                // mean
}
transformed parameters {
  cholesky_factor_cov[D] L;
  {
    // Marginal covariance W W' + sigma^2 I, with jitter for stability.
    matrix[D, D] K = tcrossprod(W);
    for (d in 1:D)
      K[d, d] += square(sigma_noise) + 1e-14;
    L = cholesky_decompose(K);
  }
}
model {
  mu ~ normal(0, 10);
  sigma_noise ~ normal(0, 1);
  to_vector(W) ~ normal(0, 1);
  Y ~ multi_normal_cholesky(mu, L);
}