# Potential bug with (cmd)stan(py)'s optimisation process running a probabilistic PCA

I would like to understand why I’m seeing the results that I’m seeing running a PCA with Stan on some weirdly simulated data.

The data: is nine different mnist digits rotated a bunch of times.

The model:

\mathbf{Y}|\mathbf{X}, \sigma^2 \sim \mathcal{MN}(\mathbf{0}, \mathbf{X}\mathbf{X}^T + \sigma^2 \mathbf{I}, \mathbf{I})

(this is matrix normal notation for every column of \mathbf{Y} being independently drawn from a zero mean multivariate normal with a dor product covariance function applied to the latents).

The model famously corresponds to probabilistic PCA (the maximum likelihood solution for \mathbf{X} | \mathbf{Y} is attained at the scaled eigenvectors of the sample covariance).

The problem: If I use unconstrained latents, I get a different solution to if I use constrained latents. This is very surprising - is this caused by numerical instabilities? If so, why?

If I set the line (in my parameters block below) as follows:

    matrix<lower=-100, upper=100>[n, q] X;  // latents


I obtain the latents:

This are incorrect looking latents.

If instead the line is changed to:

    matrix[n, q] X;  // latents


the result becomes:

which is absolutely fine - it’s a rotation of the eigenvectors and since the posterior is invariant to rotation, this makes sense.

Stan places an improper uniform prior on the constrained parameters right? So I don’t understand this behaviour.

What’s more is that, using a bridgestan model that uses the constrained parameters, and using scipy’s optimize to optimize the parameters works and generates the correct solution (and not the weird axis aligned solution). Could someone shed some light into what’s going on?

I’m using cmdstan 2.31.0.

My code is as follows:


import torch
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from cmdstanpy import CmdStanModel
from torchvision.transforms.functional import rotate

plt.ion(); plt.style.use('seaborn-pastel')
np.random.seed(42)

with open('dr.stan', 'w') as f: f.write("""
data {
int n;  // num data
int d;  // num data dims
int q;  // num latents
matrix[n, d] Y;  // data
}
transformed data {
vector[d] w = rep_vector(1.0, d);
}
parameters {
matrix<lower=-100, upper=100>[n, q] X;  // latents
real<lower=1e-6, upper=1> sigma_sq;
}
model {
Y' ~ multi_gp(add_diag(X * X', sigma_sq), w);
}
""")

def plot(X):
plot_df = pd.DataFrame(dict(x=X[:, 0], y=X[:, 1], hue=c.astype(str)))
plot_df = plot_df.set_index('hue').sort_index().reset_index()
sns.scatterplot(data=plot_df, x='x', y='y', hue='hue', palette='Paired')

if __name__ == '__main__':

(_, _), (Y, c) = load_data()

c = np.arange(10)
Y = np.concatenate([Y[[np.where(c == lb)[0][0]], ...] for lb in c], axis=0)
Ys = []
for Yi in Y:
for angle in np.linspace(0, 350, 25):
Ys.append(rotate(torch.tensor(Yi)[None, ...], angle))
Y = torch.cat(Ys, axis=0).numpy().reshape(-1, 28**2)/255
c = np.repeat(c, 25)

Y += np.random.normal(0, 0.05, size=Y.shape)
Y -= Y.mean(axis=0)
Y -= Y.mean(axis=1)[..., None]

(n, d), q = Y.shape, 2

pd.Series(dict(
Y=Y, n=n, d=d, q=2,
)).to_json('data.json')

model = CmdStanModel(stan_file='dr.stan')
fit = model.optimize(data='data.json', show_console=True, iter=int(1e5), seed=42)

X = fit.stan_variable('X')
plot(X)