I’m working on a model that closely mimics the SUR (seemingly unrelated regressions) model in the Stan User’s Guide.
The main difference is that I’m working in “high” dimensions: I’m assuming on the order of p = 100 features in y.
In short, I follow the SUR model in the documentation except that \Sigma = BB^T + \Lambda, where \Lambda is diagonal and B has rank much less than p. When I tried running this with about p = 100, it took overnight to generate 4k samples across 4 chains (run in parallel), and the Rhats were all huge (low ESS, high autocorrelation). There is a rotational nonidentifiability in B that I don’t care about, but even the regression coefficients weren’t mixing. I do the matrix inversion analytically to exploit the low-rank-plus-diagonal structure, though I’m not sure how much that helps.
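For reference, the rotational nonidentifiability is just that (BQ)(BQ)^T = BB^T for any orthogonal M x M matrix Q, so the likelihood depends on B only through BB^T. The analytic inverse in the transformed parameters block below is the Woodbury identity applied to \Sigma = BB^T + \Lambda:

\Sigma^{-1} = \Lambda^{-1} - \Lambda^{-1} B (I_M + B^T \Lambda^{-1} B)^{-1} B^T \Lambda^{-1}

so only an M x M matrix ever needs to be inverted, rather than the full p x p \Sigma.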
I recognize this might be a challenging problem, but I thought somebody might have advice on improving efficiency and/or reparameterizing to improve mixing. In general, I’ve found factor models of this sort to be somewhat painful in Stan.
Here’s the model as I’ve coded it up. Any thoughts on improving upon this would be helpful.
data {
  int<lower=1> K;  // dimension of y (p in the text)
  int<lower=1> J;  // number of predictors
  int<lower=0> N;  // number of observations
  int<lower=1> M;  // number of factors (M << K)
  array[N] vector[J] x;
  array[N] vector[K] y;
}
parameters {
  // intercept
  vector[K] alpha;
  // regression coefficients
  matrix[K, J] beta;
  // factor loadings
  matrix[K, M] B;
  // uniquenesses
  vector<lower=0>[K] lambda;
}
transformed parameters {
  // invert matrix analytically to improve efficiency?
  matrix[K, K] lambda_inv = diag_matrix(1.0 ./ lambda);
  cov_matrix[K] Sigma_inv = lambda_inv
      - lambda_inv * B
        * inverse_spd(diag_matrix(rep_vector(1, M)) + B' * lambda_inv * B)
        * B' * lambda_inv;
}
model {
  array[N] vector[K] mu;
  // vectorized versions of the elementwise normal(0, 1) priors
  to_vector(B) ~ normal(0, 1);
  to_vector(beta) ~ normal(0, 1);
  alpha ~ normal(0, 10);
  // lambda is left with its implicit flat prior on (0, infinity)
  for (n in 1:N)
    mu[n] = alpha + beta * x[n];
  y ~ multi_normal_prec(mu, Sigma_inv);
}