Seemingly Unrelated Regressions with Low Rank + Diagonal Residual Covariance

I’m working on a model which closely mimics the SUR model here: Stan User’s Guide

The main difference is that I’m working in “high” dimensions, where I’m assuming on the order of p=100 features in y.

In short, I follow the SUR model in the documentation, except that \Sigma = BB^T + \Lambda, where \Lambda is diagonal and B has rank much less than p. When I tried running this with p ≈ 100, it took overnight to generate 4k samples across 4 chains (in parallel), and the Rhats were all huge (low ESS, high autocorrelation). There is a rotational nonidentifiability in B that I don’t care about, but even the regression coefficients weren’t mixing. I do the matrix inversion analytically to leverage the low-rank-plus-diagonal structure, but I’m not sure how much that helps.

I recognize this might be a challenging problem, but thought somebody might have advice on improving efficiency and/or reparameterizing to improve mixing. In general, I’ve found factor models of this sort to be somewhat painful in Stan.

Here’s the model as I’ve coded it up. Any thoughts on improving upon this would be helpful.

data {
  int<lower=1> K;   // dimension of y
  int<lower=1> J;   // number of predictors
  int<lower=0> N;   // number of observations
  int<lower=1> M;   // number of factors
  vector[J] x[N];
  vector[K] y[N];
}
parameters {
  // intercept
  vector[K] alpha;
  // regression coefficients
  matrix[K, J] beta;
  // factor loadings
  matrix[K, M] B;
  // uniquenesses
  vector<lower=0>[K] lambda;
}
transformed parameters {
  // invert matrix analytically (Woodbury identity) to improve efficiency?
  matrix[K, K] lambda_inv = diag_matrix(1.0 ./ lambda);
  // declared as plain matrix rather than cov_matrix to skip the
  // positive-definiteness check on every evaluation
  matrix[K, K] Sigma_inv = lambda_inv
    - lambda_inv * B
      * inverse_spd(diag_matrix(rep_vector(1, M)) + B' * lambda_inv * B)
      * B' * lambda_inv;
}
model {
  vector[K] mu[N];

  // vectorized priors in place of elementwise loops
  to_vector(B) ~ normal(0, 1);
  to_vector(beta) ~ normal(0, 1);
  alpha ~ normal(0, 10);

  for (n in 1:N)
    mu[n] = alpha + beta * x[n];
  y ~ multi_normal_prec(mu, Sigma_inv);
}
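A related efficiency note (my own suggestion, so take it with a grain of salt): multi_normal_prec still has to compute log |Sigma_inv| from the full K x K matrix. If you write the log likelihood by hand with target +=, the matrix determinant lemma reduces the determinant to an M x M computation as well. A NumPy check of the identity, with illustrative sizes:

```python
# Check (illustrative sizes) that the matrix determinant lemma matches a
# direct log determinant:
#   log|Lambda + B B^T| = log|I_M + B^T Lambda^{-1} B| + sum(log(lambda))
import numpy as np

rng = np.random.default_rng(1)
p, M = 100, 5
B = rng.standard_normal((p, M))
lam = rng.uniform(0.5, 2.0, size=p)

core = np.eye(M) + (B.T / lam) @ B            # B^T Lambda^{-1} B, M x M
logdet_lemma = np.linalg.slogdet(core)[1] + np.sum(np.log(lam))
logdet_direct = np.linalg.slogdet(np.diag(lam) + B @ B.T)[1]
print(np.allclose(logdet_lemma, logdet_direct))   # True
```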

Unfortunately, non-identifiability in one part of the model can easily make sampling difficult for all parameters, so I think the prime target for improvement would be to figure out a parameterization of \Sigma or B that does not have this non-identifiability.
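One standard option (I haven’t tried it on a problem like yours) is to constrain B to be lower triangular with a positive diagonal, which removes the rotational invariance B -> BQ for orthogonal Q. A NumPy sketch of the unconstrained-to-constrained transform you would mimic in transformed parameters; the function name and sizes are mine, purely illustrative:

```python
# Sketch (hypothetical names): map an unconstrained vector to a K x M
# lower-triangular loading matrix with positive diagonal, the usual
# identification constraint for factor loadings.
import numpy as np

def build_loadings(free, K, M):
    """Fill B column by column: exp() keeps the diagonal positive;
    entries above the diagonal are fixed to zero."""
    B = np.zeros((K, M))
    idx = 0
    for m in range(M):
        B[m, m] = np.exp(free[idx]); idx += 1        # positive diagonal
        B[m + 1:, m] = free[idx: idx + K - m - 1]    # free below the diagonal
        idx += K - m - 1
    return B

K, M = 6, 2
n_free = K * M - M * (M - 1) // 2    # fewer free parameters than K * M
rng = np.random.default_rng(2)
B = build_loadings(rng.standard_normal(n_free), K, M)
print(np.all(np.triu(B, 1) == 0), np.all(np.diag(B) > 0))   # True True
```

In Stan you would declare the n_free unconstrained entries (with the diagonal as a positive-constrained vector instead of the exp transform) and assemble B in transformed parameters.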

Unfortunately, I don’t remember seeing anyone be successful with factor models in Stan - reliable uncertainty quantification in such models just seems extremely challenging. I’ve never worked with these models myself, so I might easily have missed some developments, but my impression is that beyond some relatively “trivial” problems like rotational non-identifiabilities, it is also possible to have datasets that result in multiple qualitatively different modes, all of which should contribute non-negligible mass to the posterior (e.g. where different numbers of factors have non-negligible weights). It is a hard problem :-/

Best of luck with your model!