I’m working on a model that closely mimics the SUR (seemingly unrelated regressions) model in the Stan User’s Guide.
The main difference is that I’m working in “high” dimensions: I’m assuming on the order of p = 100 features in y.
In short, I follow the SUR model in the documentation except that \Sigma = BB^T + \Lambda, where \Lambda is diagonal and B has rank much less than p. When I tried running this with about p = 100, it took overnight to generate 4k samples across 4 chains (run in parallel), and the Rhats were all huge (low ESS, high autocorrelation). There is a rotational nonidentifiability in B that I don’t care about, but even the regression coefficients weren’t mixing. I do the matrix inversion analytically to exploit the low-rank-plus-diagonal structure, though I’m not sure how much that helps.
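For reference, the rotational nonidentifiability is just that (BQ)(BQ)^T = BB^T for any orthogonal M x M matrix Q, so the likelihood depends on B only through BB^T. The analytic inverse in the transformed parameters block below is the Woodbury identity applied to \Sigma = BB^T + \Lambda:

\Sigma^{-1} = \Lambda^{-1} - \Lambda^{-1} B (I_M + B^T \Lambda^{-1} B)^{-1} B^T \Lambda^{-1}

so only an M x M matrix ever needs to be inverted, rather than the full p x p \Sigma.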
I recognize this might be a challenging problem, but I thought somebody might have advice on improving efficiency and/or reparameterizing to improve mixing. In general, I’ve found factor models of this sort to be somewhat painful in Stan.
Here’s the model as I’ve coded it up. Any thoughts on improving upon this would be helpful.
data {
  int<lower=1> K;  // dimension of y (p in the text)
  int<lower=1> J;  // number of predictors
  int<lower=0> N;  // number of observations
  int<lower=1> M;  // number of factors (M << K)
  array[N] vector[J] x;
  array[N] vector[K] y;
}
parameters {
  // intercept
  vector[K] alpha;
  // regression coefficients
  matrix[K, J] beta;
  // factor loadings
  matrix[K, M] B;
  // uniquenesses
  vector<lower=0>[K] lambda;
}
transformed parameters {
  // invert matrix analytically to improve efficiency?
  matrix[K, K] lambda_inv = diag_matrix(1.0 ./ lambda);
  cov_matrix[K] Sigma_inv = lambda_inv
      - lambda_inv * B
        * inverse_spd(diag_matrix(rep_vector(1, M)) + B' * lambda_inv * B)
        * B' * lambda_inv;
}
model {
  array[N] vector[K] mu;
  // vectorized versions of the elementwise normal(0, 1) priors
  to_vector(B) ~ normal(0, 1);
  to_vector(beta) ~ normal(0, 1);
  alpha ~ normal(0, 10);
  // lambda is left with its implicit flat prior on (0, infinity)
  for (n in 1:N)
    mu[n] = alpha + beta * x[n];
  y ~ multi_normal_prec(mu, Sigma_inv);
}