Trouble with prior selection for multivariate normal-inverse-Wishart analysis

I am trying to conduct Bayesian inference in a multidimensional setting, using a multivariate normal prior for my mean and an inverse Wishart prior for my covariance matrix. When I test my code in a low-dimensional setting (10-30 dimensions), it works fine, but in higher-dimensional settings (50+) it fails because of what appears to be a poorly specified prior.

I’m using ScalaStan and here is my code:

val sampleSize = 1000
val assetCount: Int = 500
val dataset = List.fill(sampleSize)(0.0)
  .map(x => List.fill(assetCount)(x + math.min(math.max(scala.util.Random.nextGaussian(), -3), 3)))
val muPriorMu: List[Double] = List.fill(assetCount)(0)
val muPriorSigma: List[List[Double]] = breeze.linalg.DenseMatrix.eye[Double](assetCount)
  .toArray
  .toList
  .map { case 0.0 => 0.25; case x => x } // 1s on the diagonal, 0.25s elsewhere
  .grouped(assetCount)
  .map(_.toList)
  .toList
val sigmaPriorNu: Double = assetCount + 1
val sigmaPriorSigma: List[List[Double]] = muPriorSigma

class multiVarNormalClass(
  dataset: List[List[Double]], 
  muPriorMu: List[Double],
  muPriorSigma: List[List[Double]],
  sigmaPriorNu: Double,
  sigmaPriorSigma: List[List[Double]],
  sampleSize: Int,
  assetCount: Int
) extends ScalaStan {

  val n = data(int(lower = 0))
  val y = data(vector(assetCount)(n))
  val muPrior_mu = data(vector(assetCount))
  val muPrior_sigma = data(covMatrix(assetCount))
  val sigmaPrior_nu = data(real(lower = assetCount - 1.0))
  val sigmaPrior_sigma = data(covMatrix(assetCount))

  val mu = parameter(vector(assetCount))
  val sigma = parameter(covMatrix(assetCount))

  val model = new Model {

    mu ~ stan.multi_normal(muPrior_mu, muPrior_sigma)
    sigma ~ stan.inv_wishart(sigmaPrior_nu, sigmaPrior_sigma)
    for (i <- range(1, n)) {
      y(i) ~ stan.multi_normal(mu, sigma)
    }
  }

  val results = model
    .withData(n, sampleSize)
    .withData(y, dataset)
    .withData(muPrior_mu, muPriorMu)
    .withData(muPrior_sigma, muPriorSigma)
    .withData(sigmaPrior_nu, sigmaPriorNu)
    .withData(sigmaPrior_sigma, sigmaPriorSigma)
    .run(chains = 5, method = RunMethod.Sample())
  
  val muPosteriorMean = results
    .mean(mu)
    .toArray
  
  val sigmaPosteriorMean = results
    .mean(sigma)
    .map(_.toArray)
    .toArray
  
  def mahDistance(residualVector: Array[Double]): Double = {
    val mahDist = new MahalanobisDistance(
      sigmaPosteriorMean
    )
    mahDist
      .d(residualVector, muPosteriorMean)
  }
  
  def getMu: Array[Double] = muPosteriorMean
  
  def getSigma: Array[Array[Double]] = sigmaPosteriorMean
  
}

val myClass = new multiVarNormalClass(
  dataset, 
  muPriorMu, 
  muPriorSigma, 
  sigmaPriorNu, 
  sigmaPriorSigma,
  sampleSize,
  assetCount
)

This code generates the following Stan program:

data {
  vector[500] muPrior_mu;
  cov_matrix[500] muPrior_sigma;
  real<lower=499.0> sigmaPrior_nu;
  cov_matrix[500] sigmaPrior_sigma;
  int<lower=0> n;
  vector[500] y[n];
}
parameters {
  vector[500] mu;
  cov_matrix[500] sigma;
}
model {
  mu ~ multi_normal(muPrior_mu,muPrior_sigma);
  sigma ~ inv_wishart(sigmaPrior_nu,sigmaPrior_sigma);
  for(ss_v1 in 1:n) {
    y[ss_v1] ~ multi_normal(mu,sigma);
  }
}

I declare the dimensionality of the space with the variable “assetCount”, so in the code I’ve pasted above the dimensionality is 500. I’m creating four prior hyperparameters:

muPriorMu: assetCount x 1 vector, all 0s
muPriorSigma: assetCount x assetCount covariance matrix, 1s on the diagonal and 0.25s off the diagonal
sigmaPriorNu: double, assetCount + 1 (so 501 here)
sigmaPriorSigma: same as muPriorSigma

The dataset is simulated draws from independent univariate Gaussians truncated at +/-3, but my reading of the error is that the dataset isn’t relevant here.

When I set assetCount to say 10, the program runs fine. At around 30, it runs but slowly. At say 500, I get the following error (this is the tail end of the 100 attempts to initialize):

2018-10-18 19:14:21:700 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ - Exception: inv_wishart_lpdf: LDLT_Factor of random variable is not positive definite.  last conditional variance is 1.11022e-15.  (in '/home/beakerx/.scalastan/7e4871a2f86b6a32dd8e19055afced54fb565d44-1/model.stan' at line 15)
2018-10-18 19:14:21:700 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ - 
2018-10-18 19:14:21:787 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ - Rejecting initial value:
2018-10-18 19:14:21:787 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ -   Error evaluating the log probability at the initial value.
2018-10-18 19:14:21:787 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ - Exception: inv_wishart_lpdf: LDLT_Factor of random variable is not positive definite.  last conditional variance is 1.55431e-15.  (in '/home/beakerx/.scalastan/7e4871a2f86b6a32dd8e19055afced54fb565d44-1/model.stan' at line 15)
2018-10-18 19:14:21:787 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ - 
2018-10-18 19:14:21:787 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ - 
2018-10-18 19:14:21:787 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ - Initialization between (-2, 2) failed after 100 attempts. 
2018-10-18 19:14:21:787 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ -  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
2018-10-18 19:14:21:788 +0000 [ForkJoinPool-1-worker-11] INFO  CommandRunner$ - Initialization failed.
2018-10-18 19:14:21:791 +0000 [ForkJoinPool-1-worker-11] ERROR CmdStanRunner - model returned 70

This appears to be a problem with the inverse Wishart prior. My first attempted remediation was to go from a pure identity scale matrix to a matrix with off-diagonal values of 0.25, but this didn’t fix anything. My next was to constrain the dataset to lie between +/-3, though I don’t think the dataset has anything to do with the problem. Any pointers would be much appreciated!

For size 500, it is very difficult to get a covariance matrix that isn’t numerically indefinite.
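A quick way to check whether a given matrix is positive definite *in double precision* is to attempt a Cholesky factorization and watch for a non-positive pivot. A minimal plain-Scala sketch (no Breeze dependency assumed; `isNumericallyPd` and `equicorr` are illustrative names, not library functions):

```scala
object PdCheck {
  // Attempt a Cholesky factorization; a non-positive pivot means the
  // matrix is not positive definite in double-precision arithmetic.
  def isNumericallyPd(a: Array[Array[Double]]): Boolean = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    var ok = true
    var i = 0
    while (ok && i < n) {
      var j = 0
      while (ok && j <= i) {
        var s = a(i)(j)
        var k = 0
        while (k < j) { s -= l(i)(k) * l(j)(k); k += 1 }
        if (i == j) {
          if (s <= 0.0) ok = false // pivot collapsed to <= 0: numerically indefinite
          else l(i)(i) = math.sqrt(s)
        } else l(i)(j) = s / l(j)(j)
        j += 1
      }
      i += 1
    }
    ok
  }

  // Equicorrelation matrix: 1s on the diagonal, `off` everywhere else.
  def equicorr(n: Int, off: Double): Array[Array[Double]] =
    Array.tabulate(n, n)((i, j) => if (i == j) 1.0 else off)
}
```

Notably, the prior scale matrix used above (1s on the diagonal, 0.25 elsewhere) passes this check even at dimension 500, which is consistent with the log output: the matrix that loses positive definiteness is the randomly initialized `sigma` (last conditional variance around 1e-15), not the prior's scale matrix.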

If you must use the inverse Wishart distribution, I would integrate out the covariance matrix as in

But the more general approach people tend to use in Stan is to declare a Cholesky factor of a covariance or correlation matrix in the parameters block of their Stan program and work with that, which does not require any expensive or inaccurate matrix decompositions. One thing that has worked particularly well over the years is a Cholesky factor of a correlation matrix with an LKJ prior, together with a positive vector of standard deviations; multiplied together, these yield a Cholesky factor of a covariance matrix.
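To make that multiplication concrete: if `L` is the Cholesky factor of a correlation matrix and `tau` is the vector of standard deviations, then scaling row i of `L` by `tau(i)` (what Stan calls `diag_pre_multiply`) gives a Cholesky factor of the covariance matrix, with no decomposition needed. A plain-Scala sketch of the algebra (names are illustrative):

```scala
object CholCov {
  // diag_pre_multiply(tau, L): scale row i of the correlation Cholesky factor by tau(i).
  def diagPreMultiply(tau: Array[Double], l: Array[Array[Double]]): Array[Array[Double]] =
    Array.tabulate(tau.length, tau.length)((i, j) => tau(i) * l(i)(j))

  // Reassemble the implied covariance: Sigma = (diag(tau) * L) * (diag(tau) * L)^T.
  def covFromFactor(lc: Array[Array[Double]]): Array[Array[Double]] = {
    val n = lc.length
    Array.tabulate(n, n)((i, j) => (0 until n).map(k => lc(i)(k) * lc(j)(k)).sum)
  }
}
```

For example, with tau = (2, 3) and the 2x2 correlation factor ((1, 0), (0.5, sqrt(0.75))), the implied covariance is ((4, 3), (3, 9)): variances 4 and 9 with correlation 0.5.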

Thanks for the response! I have no special affinity for the inverse Wishart and don’t care about the parametrizability of my posterior distribution: all I want out of a covariance matrix prior is something that works. It sounds like given that, I should follow the Cholesky factor approach you’re suggesting. Would you mind pointing me to some examples?

Section 6.13 of

Awesome, thanks for the pointer. I’ve implemented the Cauchy-LKJ setup as suggested in the user guide. Sadly, the same issue persists. Honestly, if this problem is just intractable, that’s fine; I just want to make sure there isn’t some obvious optimization I could be making (and it looks like switching to the Cauchy-LKJ prior might have been the main one?).

You weren’t supposed to run into the same issue because there should be no inverse Wishart anymore. What is your new Stan program?

Sure: the new Stan program looks like this:

data {
  vector[500] muPrior_mu;
  int<lower=0> n;
  vector[500] y[n];
}
parameters {
  corr_matrix[500] omega;
  vector[500] tau;
  vector[500] mu;
}
model {
  omega ~ lkj_corr(50);
  tau ~ cauchy(0,2.5);
  mu ~ multi_normal(muPrior_mu,quad_form_diag(omega,tau));
  for(ss_v1 in 1:n) {
    y[ss_v1] ~ multi_normal(mu,quad_form_diag(omega,tau));
  }
}

I’m passing it 10,000 observations (but passing fewer observations doesn’t help). I’ve messed with the lkj_corr shape (as low as 2 and some number above 50) to no avail. The error is exactly the same as before.

Your code is still utilizing a correlation matrix, rather than its Cholesky factor. It should be

parameters {
  cholesky_factor_corr[500] omega;
  ...
model {
  omega ~ lkj_corr_cholesky(shape);
  ...

where shape is usually something between 1 and 2.
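For intuition on the shape parameter, the LKJ density over correlation matrices is (a standard fact, not specific to this model):

```latex
p(\Omega \mid \eta) \;\propto\; \det(\Omega)^{\eta - 1}
```

so eta = 1 is uniform over correlation matrices, and eta > 1 pulls mass toward the identity, i.e. toward weaker correlations.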

Thanks for the pointer. My Stan program is now this:

data {
  vector[500] muPrior_mu;
  int<lower=0> n;
  vector[500] y[n];
}
parameters {
  cholesky_factor_corr[500] omega;
  vector[500] tau;
  vector[500] mu;
}
model {
  omega ~ lkj_corr_cholesky(2);
  tau ~ cauchy(0,2.5);
  mu ~ multi_normal(muPrior_mu,quad_form_diag(omega,tau));
  for(ss_v1 in 1:n) {
    y[ss_v1] ~ multi_normal(mu,quad_form_diag(omega,tau));
  }
}

I now get errors that look like this:

2018-10-22 15:02:55:854 +0000 [ForkJoinPool-1-worker-13] INFO  CommandRunner$ - Exception: multi_normal_lpdf: Covariance matrix is not symmetric. Covariance matrix[1,2] = -0, but Covariance matrix[2,1] = 0.235101  (in '/home/beakerx/.scalastan/c33c0c5bb17b79070847e22871fe1598404b890b-1/model.stan' at line 14)
2018-10-22 15:02:55:854 +0000 [ForkJoinPool-1-worker-13] INFO  CommandRunner$ - 
2018-10-22 15:02:55:854 +0000 [ForkJoinPool-1-worker-13] INFO  CommandRunner$ - 
2018-10-22 15:02:55:854 +0000 [ForkJoinPool-1-worker-13] INFO  CommandRunner$ - Initialization between (-2, 2) failed after 100 attempts. 
2018-10-22 15:02:55:854 +0000 [ForkJoinPool-1-worker-13] INFO  CommandRunner$ -  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
2018-10-22 15:02:55:859 +0000 [ForkJoinPool-1-worker-13] INFO  CommandRunner$ - Initialization failed.
2018-10-22 15:02:55:865 +0000 [ForkJoinPool-1-worker-13] ERROR CmdStanRunner - model returned 70
2018-10-22 15:02:55:910 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Rejecting initial value:
2018-10-22 15:02:55:910 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ -   Error evaluating the log probability at the initial value.
2018-10-22 15:02:55:910 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Exception: multi_normal_lpdf: Covariance matrix is not symmetric. Covariance matrix[1,2] = -0, but Covariance matrix[2,1] = 0.0777174  (in '/home/beakerx/.scalastan/c33c0c5bb17b79070847e22871fe1598404b890b-1/model.stan' at line 14)
2018-10-22 15:02:55:910 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - 
2018-10-22 15:02:56:008 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Rejecting initial value:
2018-10-22 15:02:56:008 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ -   Error evaluating the log probability at the initial value.
2018-10-22 15:02:56:008 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Exception: multi_normal_lpdf: Covariance matrix is not symmetric. Covariance matrix[1,2] = -0, but Covariance matrix[2,1] = 4.30817e-05  (in '/home/beakerx/.scalastan/c33c0c5bb17b79070847e22871fe1598404b890b-1/model.stan' at line 14)
2018-10-22 15:02:56:008 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - 
2018-10-22 15:02:56:105 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Rejecting initial value:
2018-10-22 15:02:56:105 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ -   Error evaluating the log probability at the initial value.
2018-10-22 15:02:56:105 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Exception: multi_normal_lpdf: Covariance matrix is not symmetric. Covariance matrix[1,2] = 0, but Covariance matrix[2,1] = -0.0155525  (in '/home/beakerx/.scalastan/c33c0c5bb17b79070847e22871fe1598404b890b-1/model.stan' at line 14)
2018-10-22 15:02:56:105 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - 
2018-10-22 15:02:56:193 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Rejecting initial value:
2018-10-22 15:02:56:193 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ -   Error evaluating the log probability at the initial value.
2018-10-22 15:02:56:193 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Exception: multi_normal_lpdf: Covariance matrix is not symmetric. Covariance matrix[1,2] = -0, but Covariance matrix[2,1] = -1.29569  (in '/home/beakerx/.scalastan/c33c0c5bb17b79070847e22871fe1598404b890b-1/model.stan' at line 14)
2018-10-22 15:02:56:193 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - 
2018-10-22 15:02:56:193 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - 
2018-10-22 15:02:56:193 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Initialization between (-2, 2) failed after 100 attempts. 
2018-10-22 15:02:56:193 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ -  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
2018-10-22 15:02:56:197 +0000 [ForkJoinPool-1-worker-7] INFO  CommandRunner$ - Initialization failed.
2018-10-22 15:02:56:199 +0000 [ForkJoinPool-1-worker-7] ERROR CmdStanRunner - model returned 70

In some cases the asymmetry looks like floating point arithmetic problems, but in other cases it really is quite asymmetric. Any good way to address this?

I never got the LKJ to work at any size > 100. The only thing that worked for me was the Wishart-Bartlett decomposition; there is a chapter on it in the Stan manual. Normally one could calculate the inverse, but usually this failed too. I’m sure some numerical tricks exist, though, since this matrix is already triangular. Maybe one could tune the LKJ too. I went along with the plain Bartlett-Wishart; at least it’s something PSD.
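For reference, the Bartlett decomposition mentioned here constructs a Wishart(nu, I) draw as W = A * A^T, where A is lower triangular with the square roots of chi-square draws on the diagonal and standard normals below it, so the result is positive semi-definite by construction. A plain-Scala sketch under those assumptions (identity scale matrix only; names are illustrative):

```scala
import scala.util.Random

object Bartlett {
  // Bartlett factor A of a Wishart(nu, I_d) draw: W = A * A^T.
  // Diagonal entry i (0-indexed) is sqrt of a chi-square draw with nu - i
  // degrees of freedom; entries below the diagonal are standard normals.
  def factor(d: Int, nu: Int, rng: Random): Array[Array[Double]] = {
    require(nu >= d, "degrees of freedom must be at least the dimension")
    def chi2(df: Int): Double =
      (0 until df).map { _ => val z = rng.nextGaussian(); z * z }.sum
    Array.tabulate(d, d) { (i, j) =>
      if (j > i) 0.0
      else if (i == j) math.sqrt(chi2(nu - i))
      else rng.nextGaussian()
    }
  }

  // Assemble the draw itself; symmetric and PSD by construction.
  def draw(d: Int, nu: Int, rng: Random): Array[Array[Double]] = {
    val a = factor(d, nu, rng)
    Array.tabulate(d, d)((i, j) => (0 until d).map(k => a(i)(k) * a(j)(k)).sum)
  }
}
```

In a Stan program one would declare the triangular factor (or the unconstrained parts of it) as parameters and only ever form W implicitly, which is what avoids the positive-definiteness failures at initialization.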

Super interesting, thanks for sharing. I will try messing around with the Bartlett-Wishart approach. It does seem like ultimately this is just a very computationally challenging problem.

I also should have said you want to work with multi_normal_cholesky rather than multi_normal (which takes a covariance matrix as its second argument). So, the whole thing would be

data {
  vector[500] muPrior_mu;
  int<lower=0> n;
  vector[500] y[n];
}
parameters {
  cholesky_factor_corr[500] omega;
  vector<lower=0>[500] tau; // standard deviations must be constrained positive
  vector[500] mu;
}
model {
  matrix[500, 500] L = diag_pre_multiply(tau, omega);
  omega ~ lkj_corr_cholesky(2);
  tau ~ cauchy(0,2.5);
  mu ~ multi_normal_cholesky(muPrior_mu, L);
  y ~ multi_normal_cholesky(mu, L);
}

but I am not sure why the same covariance structure should apply to the errors in mu as to the errors in y. Also, mu could be integrated out.
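A sketch of that last point, under the model as written (the same covariance for the prior on mu and the likelihood): writing mu = mu_0 + u with u ~ N(0, Sigma), and y_i = mu + eps_i with eps_i ~ N(0, Sigma), the stacked data are jointly Gaussian with

```latex
\operatorname{E}[y_i] = \mu_0, \qquad
\operatorname{Var}(y_i) = \Sigma + \Sigma = 2\Sigma, \qquad
\operatorname{Cov}(y_i, y_j) = \Sigma \quad (i \neq j),
```

so the likelihood can be evaluated without sampling mu at all, and mu recovered afterwards from its Gaussian conditional posterior.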