Sensible values to use when sampling from prior to estimate covariance

I’m trying to better understand the shape of the prior distribution in a model built around a multivariate normal distribution, in which I’m estimating a covariance matrix.

To make setting priors more straightforward I standardize each variable for which I’m calculating the covariance.
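For concreteness, here is a minimal sketch of that standardization step (hypothetical names: it assumes a raw data matrix X with LENGTH rows and VARCOUNT columns declared in the data block, which doesn't appear in the prior-only models below since they contain no data):

transformed data {
  // Hypothetical sketch: standardize each column of a raw data matrix X,
  // assuming matrix[LENGTH, VARCOUNT] X; is declared in the data block.
  matrix[LENGTH, VARCOUNT] X_std;
  for (k in 1:VARCOUNT) {
    real mu = mean(col(X, k));
    real s = sd(col(X, k));
    for (n in 1:LENGTH)
      X_std[n, k] = (X[n, k] - mu) / s;
  }
}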

My current problem: I can’t find any set of priors that will yield a stable distribution when sampling from the prior.

I’m trying a test model based closely on that described in the Stan Manual 2.17, p155. I currently have:

data{
    int<lower=2> VARCOUNT;
    int<lower=1> LENGTH;
}
transformed data{
  vector[VARCOUNT] zeros = rep_vector(0,VARCOUNT);
}

parameters {
    cholesky_factor_corr[VARCOUNT] L_Omega;
    vector<lower=0>[VARCOUNT] L_sigma;

    //would normally be data, 
    //but since I'm sampling from the prior I've made this a parameter instead.
    vector[VARCOUNT] covar[LENGTH];
}

transformed parameters {
    matrix[VARCOUNT, VARCOUNT] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
    matrix[VARCOUNT, VARCOUNT] Sigma = L_Sigma * L_Sigma';
}
model{
    L_Omega ~ lkj_corr_cholesky(1); //shape parameter was 4 in the manual, but using 1 because our variables are individually normally distributed
    L_sigma ~ cauchy(0,1); //scale was 2 in the manual, but using 1 because our variables are standardized
    covar ~ multi_normal_cholesky(zeros,L_Sigma);
}

When estimating this model with VARCOUNT=7, LENGTH=178, 500 warmup iterations, and 100 further samples per chain, I get stable estimates for everything…except L_sigma ~ cauchy(0,1).

The Rhat values are high because the effective sample size for L_sigma is extremely low (sometimes equal to the number of chains), which suggests the sampler was barely able to move L_sigma at all.

I’ve repeated this using cauchy and lkj_corr_cholesky priors identical to those in the manual (p. 155), but it makes no difference. I have also tried replacing the Cauchy with a truncated normal, or with an exponentially transformed normal (i.e., a lognormal), as in:

data{
   int<lower=1> LENGTH;
   int<lower=2> VARCOUNT;
}
transformed data{
  vector[VARCOUNT] zeros = rep_vector(0,VARCOUNT);
}

parameters {
  cholesky_factor_corr[VARCOUNT] L_Omega;
  vector[VARCOUNT] L_sigma_norm;
  
  vector[VARCOUNT] covar[LENGTH];
}

transformed parameters {
  vector<lower=0>[VARCOUNT] L_sigma = exp(L_sigma_norm);
  matrix[VARCOUNT, VARCOUNT] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  matrix[VARCOUNT, VARCOUNT] Sigma = L_Sigma * L_Sigma';
}

model {
  L_Omega ~ lkj_corr_cholesky(1);
  L_sigma_norm ~ normal(0,1);

  covar ~ multi_normal_cholesky(zeros,L_Sigma);
   
}

…to no avail: in each case all parameters except L_sigma are stably estimated, and there are hardly any effective samples for L_sigma at all, let alone a stable estimate.

I understand that this model may be insufficiently constrained, and therefore unable to produce a stable estimate for every parameter.

But surely there is a way to get a stable estimate of the prior on this covariance matrix?

The best thing to do if you really just want to draw from the prior is to implement it in the generated quantities block. Then you get independent draws. I did it just for the quantity of interest that was failing for you, to make sure it would work:

generated quantities {
  cholesky_factor_corr[7] L_Omega = lkj_corr_cholesky_rng(7, 1);
  vector[7] sigma;
  matrix[7, 7] L_Sigma;
  for (i in 1:7)
    sigma[i] = lognormal_rng(0, 1);
  L_Sigma = diag_pre_multiply(sigma, L_Omega);
}

Using HMC/NUTS, we can run into difficult geometry with just these priors and no data.

You can reparameterize the Cauchy to make the actual model you draw from less heavy tailed. This is a good idea when there’s not a lot of data, too. There’s a discussion in the user’s guide part of the manual on reparameterizations.
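For example, here is a minimal sketch (not the full model) of the tangent reparameterization for a half-Cauchy(0, 1) prior on L_sigma, along the lines of the manual's reparameterization chapter. The sampler works with a uniform variable on a bounded interval, and the heavy tail only enters through the deterministic transform:

parameters {
  // implicit uniform(0, pi/2) prior from the declared bounds
  vector<lower=0, upper=pi()/2>[VARCOUNT] L_sigma_unif;
}
transformed parameters {
  // tan() maps uniform(0, pi/2) to half-Cauchy(0, 1) elementwise;
  // multiply by a scale here if a wider Cauchy is wanted
  vector<lower=0>[VARCOUNT] L_sigma = tan(L_sigma_unif);
}

The marginal prior on L_sigma is unchanged; what changes is the geometry NUTS has to explore, which is now a bounded box rather than an unbounded, heavy-tailed surface.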

What happens when you use priors with less heavy tails? Did the chains get stuck or just mix slowly?

You might also want to put some kind of prior on the correlation matrix, like an LKJ prior with a parameter greater than 1. I simulated with a 1 in the example above.

normal(0,1) is less heavy-tailed than cauchy(0,1), right? It makes just about no difference: all parameters except L_sigma are stably estimated. (L_Sigma, on the other hand, seems fine, with Rhat ≈ 1.00 and n_eff around half of the maximum.)

I’ll try the method you described!

Using the “generated quantities” block worked, thanks!

Glad to hear it. Nothing like Monte Carlo without the Markov chain complications. (Having said that, the MCMC version could probably be fit with some careful reparameterization.)

Infinitely less heavy tailed—Cauchy has infinite variance, normal is finite. But as you note, this often doesn’t matter with priors—it’s the posterior tails we care about.
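Concretely, the second moments make the contrast explicit:

$$
\int_{-\infty}^{\infty} x^2\,\frac{1}{\pi(1+x^2)}\,dx = \infty,
\qquad
\int_{-\infty}^{\infty} x^2\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = 1.
$$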

I’ve noticed that just about any combination of priors I can use for this design gives me a covariance matrix whose diagonal entries are substantially larger than the off-diagonal entries, with the prior on the off-diagonal entries pulled strongly toward very low values.

Assuming my covariance matrix is effectively a correlation matrix (i.e., the data it represents are standardized), I’m wondering what values would give me weakly informative or uninformative priors that allow correlations anywhere in the range 0-0.5.

I don’t know if that’s possible to pull off. In higher dimensions, the positive-definiteness constraint leads to priors that concentrate around zero correlation. You could use an LKJ prior with a shape argument much less than 1 (but still greater than 0).

In some sense, the diagonal entries wind up having to be relatively larger than the off-diagonal entries. But they can get close when there’s very high correlation and all the variances are roughly the same.
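If it helps to see this directly, here is a small self-contained sketch (dimension 7 hard-coded and the LKJ shape eta passed in as data, both just for illustration; with no parameters block it has to be run with the fixed-parameter sampler) that draws correlation matrices from an LKJ(eta) prior so you can histogram the marginal prior on a single correlation. Values of eta below 1 push mass toward +/-1; values above 1 concentrate it around 0:

data {
  real<lower=0> eta;  // LKJ shape: < 1 favors strong correlations, > 1 favors near-zero
}
generated quantities {
  // one draw per iteration from the LKJ(eta) prior over 7x7 correlation matrices
  corr_matrix[7] Omega = lkj_corr_rng(7, eta);
  real rho_12 = Omega[1, 2];  // a single off-diagonal entry to inspect
}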