Sensible values to use when sampling from prior to estimate covariance

I’m trying to better understand the shape of the prior distribution in a model built around a multivariate normal distribution, in which I’m estimating a covariance matrix.

To make setting priors more straightforward I standardize each variable for which I’m calculating the covariance.
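For concreteness, here is a minimal sketch of that standardization step (hypothetical names: it assumes a raw data matrix X with LENGTH rows and VARCOUNT columns declared in the data block, which doesn't appear in the prior-only models below since they contain no data):

transformed data {
  // Hypothetical sketch: standardize each column of a raw data matrix X,
  // assuming matrix[LENGTH, VARCOUNT] X; is declared in the data block.
  matrix[LENGTH, VARCOUNT] X_std;
  for (k in 1:VARCOUNT) {
    real mu = mean(col(X, k));
    real s = sd(col(X, k));
    for (n in 1:LENGTH)
      X_std[n, k] = (X[n, k] - mu) / s;
  }
}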

My current problem: I can’t find any set of priors that will yield a stable distribution when sampling from the prior.

I’m trying a test model based closely on that described in the Stan Manual 2.17, p155. I currently have:

data{
    int<lower=2> VARCOUNT;
    int<lower=1> LENGTH;
}
transformed data{
  vector[VARCOUNT] zeros = rep_vector(0,VARCOUNT);
}

parameters {
    cholesky_factor_corr[VARCOUNT] L_Omega;
    vector<lower=0>[VARCOUNT] L_sigma;

    //would normally be data, 
    //but since I'm sampling from the prior I've made this a parameter instead.
    vector[VARCOUNT] covar[LENGTH];
}

transformed parameters {
    matrix[VARCOUNT, VARCOUNT] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
    matrix[VARCOUNT, VARCOUNT] Sigma = L_Sigma * L_Sigma';
}
model{
    L_Omega ~ lkj_corr_cholesky(1); //shape parameter was 4 in the manual, but using 1 because our variables are individually normally distributed
    L_sigma ~ cauchy(0,1); //scale was 2 in the manual, but using 1 because our variables are standardized
    covar ~ multi_normal_cholesky(zeros,L_Sigma);
}

When estimating this model with VARCOUNT=7, LENGTH=178, 500 warmup iterations, and 100 further samples per chain, I get stable estimates for everything…except L_sigma ~ cauchy(0,1).

The Rhat values are high because the effective sample size for L_sigma is extremely low (sometimes equal to the number of chains), which suggests the sampler was barely able to move L_sigma at all.

I’ve repeated this using cauchy and lkj_corr_cholesky priors identical to those in the manual (p. 155), but it makes no difference. I have also tried replacing the Cauchy with a truncated normal, or with an exponentially transformed normal (i.e., a lognormal), as in:

data{
   int<lower=1> LENGTH;
   int<lower=2> VARCOUNT;
}
transformed data{
  vector[VARCOUNT] zeros = rep_vector(0,VARCOUNT);
}

parameters {
  cholesky_factor_corr[VARCOUNT] L_Omega;
  vector[VARCOUNT] L_sigma_norm;
  
  vector[VARCOUNT] covar[LENGTH];
}

transformed parameters {
  vector<lower=0>[VARCOUNT] L_sigma = exp(L_sigma_norm);
  matrix[VARCOUNT, VARCOUNT] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  matrix[VARCOUNT, VARCOUNT] Sigma = L_Sigma * L_Sigma';
}

model {
  L_Omega ~ lkj_corr_cholesky(1);
  L_sigma_norm ~ normal(0,1);

  covar ~ multi_normal_cholesky(zeros,L_Sigma);
   
}

…to no avail: in each case all parameters except L_sigma are stably estimated, and there are hardly any effective samples for L_sigma at all, let alone a stable estimate.

I understand that this model may be insufficiently constrained, and therefore unable to produce a stable estimate for every parameter.

But surely there is a way to get a stable estimate of the prior on this covariance matrix?

The best thing to do if you really just want to draw from the prior is to implement it in the generated quantities block. Then you get independent draws. I did it just for the quantity of interest that was failing for you, to make sure it would work:

generated quantities {
  cholesky_factor_corr[7] L_Omega = lkj_corr_cholesky_rng(7, 1);
  vector[7] sigma;
  matrix[7, 7] L_Sigma;
  for (i in 1:7)
    sigma[i] = lognormal_rng(0, 1);
  L_Sigma = diag_pre_multiply(sigma, L_Omega);
}

Using HMC/NUTS, we can run into difficult geometry with just these priors and no data.

You can reparameterize the Cauchy to make the actual model you draw from less heavy tailed. This is a good idea when there’s not a lot of data, too. There’s a discussion in the user’s guide part of the manual on reparameterizations.
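For example, here is a minimal sketch (not the full model) of the tangent reparameterization for a half-Cauchy(0, 1) prior on L_sigma, along the lines of the manual's reparameterization chapter. The sampler works with a uniform variable on a bounded interval, and the heavy tail only enters through the deterministic transform:

parameters {
  // implicit uniform(0, pi/2) prior from the declared bounds
  vector<lower=0, upper=pi()/2>[VARCOUNT] L_sigma_unif;
}
transformed parameters {
  // tan() maps uniform(0, pi/2) to half-Cauchy(0, 1) elementwise;
  // multiply by a scale here if a wider Cauchy is wanted
  vector<lower=0>[VARCOUNT] L_sigma = tan(L_sigma_unif);
}

The marginal prior on L_sigma is unchanged; what changes is the geometry NUTS has to explore, which is now a bounded box rather than an unbounded, heavy-tailed surface.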

What happens when you use priors with less heavy tails? Did the chains get stuck or just mix slowly?

You might also want to put some kind of prior on the correlation matrix, like an LKJ prior with a parameter greater than 1. I simulated with a 1 in the example above.

normal(0,1) is less heavy-tailed than cauchy(0,1), right? It makes just about no difference: all parameters except L_sigma are stably estimated. (L_Sigma, on the other hand, seems fine, with Rhat ≈ 1.00 and n_eff around half of the maximum.)

I’ll try the method you described!

Using the “generated quantities” block worked, thanks!

Glad to hear it. Nothing like Monte Carlo without the Markov chain complications. (Having said that, the MCMC version could probably be fit with some careful reparameterization.)

Infinitely less heavy tailed—Cauchy has infinite variance, normal is finite. But as you note, this often doesn’t matter with priors—it’s the posterior tails we care about.
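Concretely, the second moments make the contrast explicit:

$$
\int_{-\infty}^{\infty} x^2\,\frac{1}{\pi(1+x^2)}\,dx = \infty,
\qquad
\int_{-\infty}^{\infty} x^2\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = 1.
$$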

I’ve noticed that just about any combination of priors I can use for this design gives me a covariance matrix whose diagonal entries are substantially larger than the off-diagonal entries, with the prior on the off-diagonal entries pulled strongly toward very low values.

Assuming my covariance matrix is effectively a correlation matrix (i.e., the data it represents are standardized), I’m wondering what values would give me weakly informative or uninformative priors that allow correlations anywhere in the range 0-0.5.

I don’t know if that’s possible to pull off. In higher dimensions, the positive-definiteness constraint leads to priors that concentrate around zero correlation. You could use an LKJ prior with a shape argument much less than 1 (but still greater than 0).

In some sense, the diagonal entries wind up having to be relatively larger than the off-diagonal entries. But they can get close when there’s very high correlation and all the variances are roughly the same.
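If it helps to see this directly, here is a small self-contained sketch (dimension 7 hard-coded and the LKJ shape eta passed in as data, both just for illustration; with no parameters block it has to be run with the fixed-parameter sampler) that draws correlation matrices from an LKJ(eta) prior so you can histogram the marginal prior on a single correlation. Values of eta below 1 push mass toward +/-1; values above 1 concentrate it around 0:

data {
  real<lower=0> eta;  // LKJ shape: < 1 favors strong correlations, > 1 favors near-zero
}
generated quantities {
  // one draw per iteration from the LKJ(eta) prior over 7x7 correlation matrices
  corr_matrix[7] Omega = lkj_corr_rng(7, eta);
  real rho_12 = Omega[1, 2];  // a single off-diagonal entry to inspect
}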