LKJ prior distribution

Hi Ben,

Since I want to know the distribution of rho in that matrix image. I write below code but it says initialization failure. So do you mean it is impossible to know the distribution of factor rho? Thanks.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
 vector<lower=0>[D] sigma[K]; // standard deviations
}

transformed parameters{    
	real ro[K];
	cholesky_factor_cov[D,D] cov[K];  
    for (k in 1:K) {cov[k] = diag_pre_multiply(sigma[k],L[k]);
    ro[k]=cov[1,2,k]/sqrt(cov[1,1,k])/sqrt(cov[2,2,k]);}	
    }

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, D));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);// uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,0.1);//uniform(3160/100,3190/100);//
 sigma[k] ~ uniform(0.01,0.89);//normal(1,2);
 ro[k] ~ uniform(-0.999,0.999);
 L[k] ~ lkj_corr_cholesky(2);
 }
 
 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''

For the 2x2 case, it should have no trouble initializing even without explicit initial values. What is the error message?

I am not sure whether the way of defining the rho is right. Do you think I should define ‘ro’ as a parameter similar as sigma, like writing ‘vector<lower=0>[K] ro[K]’ in parameters part. Below is the error information.


Rejecting initial value:
  Error evaluating the log probability at the initial value.
Exception: uniform_lpdf: Random variable is nan, but must not be nan!  (in 'unknown file name' at line 39)


Initialization between (-2, 2) failed after 100 attempts. 
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\python\tm.py", line 109, in <module>
    fit = pystan.stan(model_code=model,data=dat,iter=5000,warmup=500, chains=1)
  File "C:\Users\Administrator\Miniconda2\lib\site-packages\pystan\api.py", line 441, in stan
    n_jobs=n_jobs, **kwargs)
  File "C:\Users\Administrator\Miniconda2\lib\site-packages\pystan\model.py", line 776, in sampling
    ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
  File "C:\Users\Administrator\Miniconda2\lib\site-packages\pystan\model.py", line 91, in _map_parallel
    map_result = list(map(function, args))
  File "stanfit4anon_model_f3f8d8f02b2956a462e6a170da742684_2059118417.pyx", line 370, in stanfit4anon_model_f3f8d8f02b2956a462e6a170da742684_2059118417._call_sampler_star
  File "stanfit4anon_model_f3f8d8f02b2956a462e6a170da742684_2059118417.pyx", line 403, in stanfit4anon_model_f3f8d8f02b2956a462e6a170da742684_2059118417._call_sampler
RuntimeError: Initialization failed.
[Finished in 115.4s]

That is because you are putting a uniform prior density on the (0.01, 0.89) interval but only restricting sigma to be positive. When it gets a leapfrog step that is greater than 0.89, it will throw an exception. Also, you shouldn’t be putting a prior on ro[k], which is a transformed parameter.

So how could I restrict sigma to be positive? Also, if I want to assign ro a prior information, how could I change the code?

I would suggest using a non-uniform prior for the sigma parameters that is supported over the entire positive real line, such as exponential.

You have already characterized your beliefs about rho by putting a prior on L. Also, if you wanted to imply a uniform prior distribution on rho, then the correct shape parameter for lkj_corr_cholesky would be 1 rather than 2.

Hi

I gave ‘sigma[k] ~ exponential(1);’ and L[k] ~ lkj_corr_cholesky(1);
it still has initialization failure.


Rejecting initial value:
  Gradient evaluated at the initial value is not finite.
  Stan can't start sampling from this initial value.

Initialization between (-2, 2) failed after 100 attempts. 
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

Well, it should be using cov rather than L in the likelihood, but I don’t think that is the problem. Try with init_r = 0.5 or whatever.

what you mean init_r = 0.5? I put it in ‘model’ part, but it gives me an error.
Also I have a question on the ‘L’ and ‘cov’ matrix output, why like L[1,2,1] is not equal L[2,1,1]? I think L[1,2,1] and L[2,1,1] present the \rho*\sigma_x*\sigma_y, which need to be equal, right?


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
 vector<lower=0>[D] sigma[K]; // standard deviations
 //vector<lower=0>[K] ro[K]; // standard deviations 
}

transformed parameters{ 
    real ro[K];   
	cholesky_factor_cov[D,D] cov[K];  
    for (k in 1:K) {cov[k] = diag_pre_multiply(sigma[k],L[k]);
    ro[k]=cov[1,2,k]/sqrt(cov[1,1,k])/sqrt(cov[2,2,k]);}
    }

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, D));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);// uniform(340/100,380/100);//
 mu[2,k] ~  normal(31.80,0.1);//uniform(3160/100,3190/100);//
init_r = 0.5;
 sigma[k] ~ lognormal(0,2);//normal(1,2);
 L[k] ~ lkj_corr_cholesky(1);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], cov[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''

init_r is an argument to the sampling method. If you have

cholesky_factor_cov[D, D] cov[K]

then k = 1:K is the first dimension, i = 1:D, is the second dimension, and j = 1:D is the third dimension. Thus, element [1, 2, 1] is not equal to [2, 1, 1] because they are different matrices. The [1, 2, 1] element is equal to the [1, 1, 2] element.

But how to use the init_r?

It doesn’t make any sense.

Take any output example like below. The [1,2,1] element is not equal to [1,1,2] element. And none of the elements are equal.

L[i,k,j] i represent the row, k represents column, j represents gaussian patches, right?


             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]     0.51  1.7e-4 3.7e-4    0.5   0.51   0.51   0.51   0.51    5.0   1.87
theta[2]     0.49  1.7e-4 3.7e-4   0.49   0.49   0.49   0.49    0.5    5.0   1.87
mu[1,1]      1.03  1.5e-3 2.6e-3   1.03   1.03   1.04   1.04   1.04    3.0   1.97
mu[2,1]      0.13    0.03   0.04   0.06   0.09   0.13   0.17    0.2    2.0   2.74
mu[1,2]      1.33  1.3e-3 2.3e-3   1.33   1.33   1.33   1.33   1.33    3.0   1.94
mu[2,2]      1.56    0.05   0.09   1.43   1.48   1.56   1.64    1.7    2.0   2.76
L[1,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[2,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[1,2,1]    -0.59  9.0e-4 1.5e-3  -0.59  -0.59  -0.59  -0.59  -0.59    3.0   1.99
L[2,2,1]      0.4  9.8e-3   0.02   0.37   0.38    0.4   0.41   0.42    2.0   3.22
L[1,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[2,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[1,2,2]     0.81  6.6e-4 1.1e-3   0.81   0.81   0.81   0.81   0.81    3.0   1.99
L[2,2,2]     0.92  4.2e-3 6.5e-3   0.91   0.91   0.92   0.92   0.93    2.0   3.23
sigma[1,1]   0.17  2.7e-4 4.5e-4   0.17   0.17   0.17   0.17   0.17    3.0   2.89
sigma[2,1]   0.16  1.0e-3 1.7e-3   0.16   0.16   0.16   0.16   0.16    3.0   2.24
sigma[1,2]   0.14  8.3e-5 2.2e-4   0.14   0.14   0.14   0.14   0.14    7.0   1.06
sigma[2,2]   0.89  5.0e-610.0e-5   0.89   0.89   0.89   0.89   0.89  401.0    1.0
ro[1]         nan     nan    nan    nan    nan    nan    nan    nan    nan    nan
ro[2]         0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[1,1,1]   0.17  2.7e-4 4.5e-4   0.17   0.17   0.17   0.17   0.17    3.0   2.89
cov[2,1,1]   0.16  1.0e-3 1.7e-3   0.16   0.16   0.16   0.16   0.16    3.0   2.24
cov[1,2,1]  -0.08  1.2e-4 2.4e-4  -0.08  -0.08  -0.08  -0.08  -0.08    4.0   1.42
cov[2,2,1]   0.35  8.7e-3   0.01   0.33   0.34   0.35   0.36   0.37    2.0   3.22
cov[1,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[2,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[1,2,2]   0.12  1.2e-4 2.6e-4   0.11   0.11   0.12   0.12   0.12    4.0   1.56
cov[2,2,2]   0.82  3.8e-3 5.8e-3   0.81   0.81   0.82   0.82   0.83    2.0   3.23
lp__       -1.0e5  326.22 512.88 -1.0e5 -1.0e5 -1.0e5 -1.0e5 -1.0e5    2.0   2.76

why should I use cov rather than L? When I change to cov, the n_effe decrease from several hundred to 2-3…

Well, cov contains cholesky factors of covariance matrices and L contains cholesky factors of correlation matrices. If you use L, that is equivalent to restricting the standard deviations to be exactly 1.0.

What I said before was supposed to pertain to the case where you have symmetric covariance matrices. If you have Cholesky factors of correlation and covariance matrices, then yes they will be asymmetric. But it is still the case that [k, i, j] refers the k-th matrix’s i-th row and j-th column.

The init_r argument is documented at
https://pystan.readthedocs.io/en/latest/api.html

Okay, thanks, I agree using cov instead of L.
For the asymmetric Cholesky factors of correlation and covariance matrices, the question back to how to get the \rho distribution? I set ‘ro[k]=cov[1,2,k]/sqrt(cov[1,1,k])/sqrt(cov[2,2,k]);’, is it right?

You can just do L * transpose(L) to get the whole correlation matrix.

Hi, I added code like that. The Sigma here is symmetric now. So the Sigma[1,1,1] is equal to the 1st matrix \sigma_x^2, right?


generated quantities {
real ro[K];
  matrix[D,D] Omega[K];
  matrix[D,D] Sigma[K];
  for (k in 1:K){
  Omega[k] = multiply_lower_tri_self_transpose(L[k]);
  Sigma[k] = quad_form_diag(Omega[k], sigma[k]); 
  ro[k]=Sigma[k,2,1]/sqrt(Sigma[k,1,1])/sqrt(Sigma[k,2,2]);	
  }

Yes, but ro[k] is just Omega[k][2,1].

1 Like

Okay, thanks! One more quick question, I already know the range of sigma would be within 0.01- 0.89, so what kind of prior distribution is suitable for sigma? Using exponential gives me 0-infinity range, so I want to change one. Or is truncated exponential possible in stan?

Truncated exponential is possible. Beta on that range would probably be better. It is hard to think of a theoretical reason for an upper bound of 0.89 though.