LKJ prior distribution

Jenny · November 19, 2018, 4:33pm

Hi Ben,

Since I want to know the distribution of rho in that matrix . I write below code but it says initialization failure. So do you mean it is impossible to know the distribution of factor rho? Thanks.

model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
 vector<lower=0>[D] sigma[K]; // standard deviations
}

transformed parameters{    
	real ro[K];
	cholesky_factor_cov[D,D] cov[K];  
    for (k in 1:K) {cov[k] = diag_pre_multiply(sigma[k],L[k]);
    ro[k]=cov[1,2,k]/sqrt(cov[1,1,k])/sqrt(cov[2,2,k]);}	
    }

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, D));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);// uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,0.1);//uniform(3160/100,3190/100);//
 sigma[k] ~ uniform(0.01,0.89);//normal(1,2);
 ro[k] ~ uniform(-0.999,0.999);
 L[k] ~ lkj_corr_cholesky(2);
 }
 
 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''

bgoodri · November 19, 2018, 4:39pm

For the 2x2 case, it should have no trouble initializing even without explicit initial values. What is the error message?

Jenny · November 19, 2018, 4:46pm

I am not sure whether the way of defining the rho is right. Do you think I should define ‘ro’ as a parameter similar as sigma, like writing ‘vector<lower=0>[K] ro[K]’ in parameters part. Below is the error information.

Rejecting initial value:
  Error evaluating the log probability at the initial value.
Exception: uniform_lpdf: Random variable is nan, but must not be nan!  (in 'unknown file name' at line 39)


Initialization between (-2, 2) failed after 100 attempts. 
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\python\tm.py", line 109, in <module>
    fit = pystan.stan(model_code=model,data=dat,iter=5000,warmup=500, chains=1)
  File "C:\Users\Administrator\Miniconda2\lib\site-packages\pystan\api.py", line 441, in stan
    n_jobs=n_jobs, **kwargs)
  File "C:\Users\Administrator\Miniconda2\lib\site-packages\pystan\model.py", line 776, in sampling
    ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
  File "C:\Users\Administrator\Miniconda2\lib\site-packages\pystan\model.py", line 91, in _map_parallel
    map_result = list(map(function, args))
  File "stanfit4anon_model_f3f8d8f02b2956a462e6a170da742684_2059118417.pyx", line 370, in stanfit4anon_model_f3f8d8f02b2956a462e6a170da742684_2059118417._call_sampler_star
  File "stanfit4anon_model_f3f8d8f02b2956a462e6a170da742684_2059118417.pyx", line 403, in stanfit4anon_model_f3f8d8f02b2956a462e6a170da742684_2059118417._call_sampler
RuntimeError: Initialization failed.
[Finished in 115.4s]

bgoodri · November 19, 2018, 4:53pm

That is because you are putting a uniform prior density on the (0.01, 0.89) interval but only restricting sigma to be positive. When it gets a leapfrog step that is greater than 0.89, it will throw an exception. Also, you shouldn’t be putting a prior on ro[k], which is a transformed parameter.

Jenny · November 19, 2018, 4:57pm

So how could I restrict sigma to be positive? Also, if I want to assign ro a prior information, how could I change the code?

bgoodri · November 19, 2018, 5:04pm

I would suggest using a non-uniform prior for the sigma parameters that is supported over the entire positive real line, such as exponential.

You have already characterized your beliefs about rho by putting a prior on L. Also, if you wanted to imply a uniform prior distribution on rho, then the correct shape parameter for lkj_corr_cholesky would be 1 rather than 2.

Jenny · November 19, 2018, 6:05pm

Hi

I gave ‘sigma[k] ~ exponential(1);’ and L[k] ~ lkj_corr_cholesky(1);
it still has initialization failure.

Rejecting initial value:
  Gradient evaluated at the initial value is not finite.
  Stan can't start sampling from this initial value.

Initialization between (-2, 2) failed after 100 attempts. 
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

bgoodri · November 19, 2018, 6:18pm

Well, it should be using cov rather than L in the likelihood, but I don’t think that is the problem. Try with init_r = 0.5 or whatever.

Jenny · November 19, 2018, 6:38pm

what you mean init_r = 0.5? I put it in ‘model’ part, but it gives me an error.
Also I have a question on the ‘L’ and ‘cov’ matrix output, why like L[1,2,1] is not equal L[2,1,1]? I think L[1,2,1] and L[2,1,1] present the \rho*\sigma_x*\sigma_y, which need to be equal, right?

model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
 vector<lower=0>[D] sigma[K]; // standard deviations
 //vector<lower=0>[K] ro[K]; // standard deviations 
}

transformed parameters{ 
    real ro[K];   
	cholesky_factor_cov[D,D] cov[K];  
    for (k in 1:K) {cov[k] = diag_pre_multiply(sigma[k],L[k]);
    ro[k]=cov[1,2,k]/sqrt(cov[1,1,k])/sqrt(cov[2,2,k]);}
    }

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, D));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);// uniform(340/100,380/100);//
 mu[2,k] ~  normal(31.80,0.1);//uniform(3160/100,3190/100);//
init_r = 0.5;
 sigma[k] ~ lognormal(0,2);//normal(1,2);
 L[k] ~ lkj_corr_cholesky(1);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], cov[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''

bgoodri · November 19, 2018, 7:10pm

init_r is an argument to the sampling method. If you have

cholesky_factor_cov[D, D] cov[K]

then k = 1:K is the first dimension, i = 1:D, is the second dimension, and j = 1:D is the third dimension. Thus, element [1, 2, 1] is not equal to [2, 1, 1] because they are different matrices. The [1, 2, 1] element is equal to the [1, 1, 2] element.

Jenny · November 19, 2018, 7:21pm

But how to use the init_r?

It doesn’t make any sense.

Take any output example like below. The [1,2,1] element is not equal to [1,1,2] element. And none of the elements are equal.

L[i,k,j] i represent the row, k represents column, j represents gaussian patches, right?

             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]     0.51  1.7e-4 3.7e-4    0.5   0.51   0.51   0.51   0.51    5.0   1.87
theta[2]     0.49  1.7e-4 3.7e-4   0.49   0.49   0.49   0.49    0.5    5.0   1.87
mu[1,1]      1.03  1.5e-3 2.6e-3   1.03   1.03   1.04   1.04   1.04    3.0   1.97
mu[2,1]      0.13    0.03   0.04   0.06   0.09   0.13   0.17    0.2    2.0   2.74
mu[1,2]      1.33  1.3e-3 2.3e-3   1.33   1.33   1.33   1.33   1.33    3.0   1.94
mu[2,2]      1.56    0.05   0.09   1.43   1.48   1.56   1.64    1.7    2.0   2.76
L[1,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[2,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[1,2,1]    -0.59  9.0e-4 1.5e-3  -0.59  -0.59  -0.59  -0.59  -0.59    3.0   1.99
L[2,2,1]      0.4  9.8e-3   0.02   0.37   0.38    0.4   0.41   0.42    2.0   3.22
L[1,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[2,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[1,2,2]     0.81  6.6e-4 1.1e-3   0.81   0.81   0.81   0.81   0.81    3.0   1.99
L[2,2,2]     0.92  4.2e-3 6.5e-3   0.91   0.91   0.92   0.92   0.93    2.0   3.23
sigma[1,1]   0.17  2.7e-4 4.5e-4   0.17   0.17   0.17   0.17   0.17    3.0   2.89
sigma[2,1]   0.16  1.0e-3 1.7e-3   0.16   0.16   0.16   0.16   0.16    3.0   2.24
sigma[1,2]   0.14  8.3e-5 2.2e-4   0.14   0.14   0.14   0.14   0.14    7.0   1.06
sigma[2,2]   0.89  5.0e-610.0e-5   0.89   0.89   0.89   0.89   0.89  401.0    1.0
ro[1]         nan     nan    nan    nan    nan    nan    nan    nan    nan    nan
ro[2]         0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[1,1,1]   0.17  2.7e-4 4.5e-4   0.17   0.17   0.17   0.17   0.17    3.0   2.89
cov[2,1,1]   0.16  1.0e-3 1.7e-3   0.16   0.16   0.16   0.16   0.16    3.0   2.24
cov[1,2,1]  -0.08  1.2e-4 2.4e-4  -0.08  -0.08  -0.08  -0.08  -0.08    4.0   1.42
cov[2,2,1]   0.35  8.7e-3   0.01   0.33   0.34   0.35   0.36   0.37    2.0   3.22
cov[1,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[2,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[1,2,2]   0.12  1.2e-4 2.6e-4   0.11   0.11   0.12   0.12   0.12    4.0   1.56
cov[2,2,2]   0.82  3.8e-3 5.8e-3   0.81   0.81   0.82   0.82   0.83    2.0   3.23
lp__       -1.0e5  326.22 512.88 -1.0e5 -1.0e5 -1.0e5 -1.0e5 -1.0e5    2.0   2.76

Jenny · November 19, 2018, 7:36pm

why should I use cov rather than L? When I change to cov, the n_effe decrease from several hundred to 2-3…

bgoodri · November 19, 2018, 7:48pm

Well, cov contains cholesky factors of covariance matrices and L contains cholesky factors of correlation matrices. If you use L, that is equivalent to restricting the standard deviations to be exactly 1.0.

bgoodri · November 19, 2018, 7:52pm

What I said before was supposed to pertain to the case where you have symmetric covariance matrices. If you have Cholesky factors of correlation and covariance matrices, then yes they will be asymmetric. But it is still the case that [k, i, j] refers the k-th matrix’s i-th row and j-th column.

The init_r argument is documented at
https://pystan.readthedocs.io/en/latest/api.html

Jenny · November 19, 2018, 8:05pm

Okay, thanks, I agree using cov instead of L.
For the asymmetric Cholesky factors of correlation and covariance matrices, the question back to how to get the \rho distribution? I set ‘ro[k]=cov[1,2,k]/sqrt(cov[1,1,k])/sqrt(cov[2,2,k]);’, is it right?

bgoodri · November 19, 2018, 8:32pm

You can just do L * transpose(L) to get the whole correlation matrix.

Jenny · November 19, 2018, 8:49pm

Hi, I added code like that. The Sigma here is symmetric now. So the Sigma[1,1,1] is equal to the 1st matrix \sigma_x^2, right?

generated quantities {
real ro[K];
  matrix[D,D] Omega[K];
  matrix[D,D] Sigma[K];
  for (k in 1:K){
  Omega[k] = multiply_lower_tri_self_transpose(L[k]);
  Sigma[k] = quad_form_diag(Omega[k], sigma[k]); 
  ro[k]=Sigma[k,2,1]/sqrt(Sigma[k,1,1])/sqrt(Sigma[k,2,2]);	
  }

bgoodri · November 19, 2018, 9:38pm

Yes, but ro[k] is just Omega[k][2,1].

Jenny · November 19, 2018, 9:56pm

Okay, thanks! One more quick question, I already know the range of sigma would be within 0.01- 0.89, so what kind of prior distribution is suitable for sigma? Using exponential gives me 0-infinity range, so I want to change one. Or is truncated exponential possible in stan?

bgoodri · November 19, 2018, 10:04pm

Truncated exponential is possible. Beta on that range would probably be better. It is hard to think of a theoretical reason for an upper bound of 0.89 though.

Topic		Replies	Views
Underestimating correlation coefficients with LKJ prior Modeling	24	6786	September 15, 2020
Question about LKJ normalizing constant Modeling	21	2272	February 5, 2018
Learning a correlation matrix Modeling	8	3346	December 13, 2017
Confusion about lkj prior in mixture modeling example Modeling	3	1173	August 30, 2017
Gaussian LDA Modeling fitting-issues	18	3288	August 30, 2017

LKJ prior distribution

Related topics