LKJ prior distribution

Hi,
I have a question about the LKJ prior distribution. My model is a Gaussian mixture model, so I set an LKJ prior on the covariance matrix of a multivariate normal distribution. The prior I want for sigma is a uniform distribution on (0.01, 0.08), and for rho a uniform distribution on (-0.999, 0.999). But `lkj_corr_cholesky()` only takes one parameter, and I set `L[k] ~ lkj_corr_cholesky(10)`. How can I tell the model the exact range I want?
Thanks.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data

}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of covariance
}


model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K)); // K, not D: theta is a K-simplex
 for(k in 1:K){
 mu[1,k] ~ uniform(3.60, 3.75);//normal(3.65,2);// note: 360/100 would be integer division in Stan
 mu[2,k] ~ uniform(31.65, 31.80);//normal(31.70,5);//
 L[k] ~ lkj_corr_cholesky(10);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

There is prior density on the whole (-1,1) interval but not very much at the endpoints when D is greater than 2.


Could you explain a little how `L[k] ~ lkj_corr_cholesky(10)` works?
I don't get what you mean by (-1,1) when D is greater than 2.
Thanks.

D is the size of the correlation matrix in your code. Say \eta \geq 1 is the shape parameter for the LKJ distribution; it works a lot like the shape parameter of a symmetric beta distribution, shifted and scaled to the (-1,1) interval. A beta(10, 10) distribution looks like

[plot of the beta(10, 10) density, concentrated around 0.5]

So it technically has positive density near 0 and 1, but in practice it is essentially zero below 0.2 and above 0.8. It is the same with LKJ(10) in the sense that there is positive density all the way out to -1 and 1, but almost all of the prior mass is between -0.4 and 0.4.
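To make the beta(10, 10) concentration claim concrete, here is a quick stdlib-only Python Monte Carlo check (my own illustration, not part of the original thread):

```python
import random

random.seed(1)

# Draw from a beta(10, 10) distribution and measure how much
# of the mass falls inside the interval (0.2, 0.8).
samples = [random.betavariate(10, 10) for _ in range(100_000)]
inside = sum(0.2 < x < 0.8 for x in samples) / len(samples)

print(f"fraction of beta(10,10) mass in (0.2, 0.8): {inside:.4f}")
```

The fraction comes out around 0.99, so "essentially zero past 0.2 and 0.8" is a fair summary.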


Is the LKJ() prior distribution similar to a beta distribution? Do you mean all the parameters in the correlation matrix have the same range?
Is there a way for me to give the model a more precise prior distribution? Or do you mean it does not matter whether I write `L[k] ~ lkj_corr_cholesky(10)` or `L[k] ~ lkj_corr_cholesky(5)`?

The LKJ distribution for correlation matrices is basically a generalization of the beta distribution for scalars. By definition, a correlation is between -1 and 1 so the density of the LKJ distribution can be evaluated for any correlation matrix (that is positive definite). There is no generic way in Stan to construct a correlation matrix where the correlations are restricted to a subinterval of the (-1,1) interval.

An LKJ(10) differs from an LKJ(5) in much the same way a beta(10, 10) differs from a beta(5, 5): the beta(5, 5) has less concentration around its mode. In the case of a correlation matrix, the mode is the identity matrix.
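The beta analogy is exact for the marginals: a standard result for the LKJ distribution is that each off-diagonal correlation r of a d x d matrix satisfies (r + 1)/2 ~ Beta(a, a) with a = eta - 1 + d/2 (for d = 2 that is simply Beta(eta, eta)). A stdlib-only Python sketch of what that implies for LKJ(10) versus LKJ(5), assuming I have the parameterization right:

```python
import random

random.seed(1)

def lkj_marginal_corr(eta, d, n):
    """Sample the marginal of one correlation under an LKJ(eta) prior
    on a d x d correlation matrix, using the beta representation:
    (r + 1) / 2 ~ Beta(a, a) with a = eta - 1 + d / 2."""
    a = eta - 1 + d / 2
    return [2 * random.betavariate(a, a) - 1 for _ in range(n)]

r10 = lkj_marginal_corr(10, 2, 100_000)
r5 = lkj_marginal_corr(5, 2, 100_000)

frac10 = sum(abs(r) < 0.4 for r in r10) / len(r10)
frac5 = sum(abs(r) < 0.4 for r in r5) / len(r5)
print(f"LKJ(10), d=2: P(|r| < 0.4) is about {frac10:.3f}")
print(f"LKJ(5),  d=2: P(|r| < 0.4) is about {frac5:.3f}")
```

Both priors put roughly 80-95% of their mass inside (-0.4, 0.4), with LKJ(10) noticeably more concentrated than LKJ(5), matching the beta(10,10) vs beta(5,5) comparison.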

Thanks for the information.
Then do you know why several parameters in the L matrix are currently 'nan'? They should not be like that. How can I fix it? I put my code and outputs below. Thanks.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of covariance
}

transformed parameters{
 cholesky_factor_corr[D] cov[K];
 real sx[K];
 real sy[K];
 real ro[K];
 cov = L;
 for (i in 1:K){
  sx[i] = sqrt(cov[i,1,1]);
  sy[i] = sqrt(cov[i,2,2]);
  ro[i] = cov[i,1,2] / sqrt(cov[i,1,1]) / sqrt(cov[i,2,2]);
 }
}

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);//uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,1);//uniform(3160/100,3190/100);//
 L[k] ~ lkj_corr_cholesky(4);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''
Outputs:
             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]     0.15  1.7e-3    0.1   0.02   0.08   0.14   0.21   0.39 3043.0    1.0
theta[2]     0.85  1.7e-3    0.1   0.61   0.79   0.86   0.92   0.98 3043.0    1.0
mu[1,1]       3.1    0.01   0.81   1.47   2.56   3.11   3.66   4.66 5314.0    1.0
mu[2,1]       6.4  6.7e-3   0.32   5.79   6.18    6.4   6.62   7.03 2222.0    1.0
mu[1,2]      4.23  8.4e-3   0.81   2.72   3.68    4.2   4.76    5.9 9322.0    1.0
mu[2,2]     31.73    0.06   1.27  29.28  30.85  31.76  32.66  34.07  479.0    1.0
L[1,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[2,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[1,2,1]  -4.0e-4  4.2e-3   0.33  -0.63  -0.24 3.6e-3   0.24   0.63 6401.0    1.0
L[2,2,1]  -7.9e-3    0.02   0.48  -0.97  -0.34 5.4e-3   0.35   0.89  430.0    1.0
L[1,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[2,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[1,2,2]     0.94  1.2e-3   0.08   0.71   0.92   0.97   0.99    1.0 4204.0    1.0
L[2,2,2]     0.85    0.02   0.21   0.22   0.81   0.94   0.99    1.0  147.0   1.01
cov[1,1,1]    1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
cov[2,1,1]    1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
cov[1,2,1] -4.0e-4 4.2e-3   0.33  -0.63  -0.24 3.6e-3   0.24   0.63 6401.0    1.0
cov[2,2,1] -7.9e-3   0.02   0.48  -0.97  -0.34 5.4e-3   0.35   0.89  430.0    1.0
cov[1,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[2,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[1,2,2]   0.94  1.2e-3   0.08   0.71   0.92   0.97   0.99    1.0 4204.0    1.0
cov[2,2,2]   0.85    0.02   0.21   0.22   0.81   0.94   0.99    1.0  147.0   1.01
sx[1]         1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
sx[2]         1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
sy[1]        0.97  6.6e-4   0.04   0.85   0.96   0.99    1.0    1.0 4208.0    1.0
sy[2]        0.91    0.01   0.14   0.47    0.9   0.97   0.99    1.0  118.0   1.01
ro[1]         0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
ro[2]         0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
lp__       -379.3    0.03   1.91 -383.9 -380.4 -379.0 -377.9 -376.5 3771.0    1.0

Samples were drawn using NUTS at 11/16/18 13:24:23.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

If you mean there are nan entries in the printed output summary, that is because those quantities are fixed (at either 1 or 0) rather than random variables, and hence do not have an Rhat, n_eff, or standard error of the mean.

Yes, I know that. I am not sure whether the way I wrote the code is right or wrong. My understanding is that the LKJ distribution is a combination of sigma_x, sigma_y, and rho. For example, I expected the `L` in my code to be the same as the covariance matrix of a 2-D Gaussian model (sx[i] is sigma_x, sy[i] is sigma_y, and ro[i] is the correlation factor):
L[1,1,1] = sx[i]*sx[i]
L[1,1,2] = ro[i]*sx[i]*sy[i]
L[1,2,2] = sy[i]*sy[i]

So I defined sx[i] = sqrt(L[1,1,1]) and printed the distribution of sx[i]. The result shows it is 'nan'.

So I wonder whether my definition is wrong? Thanks for your help.

sx[1]         1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
sx[2]         1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan

It is telling you that sx[1] and sx[2] are always 1 (and never themselves NaN), but the standard error of the mean, the n_eff, and the Rhat are all undefined because those are fixed rather than random quantities.
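The reason sx is always exactly 1: the Cholesky factor of a 2x2 correlation matrix is [[1, 0], [rho, sqrt(1 - rho^2)]], so its first diagonal entry is 1 by construction no matter what rho is, and sqrt of it is still 1. A small stdlib-only Python sketch of this (my own illustration; note Python indexes from 0 where Stan indexes from 1):

```python
import math

def chol2x2_corr(rho):
    """Cholesky factor of the 2x2 correlation matrix [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0],
            [rho, math.sqrt(1.0 - rho * rho)]]

for rho in (-0.9, 0.0, 0.3, 0.99):
    L = chol2x2_corr(rho)
    # Reconstruct the (2,1) entry of L @ L.T to confirm it recovers rho.
    r21 = L[1][0] * L[0][0] + L[1][1] * L[0][1]
    assert abs(r21 - rho) < 1e-12
    print(f"rho = {rho:+.2f}: first diagonal entry of L = {L[0][0]}")
```

So sqrt(L[k,1,1]) is the constant 1, not a random variable, which is exactly why the summary has no Rhat or n_eff for it.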


So do you think my way of defining those parameters is right?

It is valid syntax, but it does not make much sense. This comment is not correct:

cholesky_factor_corr[D] L[K]; //cholesky factor of covariance

It is actually an array of Cholesky factors of correlation matrices. You can create the Cholesky factor of a covariance matrix by pre-multiplying by the standard deviations, as in

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation
 vector<lower=0> sigma[K]; // standard deviations
}
transformed parameters{
  cholesky_factor_corr[D] cov[K];  
  for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k], L[k]);
}

and then in the model block, refer to multi_normal_cholesky_lpdf(y[n] | mu[k], cov[k]) and put priors on sigma.
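To see why pre-multiplying by the standard deviations works: if L is the Cholesky factor of the correlation matrix R, then diag(sigma) * L is the Cholesky factor of the covariance matrix diag(sigma) * R * diag(sigma). A stdlib-only Python check of that identity for one 2x2 case (my own illustration, mirroring what `diag_pre_multiply` does in Stan):

```python
import math

sigma = [2.0, 0.5]   # standard deviations
rho = 0.3            # correlation

# Cholesky factor of the correlation matrix [[1, rho], [rho, 1]].
L = [[1.0, 0.0],
     [rho, math.sqrt(1.0 - rho * rho)]]

# diag_pre_multiply(sigma, L): scale row i of L by sigma[i].
Lc = [[sigma[i] * L[i][j] for j in range(2)] for i in range(2)]

# Reconstruct the covariance matrix as Lc @ Lc.T.
cov = [[sum(Lc[i][k] * Lc[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]

assert abs(cov[0][0] - sigma[0] ** 2) < 1e-12              # var_x
assert abs(cov[1][1] - sigma[1] ** 2) < 1e-12              # var_y
assert abs(cov[0][1] - rho * sigma[0] * sigma[1]) < 1e-12  # covariance
print("diag(sigma) @ L is the Cholesky factor of the covariance matrix")
```

This recovers exactly the sx*sx, sy*sy, and ro*sx*sy entries from the earlier post, which is why sigma and L together carry all the information the covariance matrix does.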

Hi,

After I changed the code the way you suggested, there is an error.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
 vector<lower=0> sigma[k]; // standard deviations
}

transformed parameters{
	cholesky_factor_corr[D] cov[K];  
    for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k], L[k]);
}

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);//uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,1);//uniform(3160/100,3190/100);//
 sigma[k] ~ uniform(0.01,0.89);
 L[k] ~ lkj_corr_cholesky(2);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], sigma[k],L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''
The errors are:
error in 'unknown file name' at line 13, column 17
  -------------------------------------------------
    11:  ordered[D] mu[K]; //mixture component means
    12:  cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
    13:  vector<lower=0> sigma[k]; // standard deviations
                        ^
    14: }
  -------------------------------------------------

PARSER EXPECTED: <size declaration: integer (data-only) in square brackets>

[Finished in 8.7s]

That was supposed to be

vector<lower=0>[D] sigma[K]; // standard deviations

Hi,
This time the problem becomes an initialization failure. Could it be that my prior information is not correct? Or is there another reason?
The errors are:


Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: anon_model_e64dd13f25b4fc9b418d05db6aed84e9_namespace::write_array: cov[k0__] is not a valid unit vector. The sum of the squares of the elements should be 1, but is 50.5794 (in ‘unknown file name’ at line 17)

Sorry again. The first line in transformed parameters should have been

cholesky_factor_cov[D, D] cov[K];  

Should I change the 'transformed parameters' block as well? I changed it as below and get an error.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_cov[D,D] cov[K]; //cholesky factor of correlation matrix
 vector<lower=0>[D] sigma[K]; // standard deviations
}

transformed parameters{
	cholesky_factor_cov[D,D] cov[K];  
    for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k],L[k]);
}

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);//uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,1);//uniform(3160/100,3190/100);//
 sigma[k] ~ normal(1,2);
 L[k] ~ lkj_corr_cholesky(2);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], cov[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''

The errors are:
duplicate declaration of variable, name=cov; attempt to redeclare as transformed parameter; original declaration as parameter
Problem with declaration.
  error in 'unknown file name' at line 17, column 35
  -------------------------------------------------
    15: 
    16: transformed parameters{
    17: 	cholesky_factor_cov[D,D] cov[K];  
                                          ^
    18:     for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k],L[k]);
  -------------------------------------------------


[Finished in 9.5s]

The one in the parameters block should be

cholesky_factor_corr[D] L[K]; // cholesky factor of correlation matrix

Hi,
Thanks for the help. Your way works! I am still curious about the meaning of the parameters. My question is: how can I get the distribution of the correlation?


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
 vector<lower=0>[D] sigma[K]; // standard deviations
}

transformed parameters{
 cholesky_factor_cov[D,D] cov[K];
 for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k], L[k]);
}
//	real sx[K];
//    real sy[K];
//    real ro[K];
//    cov=L;  
//    for (i in 1:K){
//   sx[i]=sqrt(cov[i,1,1]);
//    sy[i]=sqrt(cov[i,2,2]);
//    ro[i]=cov[i,1,2]/sqrt(cov[i,1,1])/sqrt(cov[i,2,2]);
//    }
//}

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);//uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,1);//uniform(3160/100,3190/100);//
 sigma[k] ~ normal(1,2);
 L[k] ~ lkj_corr_cholesky(2);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''
Result:
             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]     0.85  1.6e-3    0.1   0.62   0.79   0.87   0.92   0.98 3731.0    1.0
theta[2]     0.15  1.6e-3    0.1   0.02   0.08   0.13   0.21   0.38 3731.0    1.0
mu[1,1]      1.14  9.2e-3   0.34   0.55   0.93   1.13   1.34   1.75 1345.0    1.0
mu[2,1]     31.22    0.01   0.85  29.51  30.67  31.23  31.81  32.83 3741.0    1.0
mu[1,2]     29.19  4.1e-3    0.3  28.61  28.99   29.2   29.4  29.78 5352.0    1.0
mu[2,2]     32.38    0.01   0.81  30.78  31.85  32.38  32.92  33.99 5504.0    1.0
L[1,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[2,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[1,2,1]     0.98  3.2e-3   0.06   0.93   0.99   0.99    1.0    1.0  358.0    1.0
L[2,2,1]  -4.0e-3  5.7e-3   0.45  -0.81  -0.35 5.9e-4   0.33   0.82 6045.0    1.0
L[1,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[2,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[1,2,2]     0.13  4.5e-3   0.09   0.06   0.09   0.11   0.15   0.36  433.0   1.01
L[2,2,2]     0.88  3.7e-3   0.14   0.49   0.83   0.94   0.99    1.0 1415.0    1.0
sigma[1,1]   2.01    0.03   1.41    0.1   0.89    1.8   2.88   5.36 2978.0    1.0
sigma[2,1]   2.01    0.02   1.39   0.11   0.89   1.76   2.87   5.24 3984.0    1.0
sigma[1,2]   1.99    0.03   1.45   0.07   0.83   1.75   2.89   5.35 3098.0    1.0
sigma[2,2]   2.03    0.03   1.43   0.08   0.89   1.81   2.91   5.42 3054.0    1.0
cov[1,1,1]   2.01    0.03   1.41    0.1   0.89    1.8   2.88   5.36 2978.0    1.0
cov[2,1,1]   2.01    0.02   1.39   0.11   0.89   1.76   2.87   5.24 3984.0    1.0
cov[1,2,1]   1.96    0.03   1.43   0.06    0.8   1.72   2.84   5.27 2911.0    1.0
cov[2,2,1]  -0.01    0.02   1.11  -2.58   -0.5 2.1e-4   0.47   2.43 4070.0    1.0
cov[1,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[2,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[1,2,2]   0.26    0.01    0.3 8.1e-3   0.09    0.2   0.35   0.91  903.0    1.0
cov[2,2,2]   1.79    0.02   1.31   0.08   0.75   1.56   2.57   4.92 2908.0    1.0
lp__       -371.9    0.09   2.82 -378.4 -373.5 -371.5 -369.8 -367.5 1069.0    1.0

The marginal distribution of each correlation, if \eta = 1 (the shape parameter), is a \text{Beta}\left(\frac{K}{2},\frac{K}{2}\right) distribution shifted and scaled to the (-1,1) interval (more generally, both beta shapes are \eta - 1 + \frac{K}{2}). But the marginal prior distribution is not a very useful quantity when there is an inequality constraint (positive definiteness) that ties all the correlations together.
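To illustrate that last point: an LKJ(1) prior is uniform over valid correlation matrices, yet the marginal of each correlation is still pulled toward 0, because extreme values leave less room for the other correlations to keep the matrix positive definite. A stdlib-only Python rejection-sampling sketch for the 3x3 case (my own illustration; a 3x3 correlation matrix is positive definite when |r12| < 1, which holds by construction here, and 1 + 2*r12*r13*r23 - r12^2 - r13^2 - r23^2 > 0):

```python
import random

random.seed(1)

accepted = []
for _ in range(200_000):
    # Propose each correlation uniformly on (-1, 1) ...
    r12, r13, r23 = (random.uniform(-1, 1) for _ in range(3))
    # ... and keep the triple only if the implied 3x3 correlation
    # matrix is positive definite (determinant condition).
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    if det > 0:
        accepted.append(r12)

# Under the flat proposal, P(|r12| > 0.9) would be 0.1; under the
# positive-definiteness constraint it is much smaller.
frac_extreme = sum(abs(r) > 0.9 for r in accepted) / len(accepted)
print(f"accepted: {len(accepted)} of 200000")
print(f"P(|r12| > 0.9) under LKJ(1), d=3: about {frac_extreme:.3f}")
```

The accepted r12 values follow the shifted Beta(3/2, 3/2) marginal (a semicircle density), so even the "uniform" LKJ(1) prior puts only a few percent of its mass beyond |r| = 0.9 when D = 3, which is exactly the (-1,1)-endpoint behavior described earlier in the thread.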