LKJ prior distribution

Hi,
I have a question about the LKJ prior distribution. My model is a Gaussian mixture model, so I set an LKJ prior on the covariance matrix of a multivariate normal distribution. The prior I want for sigma is a uniform distribution on (0.01, 0.08), and for rho a uniform distribution on (-0.999, 0.999). But `lkj_corr_cholesky()` only takes one parameter, and I set `L[k] ~ lkj_corr_cholesky(10)`. How can I tell the model the exact range I want?
Thanks.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data

}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of covariance
}


model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K)); // K, not D: theta is a K-simplex
 for(k in 1:K){
 mu[1,k] ~ uniform(3.60, 3.75);//normal(3.65,2);// note: 360/100 would be integer division in Stan
 mu[2,k] ~ uniform(31.65, 31.80);//normal(31.70,5);//
 L[k] ~ lkj_corr_cholesky(10);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

There is prior density on the whole (-1,1) interval but not very much at the endpoints when D is greater than 2.


Could you explain a little how `L[k] ~ lkj_corr_cholesky(10)` works?
I don't get what you mean by (-1,1) when D is greater than 2.
Thanks.

D is the size of the correlation matrix in your code. Say \eta \geq 1 is the shape parameter for the LKJ distribution; it works a lot like the shape parameter of a symmetric beta distribution, shifted and scaled to the (-1,1) interval. A beta(10, 10) distribution looks like

[plot of the beta(10, 10) density, concentrated around 0.5]

So it technically has positive density near 0 and 1, but in practice it is essentially zero below 0.2 and above 0.8. It is the same with LKJ(10) in the sense that there is positive density all the way out to -1 and 1, but almost all of the prior mass is between -0.4 and 0.4.
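To make the beta(10, 10) concentration claim concrete, here is a quick stdlib-only Python Monte Carlo check (my own illustration, not part of the original thread):

```python
import random

random.seed(1)

# Draw from a beta(10, 10) distribution and measure how much
# of the mass falls inside the interval (0.2, 0.8).
samples = [random.betavariate(10, 10) for _ in range(100_000)]
inside = sum(0.2 < x < 0.8 for x in samples) / len(samples)

print(f"fraction of beta(10,10) mass in (0.2, 0.8): {inside:.4f}")
```

The fraction comes out around 0.99, so "essentially zero past 0.2 and 0.8" is a fair summary.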


Is the LKJ() prior distribution similar to a beta distribution? Do you mean all the parameters in the correlation matrix have the same range?
Is there a way for me to give the model a more precise prior distribution? Or do you mean it does not matter whether I write `L[k] ~ lkj_corr_cholesky(10)` or `L[k] ~ lkj_corr_cholesky(5)`?

The LKJ distribution for correlation matrices is basically a generalization of the beta distribution for scalars. By definition, a correlation is between -1 and 1 so the density of the LKJ distribution can be evaluated for any correlation matrix (that is positive definite). There is no generic way in Stan to construct a correlation matrix where the correlations are restricted to a subinterval of the (-1,1) interval.

An LKJ(10) differs from an LKJ(5) in much the same way a beta(10, 10) differs from a beta(5, 5): the beta(5, 5) has less concentration around its mode. In the case of a correlation matrix, the mode is the identity matrix.
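The beta analogy is exact for the marginals: a standard result for the LKJ distribution is that each off-diagonal correlation r of a d x d matrix satisfies (r + 1)/2 ~ Beta(a, a) with a = eta - 1 + d/2 (for d = 2 that is simply Beta(eta, eta)). A stdlib-only Python sketch of what that implies for LKJ(10) versus LKJ(5), assuming I have the parameterization right:

```python
import random

random.seed(1)

def lkj_marginal_corr(eta, d, n):
    """Sample the marginal of one correlation under an LKJ(eta) prior
    on a d x d correlation matrix, using the beta representation:
    (r + 1) / 2 ~ Beta(a, a) with a = eta - 1 + d / 2."""
    a = eta - 1 + d / 2
    return [2 * random.betavariate(a, a) - 1 for _ in range(n)]

r10 = lkj_marginal_corr(10, 2, 100_000)
r5 = lkj_marginal_corr(5, 2, 100_000)

frac10 = sum(abs(r) < 0.4 for r in r10) / len(r10)
frac5 = sum(abs(r) < 0.4 for r in r5) / len(r5)
print(f"LKJ(10), d=2: P(|r| < 0.4) is about {frac10:.3f}")
print(f"LKJ(5),  d=2: P(|r| < 0.4) is about {frac5:.3f}")
```

Both priors put roughly 80-95% of their mass inside (-0.4, 0.4), with LKJ(10) noticeably more concentrated than LKJ(5), matching the beta(10,10) vs beta(5,5) comparison.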

Thanks for the information.
Then do you know why several parameters in the L matrix are currently 'nan'? They should not be like that. How can I fix it? I put my code and outputs below. Thanks.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of covariance
}

transformed parameters{
 cholesky_factor_corr[D] cov[K];
 real sx[K];
 real sy[K];
 real ro[K];
 cov = L;
 for (i in 1:K){
  sx[i] = sqrt(cov[i,1,1]);
  sy[i] = sqrt(cov[i,2,2]);
  ro[i] = cov[i,1,2] / sqrt(cov[i,1,1]) / sqrt(cov[i,2,2]);
 }
}

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);//uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,1);//uniform(3160/100,3190/100);//
 L[k] ~ lkj_corr_cholesky(4);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''
Outputs:
             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]     0.15  1.7e-3    0.1   0.02   0.08   0.14   0.21   0.39 3043.0    1.0
theta[2]     0.85  1.7e-3    0.1   0.61   0.79   0.86   0.92   0.98 3043.0    1.0
mu[1,1]       3.1    0.01   0.81   1.47   2.56   3.11   3.66   4.66 5314.0    1.0
mu[2,1]       6.4  6.7e-3   0.32   5.79   6.18    6.4   6.62   7.03 2222.0    1.0
mu[1,2]      4.23  8.4e-3   0.81   2.72   3.68    4.2   4.76    5.9 9322.0    1.0
mu[2,2]     31.73    0.06   1.27  29.28  30.85  31.76  32.66  34.07  479.0    1.0
L[1,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[2,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[1,2,1]  -4.0e-4  4.2e-3   0.33  -0.63  -0.24 3.6e-3   0.24   0.63 6401.0    1.0
L[2,2,1]  -7.9e-3    0.02   0.48  -0.97  -0.34 5.4e-3   0.35   0.89  430.0    1.0
L[1,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[2,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[1,2,2]     0.94  1.2e-3   0.08   0.71   0.92   0.97   0.99    1.0 4204.0    1.0
L[2,2,2]     0.85    0.02   0.21   0.22   0.81   0.94   0.99    1.0  147.0   1.01
cov[1,1,1]    1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
cov[2,1,1]    1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
cov[1,2,1] -4.0e-4 4.2e-3   0.33  -0.63  -0.24 3.6e-3   0.24   0.63 6401.0    1.0
cov[2,2,1] -7.9e-3   0.02   0.48  -0.97  -0.34 5.4e-3   0.35   0.89  430.0    1.0
cov[1,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[2,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[1,2,2]   0.94  1.2e-3   0.08   0.71   0.92   0.97   0.99    1.0 4204.0    1.0
cov[2,2,2]   0.85    0.02   0.21   0.22   0.81   0.94   0.99    1.0  147.0   1.01
sx[1]         1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
sx[2]         1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
sy[1]        0.97  6.6e-4   0.04   0.85   0.96   0.99    1.0    1.0 4208.0    1.0
sy[2]        0.91    0.01   0.14   0.47    0.9   0.97   0.99    1.0  118.0   1.01
ro[1]         0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
ro[2]         0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
lp__       -379.3    0.03   1.91 -383.9 -380.4 -379.0 -377.9 -376.5 3771.0    1.0

Samples were drawn using NUTS at 11/16/18 13:24:23.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

If you mean there are nan entries in the printed output summary, that is because those quantities are fixed (at either 1 or 0) rather than random variables, and hence do not have an Rhat, n_eff, or standard error of the mean.

Yes, I know that. I am not sure whether the way I wrote the code is right or wrong. My understanding is that the LKJ distribution is a combination of sigma_x, sigma_y, and rho. For example, I expected the `L` in my code to be the same as the covariance matrix of a 2-D Gaussian model (sx[i] is sigma_x, sy[i] is sigma_y, and ro[i] is the correlation factor):
L[1,1,1] = sx[i]*sx[i]
L[1,1,2] = ro[i]*sx[i]*sy[i]
L[1,2,2] = sy[i]*sy[i]

So I defined sx[i] = sqrt(L[1,1,1]) and printed the distribution of sx[i]. The result shows it is 'nan'.

So I wonder whether my definition is wrong? Thanks for your help.

sx[1]         1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
sx[2]         1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan

It is telling you that sx[1] and sx[2] are always 1 (and never themselves NaN), but the standard error of the mean, the n_eff, and the Rhat are all undefined because those are fixed rather than random quantities.
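The reason sx is always exactly 1: the Cholesky factor of a 2x2 correlation matrix is [[1, 0], [rho, sqrt(1 - rho^2)]], so its first diagonal entry is 1 by construction no matter what rho is, and sqrt of it is still 1. A small stdlib-only Python sketch of this (my own illustration; note Python indexes from 0 where Stan indexes from 1):

```python
import math

def chol2x2_corr(rho):
    """Cholesky factor of the 2x2 correlation matrix [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0],
            [rho, math.sqrt(1.0 - rho * rho)]]

for rho in (-0.9, 0.0, 0.3, 0.99):
    L = chol2x2_corr(rho)
    # Reconstruct the (2,1) entry of L @ L.T to confirm it recovers rho.
    r21 = L[1][0] * L[0][0] + L[1][1] * L[0][1]
    assert abs(r21 - rho) < 1e-12
    print(f"rho = {rho:+.2f}: first diagonal entry of L = {L[0][0]}")
```

So sqrt(L[k,1,1]) is the constant 1, not a random variable, which is exactly why the summary has no Rhat or n_eff for it.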


So do you think my way of defining those parameters is right?

It is valid syntax, but it does not make much sense. This comment is not correct:

cholesky_factor_corr[D] L[K]; //cholesky factor of covariance

It is actually an array of Cholesky factors of correlation matrices. You can create the Cholesky factor of a covariance matrix by pre-multiplying by the standard deviations, as in

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation
 vector<lower=0> sigma[K]; // standard deviations
}
transformed parameters{
  cholesky_factor_corr[D] cov[K];  
  for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k], L[k]);
}

and then in the model block, refer to multi_normal_cholesky_lpdf(y[n] | mu[k], cov[k]) and put priors on sigma.
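To see why pre-multiplying by the standard deviations works: if L is the Cholesky factor of the correlation matrix R, then diag(sigma) * L is the Cholesky factor of the covariance matrix diag(sigma) * R * diag(sigma). A stdlib-only Python check of that identity for one 2x2 case (my own illustration, mirroring what `diag_pre_multiply` does in Stan):

```python
import math

sigma = [2.0, 0.5]   # standard deviations
rho = 0.3            # correlation

# Cholesky factor of the correlation matrix [[1, rho], [rho, 1]].
L = [[1.0, 0.0],
     [rho, math.sqrt(1.0 - rho * rho)]]

# diag_pre_multiply(sigma, L): scale row i of L by sigma[i].
Lc = [[sigma[i] * L[i][j] for j in range(2)] for i in range(2)]

# Reconstruct the covariance matrix as Lc @ Lc.T.
cov = [[sum(Lc[i][k] * Lc[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]

assert abs(cov[0][0] - sigma[0] ** 2) < 1e-12              # var_x
assert abs(cov[1][1] - sigma[1] ** 2) < 1e-12              # var_y
assert abs(cov[0][1] - rho * sigma[0] * sigma[1]) < 1e-12  # covariance
print("diag(sigma) @ L is the Cholesky factor of the covariance matrix")
```

This recovers exactly the sx*sx, sy*sy, and ro*sx*sy entries from the earlier post, which is why sigma and L together carry all the information the covariance matrix does.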

Hi,

After I changed the code the way you suggested, there is an error.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
 vector<lower=0> sigma[k]; // standard deviations
}

transformed parameters{
	cholesky_factor_corr[D] cov[K];  
    for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k], L[k]);
}

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);//uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,1);//uniform(3160/100,3190/100);//
 sigma[k] ~ uniform(0.01,0.89);
 L[k] ~ lkj_corr_cholesky(2);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], sigma[k],L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''
The errors are:
error in 'unknown file name' at line 13, column 17
  -------------------------------------------------
    11:  ordered[D] mu[K]; //mixture component means
    12:  cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
    13:  vector<lower=0> sigma[k]; // standard deviations
                        ^
    14: }
  -------------------------------------------------

PARSER EXPECTED: <size declaration: integer (data-only) in square brackets>

[Finished in 8.7s]

That was supposed to be

vector<lower=0>[D] sigma[K]; // standard deviations

Hi,
This time the problem becomes an initialization failure. Could it be that my prior information is not correct? Or is there another reason?
The errors are:


Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: anon_model_e64dd13f25b4fc9b418d05db6aed84e9_namespace::write_array: cov[k0__] is not a valid unit vector. The sum of the squares of the elements should be 1, but is 50.5794 (in ‘unknown file name’ at line 17)

Sorry again. The first line in transformed parameters should have been

cholesky_factor_cov[D, D] cov[K];  

Should I change the 'transformed parameters' block as well? I changed it as below and get an error.


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_cov[D,D] cov[K]; //cholesky factor of correlation matrix
 vector<lower=0>[D] sigma[K]; // standard deviations
}

transformed parameters{
	cholesky_factor_cov[D,D] cov[K];  
    for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k],L[k]);
}

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);//uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,1);//uniform(3160/100,3190/100);//
 sigma[k] ~ normal(1,2);
 L[k] ~ lkj_corr_cholesky(2);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], cov[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''

The errors are:
duplicate declaration of variable, name=cov; attempt to redeclare as transformed parameter; original declaration as parameter
Problem with declaration.
  error in 'unknown file name' at line 17, column 35
  -------------------------------------------------
    15: 
    16: transformed parameters{
    17: 	cholesky_factor_cov[D,D] cov[K];  
                                          ^
    18:     for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k],L[k]);
  -------------------------------------------------


[Finished in 9.5s]

The one in the parameters block should be

cholesky_factor_corr[D] L[K]; // cholesky factor of correlation matrix

Hi,
Thanks for the help. Your way works! I am still curious about the meaning of the parameters. My question is: how can I get the distribution of the correlation?


model='''
data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data
}

parameters {
 simplex[K] theta; //mixing proportions
 ordered[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of correlation matrix
 vector<lower=0>[D] sigma[K]; // standard deviations
}

transformed parameters{
 cholesky_factor_cov[D,D] cov[K];
 for (k in 1:K) cov[k] = diag_pre_multiply(sigma[k], L[k]);
}
//	real sx[K];
//    real sy[K];
//    real ro[K];
//    cov=L;  
//    for (i in 1:K){
//   sx[i]=sqrt(cov[i,1,1]);
//    sy[i]=sqrt(cov[i,2,2]);
//    ro[i]=cov[i,1,2]/sqrt(cov[i,1,1])/sqrt(cov[i,2,2]);
//    }
//}

model {
 real ps[K];
 theta ~ dirichlet(rep_vector(2.0, K));
 for(k in 1:K){
 mu[1,k] ~ normal(3.67,1);//uniform(340/100,380/100);//
 mu[2,k]~  normal(31.80,1);//uniform(3160/100,3190/100);//
 sigma[k] ~ normal(1,2);
 L[k] ~ lkj_corr_cholesky(2);
 }
 

 for (n in 1:N){
 for (k in 1:K){
 ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
 }
 target += log_sum_exp(ps);
 }
}

'''
Result:
             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
theta[1]     0.85  1.6e-3    0.1   0.62   0.79   0.87   0.92   0.98 3731.0    1.0
theta[2]     0.15  1.6e-3    0.1   0.02   0.08   0.13   0.21   0.38 3731.0    1.0
mu[1,1]      1.14  9.2e-3   0.34   0.55   0.93   1.13   1.34   1.75 1345.0    1.0
mu[2,1]     31.22    0.01   0.85  29.51  30.67  31.23  31.81  32.83 3741.0    1.0
mu[1,2]     29.19  4.1e-3    0.3  28.61  28.99   29.2   29.4  29.78 5352.0    1.0
mu[2,2]     32.38    0.01   0.81  30.78  31.85  32.38  32.92  33.99 5504.0    1.0
L[1,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[2,1,1]      1.0     nan    0.0    1.0    1.0    1.0    1.0    1.0    nan    nan
L[1,2,1]     0.98  3.2e-3   0.06   0.93   0.99   0.99    1.0    1.0  358.0    1.0
L[2,2,1]  -4.0e-3  5.7e-3   0.45  -0.81  -0.35 5.9e-4   0.33   0.82 6045.0    1.0
L[1,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[2,1,2]      0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
L[1,2,2]     0.13  4.5e-3   0.09   0.06   0.09   0.11   0.15   0.36  433.0   1.01
L[2,2,2]     0.88  3.7e-3   0.14   0.49   0.83   0.94   0.99    1.0 1415.0    1.0
sigma[1,1]   2.01    0.03   1.41    0.1   0.89    1.8   2.88   5.36 2978.0    1.0
sigma[2,1]   2.01    0.02   1.39   0.11   0.89   1.76   2.87   5.24 3984.0    1.0
sigma[1,2]   1.99    0.03   1.45   0.07   0.83   1.75   2.89   5.35 3098.0    1.0
sigma[2,2]   2.03    0.03   1.43   0.08   0.89   1.81   2.91   5.42 3054.0    1.0
cov[1,1,1]   2.01    0.03   1.41    0.1   0.89    1.8   2.88   5.36 2978.0    1.0
cov[2,1,1]   2.01    0.02   1.39   0.11   0.89   1.76   2.87   5.24 3984.0    1.0
cov[1,2,1]   1.96    0.03   1.43   0.06    0.8   1.72   2.84   5.27 2911.0    1.0
cov[2,2,1]  -0.01    0.02   1.11  -2.58   -0.5 2.1e-4   0.47   2.43 4070.0    1.0
cov[1,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[2,1,2]    0.0     nan    0.0    0.0    0.0    0.0    0.0    0.0    nan    nan
cov[1,2,2]   0.26    0.01    0.3 8.1e-3   0.09    0.2   0.35   0.91  903.0    1.0
cov[2,2,2]   1.79    0.02   1.31   0.08   0.75   1.56   2.57   4.92 2908.0    1.0
lp__       -371.9    0.09   2.82 -378.4 -373.5 -371.5 -369.8 -367.5 1069.0    1.0

The marginal distribution of each correlation, if \eta = 1 (the shape parameter), is a \text{Beta}\left(\frac{K}{2},\frac{K}{2}\right) distribution shifted and scaled to the (-1,1) interval (more generally, both beta shapes are \eta - 1 + \frac{K}{2}). But the marginal prior distribution is not a very useful quantity when there is an inequality constraint (positive definiteness) that ties all the correlations together.
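To illustrate that last point: an LKJ(1) prior is uniform over valid correlation matrices, yet the marginal of each correlation is still pulled toward 0, because extreme values leave less room for the other correlations to keep the matrix positive definite. A stdlib-only Python rejection-sampling sketch for the 3x3 case (my own illustration; a 3x3 correlation matrix is positive definite when |r12| < 1, which holds by construction here, and 1 + 2*r12*r13*r23 - r12^2 - r13^2 - r23^2 > 0):

```python
import random

random.seed(1)

accepted = []
for _ in range(200_000):
    # Propose each correlation uniformly on (-1, 1) ...
    r12, r13, r23 = (random.uniform(-1, 1) for _ in range(3))
    # ... and keep the triple only if the implied 3x3 correlation
    # matrix is positive definite (determinant condition).
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    if det > 0:
        accepted.append(r12)

# Under the flat proposal, P(|r12| > 0.9) would be 0.1; under the
# positive-definiteness constraint it is much smaller.
frac_extreme = sum(abs(r) > 0.9 for r in accepted) / len(accepted)
print(f"accepted: {len(accepted)} of 200000")
print(f"P(|r12| > 0.9) under LKJ(1), d=3: about {frac_extreme:.3f}")
```

The accepted r12 values follow the shifted Beta(3/2, 3/2) marginal (a semicircle density), so even the "uniform" LKJ(1) prior puts only a few percent of its mass beyond |r| = 0.9 when D = 3, which is exactly the (-1,1)-endpoint behavior described earlier in the thread.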