Hi,
I am trying to fit a latent variable model for which I want to learn a correlation matrix.
I followed the advice on Stan manual in chapter 9.15 and targeted the cholesky factor of the correlation matrix. However, the histogram of the posterior samples are very wide and centered around the wrong value. Specifically, in my example I am trying to learn just 1 correlation coefficient. The real coefficient I use to generate the data is 0.6 but the posterior sample mean is 0.
I would greatly appreciate your help in improving this model. I am suspecting I have a mistake somewhere. Is the LKJ prior pulling the posterior towards 0 too much?
Here is my model.stan file:
data{
int<lower=0> N;
int<lower=0> K;
int<lower=0, upper=1> y[N,K];
}
parameters {
vector[K] mu;
vector[K] z[N];
cholesky_factor_corr[K] L_R;
}
model{
L_R ~ lkj_corr_cholesky(4);
mu ~ normal(0,1);
z ~ multi_normal_cholesky(mu, L_R);
for (k in 1:K) y[,k] ~ bernoulli_logit(z[,k]);
}
generated quantities{
corr_matrix[K] R;
R = multiply_lower_tri_self_transpose(L_R);
}
And here is the R script to generate data and fit the model:
library(gtools)
library(MASS)
set.seed(68234934)
N <-300
K<-2
rho <- 0.6
R <- matrix(c(1, rho, rho, 1),2,2)
mu <- c(0.7,0)
z <- mvrnorm(n = N, mu , R)
p <- inv.logit(z, min = 0, max = 1)
y = sapply(1:N, function(n) rbinom(size=1, 2, prob = p[n,]))
y<- t(y)
stan_rdump(c("N", "K", "y"), file="./model.data.R")
input_data <- read_rdump("./model.data.R")
fit <- stan(file='./model.stan',
control = list('adapt_delta' = 0.8 , 'max_treedepth' = 15),
data= input_data,
seed=4938483)