# Bayesian data augmentation with latent variable: convergence issue

I am trying to fit a simplified latent-variable model using the Bayesian data augmentation approach used by Stephen and Toubia (2010).

The basic idea is that the dependent variable `y` and the focal independent variable `z` may both be influenced by a latent variable, `latent`, and the model explicitly accounts for this. The model is as follows:

```
z = gamma_1 + gamma_2 * latent + delta
y = beta_1 + beta_2 * x_1 + beta_3 * x_2 + z_beta * delta + latent_beta * latent + epsilon
```

`latent` enters both equations for `z` and `y`. `x_1` and `x_2` are two control variables. `gamma_1` and `beta_1` are two intercepts. `delta` and `epsilon` are two error terms.

One novelty is that the residual from the first equation, `delta`, enters the second equation instead of `z` itself.

In my simulated dataset, I set `beta=c(1,2,3)`, `gamma=c(4,5)`, `latent_beta=c(10)`, and `z_beta=c(6)`. As seen from the results, the `beta`s and `gamma_1` have been successfully recovered; however, `z_beta`, `gamma_2`, and `latent_beta` have not.

Specifically, judging from the traceplots, different chains have converged to different values for `z_beta`, `gamma_2`, and `latent_beta` (although some chains of `z_beta` did converge to 6, the value I want).

So my question is: how do I deal with this convergence issue? Is there anything wrong with my code? (I know I haven't specified priors yet; is that likely to be the problem?)

Here is the code, followed by the traceplots.

```r
# simulate the data set
n.obs <- 500

# construct x
x.beta <- c(1, 2, 3)  # first element is the intercept
n.xvar <- length(x.beta)

x <- matrix(0, ncol = n.xvar, nrow = n.obs)
x[, 1] <- 1
for (i in 2:n.xvar) {
  x[, i] <- rnorm(n.obs, 0, 1)
}

latent <- rnorm(n.obs, 0, 1)
latent.beta <- c(10)  # latent_beta for y

# z: indicator of the latent variable
z.delta <- rnorm(n.obs, 0, 1)  # error term for z
z.gamma <- c(4, 5)             # parameters for the z equation
z.iv <- cbind(1, latent)
n_z_l_var <- length(z.gamma)
z <- as.vector(z.iv %*% z.gamma + z.delta)

# z for y
z.beta <- c(6)
n_z_y_var <- length(z.beta)

# construct y
epsilon <- rnorm(n.obs, 0, 1)  # error term for y
y <- as.vector(x %*% x.beta + z.delta * z.beta + latent * latent.beta + epsilon)

data <- list(n = n.obs,
             n_xvar = n.xvar,
             n_z_l_var = n_z_l_var,
             n_z_y_var = n_z_y_var,
             y = y,
             x = x,
             z = z)
```

```r
# model
model_string <- "
data {
  int<lower=0> n;
  int<lower=0> n_xvar;
  int<lower=0> n_z_l_var;
  int<lower=0> n_z_y_var;
  vector[n] y;
  matrix[n, n_xvar] x;
  vector[n] z;
}
parameters {
  vector[n_xvar] x_beta;
  vector[n_z_l_var] z_gamma;
  real z_beta;
  real latent_beta;
  vector[n] latent;
}
model {
  vector[n] z_residual;
  z ~ normal(z_gamma[1] + latent * z_gamma[2], 1);
  z_residual = z - (z_gamma[1] + latent * z_gamma[2]);
  y ~ normal(x * x_beta + z_residual * z_beta + latent * latent_beta, 1); // likelihood
}"
```

```r
fit <- stan(model_code = model_string,
            data = data,
            pars = c("x_beta", "z_beta", "z_gamma", "latent_beta"),
            iter = 10000, warmup = 100)
```

Result

```
Inference for Stan model: 106df335f9ff6a8e24de3c8185ee63b5.
4 chains, each with iter=10000; warmup=100; thin=1;
post-warmup draws per chain=9900, total post-warmup draws=39600.

               mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff  Rhat
x_beta[1]      1.48    0.10  0.20    1.16    1.31    1.46    1.62    1.90     5  1.61
x_beta[2]      2.06    0.12  0.19    1.76    1.93    2.03    2.22    2.42     2  3.22
x_beta[3]      2.90    0.07  0.12    2.72    2.82    2.88    3.00    3.12     3  1.85
z_beta         3.85    2.06  2.91   -1.29    3.64    5.46    5.59    5.80     2 46.26
z_gamma[1]     4.18    0.03  0.06    4.06    4.13    4.19    4.22    4.27     3  1.59
z_gamma[2]     0.18    0.58  0.81   -0.99   -0.48    0.29    1.07    1.11     2 37.89
latent_beta    0.35    1.18  1.66   -2.05   -1.02    0.53    2.11    2.31     2 28.24
lp__        -556.18   15.96 30.02 -612.09 -579.73 -551.85 -536.13 -496.10     4  2.27

Samples were drawn using NUTS(diag_e) at Sun Jul 2 19:52:42 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

Traceplot

You should run warmup much longer than 100 iterations. There’s quite a bit that happens during warmup, and 100 iterations isn’t enough (check the manual for a description; I started typing one out here but figured I might get something wrong).

The default is 1k warmup iterations and 1k post-warmup samples. If you run with those settings, do your chains behave better?
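
For instance, dropping the explicit `iter`/`warmup` arguments falls back to rstan's defaults (a sketch, reusing the `model_string` and `data` objects from the original post):

```r
# rstan defaults: iter = 2000, warmup = floor(iter / 2) = 1000
fit <- stan(model_code = model_string,
            data = data,
            pars = c("x_beta", "z_beta", "z_gamma", "latent_beta"))
```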

Hope that helps!

Hi, thanks for the suggestion about warmup. It never occurred to me that warmup could be the cause.

But unfortunately, after I changed warmup to 1k and then 2k, the problem remains. Actually, looking at the traceplots, the chains converge very quickly, within 100 iterations.

Hmm, cool beans, well the next step is to figure out if the parameterization of your problem actually has multiple equivalent solutions.

You’ll still want to keep a long warmup. Stan will usually move from its initial guess to something close to the solution really quickly, but it still needs a long time to adjust its time step and figure out the mass matrix, so that during the sampling stage you’re getting nice, healthy, hopefully-independent samples (edit: again, check the manual on this; I’m probably not describing it very precisely).

Another fishy thing about the traceplots is the little flat areas. You don’t want those; something strange is going on there. Since you’re using R, just pop open shinystan (`launch_shinystan(fit)`) and look around and see if you spot anything weird.

What you’re looking for is `pairs()` (which you can run from RStan itself) to see if there are strong correlations or banana-like or funnel-like shapes.
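
As a sketch, restricted to the parameters that failed to be recovered (the `pars` selection here is just a suggestion):

```r
# look for banana- or funnel-like shapes among the weakly identified parameters
pairs(fit, pars = c("z_beta", "z_gamma", "latent_beta"))
```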

I indented and spaced the program so I could read it (how do people work with their programs not indented and not spaced? They’re impossible to read!):

```stan
data {
  int n;
  int n_xvar;
  int n_z_l_var;
  int n_z_y_var;
  vector[n] y;
  matrix[n, n_xvar] x;
  vector[n] z;
}
parameters {
  vector[n_xvar] x_beta;
  vector[n_z_l_var] z_gamma;
  real z_beta;
  real latent_beta;
  vector[n] latent;
}
model {
  vector[n] z_residual = z - (z_gamma[1]
                              + latent * z_gamma[2]);
  z ~ normal(z_gamma[1]
             + latent * z_gamma[2],
             1);
  y ~ normal(x * x_beta
             + z_residual * z_beta
             + latent * latent_beta,
             1);
}
```

Also, the line commented as the likelihood isn’t the whole likelihood; you need to marginalize out `latent` to get the likelihood.
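
As a sketch of what that marginalization could look like here (my own derivation, so double-check it): with `latent[i]`, `delta`, and `epsilon` all standard normal, each `(z[i], y[i])` pair is jointly Gaussian given the regression parameters, so `latent` can be integrated out analytically. A model block along these lines (requires a Stan version with `[...]` vector expressions):

```stan
model {
  matrix[2, 2] Sigma;
  Sigma[1, 1] = square(z_gamma[2]) + 1;
  Sigma[1, 2] = z_gamma[2] * latent_beta + z_beta;
  Sigma[2, 1] = Sigma[1, 2];
  Sigma[2, 2] = square(latent_beta) + square(z_beta) + 1;
  // note: Sigma is unchanged under (z_gamma[2], latent_beta) ->
  // (-z_gamma[2], -latent_beta), so the signs are not identified
  // without a prior or constraint, which would explain chains
  // settling at different values
  for (i in 1:n)
    [z[i], y[i]]' ~ multi_normal([z_gamma[1], x[i] * x_beta]', Sigma);
}
```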

I don’t see why you’re using a residual with a coefficient here, so I’m guessing there’s probably a more generative formulation of this model. I wouldn’t use `latent` as a variable name—too generic.

The real problem you’re probably running into is the `latent * latent_beta` term. This can be bad news if both aren’t strongly controlled by priors, as it introduces nasty multiplicative uncertainty (you should see a banana-like relation between each `latent[i]` and `latent_beta`).
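
A minimal sketch of adding such priors to the model block (the scales here are illustrative, not recommendations):

```stan
// illustrative priors; tune the scales to your problem
latent ~ normal(0, 1);        // matches how latent was simulated
latent_beta ~ normal(0, 5);
z_beta ~ normal(0, 5);
z_gamma ~ normal(0, 5);
x_beta ~ normal(0, 5);
// a sign constraint, e.g. declaring real<lower=0> latent_beta,
// may also be needed to pin down the sign of latent_beta and z_gamma[2]
```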