I am trying to fit a simplified latent variable model using the Bayesian data augmentation approach of Stephen and Toubia (2010).

The basic idea is that the dependent variable `y` and the focal independent variable `z` may both be influenced by a latent variable, `latent`, and the model explicitly accounts for this. The model is as follows:

```
z = gamma_1 + gamma_2*latent + delta
y = beta_1 + beta_2*x_1 + beta_3*x_2 + z_beta*delta + latent_beta*latent + epsilon
```

`latent` enters both the equation for `z` and the equation for `y`. `x_1` and `x_2` are two control variables, `gamma_1` and `beta_1` are the two intercepts, and `delta` and `epsilon` are the two error terms.

One novelty is that the residual from the first equation, `delta`, enters the second equation instead of `z` itself.

In my simulated dataset, I set `beta=c(1,2,3)`, `gamma=c(4,5)`, `latent_beta=c(10)` and `z_beta=c(6)`. As the results below show, the `beta`s and `gamma_1` have been recovered successfully; however, `z_beta`, `gamma_2` and `latent_beta` have not.

Specifically, judging from the traceplots, different chains have converged to different values for `z_beta`, `gamma_2` and `latent_beta` (although some chains of `z_beta` did converge to 6, the true value).

So my question is: how can I deal with this convergence problem? Is there anything wrong with my code? (I know I haven't specified any priors yet; is that likely to be the cause?)

Here is the code, followed by the results and a traceplot.

```r
library(rstan)  # required for stan()

# simulate data set
n.obs = 500

# construct x (design matrix, first column is the intercept)
x.beta = c(1, 2, 3)  # first element is the intercept
n.xvar = length(x.beta)
x = matrix(0, ncol = n.xvar, nrow = n.obs)
x[, 1] = 1
for (i in 2:n.xvar) {
  x[, i] = rnorm(n.obs, 0, 1)
}

# latent ability
latent = rnorm(n.obs, 0, 1)
latent.beta = c(10)  # latent_beta for y

## z
# z as a function of the latent ability
z.delta = rnorm(n.obs, 0, 1)  # error term for z
z.gamma = c(4, 5)             # coefficients of the z equation
z.iv = cbind(1, latent)
n_z_l_var = length(z.gamma)
z = z.iv %*% z.gamma + z.delta
z = as.vector(z)

# z's residual effect on y
z.beta = c(6)
n_z_y_var = length(z.beta)

# construct y
epsilon = rnorm(n.obs, 0, 1)  # error term for y
y = x %*% x.beta + z.delta * z.beta + latent * latent.beta + epsilon
y = as.vector(y)

data = list(n = n.obs,
            n_xvar = n.xvar,
            n_z_l_var = n_z_l_var,
            n_z_y_var = n_z_y_var,
            y = y,
            x = x,
            z = z)

# model
model_string = "
data{
  int<lower=0> n;
  int<lower=0> n_xvar;
  int<lower=0> n_z_l_var;
  int<lower=0> n_z_y_var;
  vector[n] y;
  matrix[n,n_xvar] x;
  vector[n] z;
}
parameters{
  vector[n_xvar] x_beta;
  vector[n_z_l_var] z_gamma;
  real z_beta;
  real latent_beta;
  vector[n] latent;
}
model{
  vector[n] z_residual;
  z ~ normal(z_gamma[1] + latent*z_gamma[2], 1);
  z_residual = z - (z_gamma[1] + latent*z_gamma[2]);
  y ~ normal(x*x_beta + z_residual*z_beta + latent*latent_beta, 1); // likelihood
}"

fit = stan(model_code = model_string,
           data = data,
           pars = c("x_beta", "z_beta", "z_gamma", "latent_beta"),
           iter = 10000, warmup = 100)
```
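For reference, a traceplot like the one at the bottom can be generated with rstan's `traceplot`; the exact call is not part of my script, but it was along these lines (the `pars` are assumed to match the monitored parameters):

```r
# visual check of between-chain agreement for the monitored parameters
traceplot(fit, pars = c("x_beta", "z_beta", "z_gamma", "latent_beta"),
          inc_warmup = FALSE)
```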

Result

```
Inference for Stan model: 106df335f9ff6a8e24de3c8185ee63b5.
4 chains, each with iter=10000; warmup=100; thin=1;
post-warmup draws per chain=9900, total post-warmup draws=39600.

               mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff  Rhat
x_beta[1]      1.48    0.10  0.20    1.16    1.31    1.46    1.62    1.90     5  1.61
x_beta[2]      2.06    0.12  0.19    1.76    1.93    2.03    2.22    2.42     2  3.22
x_beta[3]      2.90    0.07  0.12    2.72    2.82    2.88    3.00    3.12     3  1.85
z_beta         3.85    2.06  2.91   -1.29    3.64    5.46    5.59    5.80     2 46.26
z_gamma[1]     4.18    0.03  0.06    4.06    4.13    4.19    4.22    4.27     3  1.59
z_gamma[2]     0.18    0.58  0.81   -0.99   -0.48    0.29    1.07    1.11     2 37.89
latent_beta    0.35    1.18  1.66   -2.05   -1.02    0.53    2.11    2.31     2 28.24
lp__        -556.18   15.96 30.02 -612.09 -579.73 -551.85 -536.13 -496.10     4  2.27

Samples were drawn using NUTS(diag_e) at Sun Jul 2 19:52:42 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

Traceplot