Bayesian data augmentation with a latent variable: convergence issue

I am trying to fit a simplified latent-variable model using the Bayesian data augmentation approach of Stephen and Toubia (2010).

The basic idea is that the dependent variable y and the focal independent variable z may both be influenced by a latent variable, latent, and the model explicitly accounts for this. The model is as follows:

z = gamma_1 + gamma_2*latent + delta
y = beta_1 + beta_2*x_1 + beta_3*x_2 + z_beta*delta + latent_beta*latent + epsilon

latent enters both equations for z and y. x_1 and x_2 are two control variables. gamma_1 and beta_1 are two intercepts. delta and epsilon are two error terms.

One novelty is that the residual delta from the first equation enters the second equation, rather than z itself.

In my simulated dataset, I set beta=c(1,2,3), gamma=c(4,5), latent_beta=c(10) and z_beta=c(6). As seen from the results, the betas and gamma_1 are recovered successfully; however, z_beta, gamma_2 and latent_beta are not.

Specifically, judging from the traceplots, different chains have converged to different values for z_beta, gamma_2 and latent_beta (although some chains for z_beta did converge to 6, the value I want).

So my question is: how do I deal with this convergence issue? Is there anything wrong with my code? (I know I haven't specified any priors yet, but that's not likely to be the cause, right?)

Here are the code and traceplots.

#simulating data set
n.obs=500

#construct x
x.beta=c(1,2,3) #first is intercept
n.xvar=length(x.beta)

x=matrix(0,ncol=n.xvar,nrow=n.obs)
x[,1]=1
for(i in 2:n.xvar){
  x[,i]=rnorm(n.obs,0,1)
}

#about latent ability
latent=rnorm(n.obs,0,1)
latent.beta=c(10) #latent_beta for y

##about z
#z for latent ability
z.delta=rnorm(n.obs,0,1) #error term for z
z.gamma=c(4,5) #parameter for z of latent variable
z.iv=cbind(1,latent)
n_z_l_var=length(z.gamma)
z=z.iv%*%z.gamma+z.delta
z=as.vector(z)
#z for y
z.beta=c(6)
n_z_y_var=length(z.beta)

#construct y
epsilon=rnorm(n.obs,0,1) #error term for y
y=x %*% x.beta+z.delta*z.beta+latent*latent.beta+epsilon
y=as.vector(y)

data=list(n=n.obs,
          n_xvar=n.xvar,
          n_z_l_var=n_z_l_var,
          n_z_y_var=n_z_y_var,
          y=y,
          x=x,
          z=z)

#model
model_string="
data{
int<lower=0> n;
int<lower=0> n_xvar;
int<lower=0> n_z_l_var;
int<lower=0> n_z_y_var;
vector[n] y;
matrix[n,n_xvar] x;
vector[n] z;
}
parameters{
vector[n_xvar] x_beta;
vector[n_z_l_var] z_gamma;
real z_beta;
real latent_beta;
vector[n] latent;
}
model{
vector[n] z_residual;
z~normal(z_gamma[1]+latent*z_gamma[2],1);
z_residual=z-(z_gamma[1]+latent*z_gamma[2]);
y~normal(x*x_beta+z_residual*z_beta+latent*latent_beta,1); //likelihood
}"

library(rstan)

fit=stan(model_code = model_string,
         data=data,
         pars=c("x_beta","z_beta","z_gamma","latent_beta"),
         iter = 10000, warmup = 100)

Result

Inference for Stan model: 106df335f9ff6a8e24de3c8185ee63b5.
4 chains, each with iter=10000; warmup=100; thin=1;
post-warmup draws per chain=9900, total post-warmup draws=39600.

             mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff  Rhat
x_beta[1]    1.48    0.10  0.20    1.16    1.31    1.46    1.62    1.90     5  1.61
x_beta[2]    2.06    0.12  0.19    1.76    1.93    2.03    2.22    2.42     2  3.22
x_beta[3]    2.90    0.07  0.12    2.72    2.82    2.88    3.00    3.12     3  1.85
z_beta       3.85    2.06  2.91   -1.29    3.64    5.46    5.59    5.80     2 46.26
z_gamma[1]   4.18    0.03  0.06    4.06    4.13    4.19    4.22    4.27     3  1.59
z_gamma[2]   0.18    0.58  0.81   -0.99   -0.48    0.29    1.07    1.11     2 37.89
latent_beta  0.35    1.18  1.66   -2.05   -1.02    0.53    2.11    2.31     2 28.24
lp__      -556.18   15.96 30.02 -612.09 -579.73 -551.85 -536.13 -496.10     4  2.27

Samples were drawn using NUTS(diag_e) at Sun Jul 2 19:52:42 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Traceplot

You should run warmup much longer than 100 iterations. There’s quite a bit that happens in warmup. 100 samples isn’t enough (check the manual for a description – I started typing one out here but I figured I might mess something up).

The default is 1k warmup, 1k samples post-warmup. If you run with those settings, do your chains behave better?
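
For reference, a rerun relying on the defaults would look something like this (assuming the same model_string and data from your post; stan() defaults to 4 chains with iter = 2000 and warmup = iter/2 = 1000 per chain):

library(rstan)

#same call as before, but leaving iter and warmup at their defaults
fit=stan(model_code = model_string,
         data=data,
         pars=c("x_beta","z_beta","z_gamma","latent_beta"))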

Hope that helps!

Hi, thanks for the suggestion about warmup. It never occurred to me that warmup could be the cause.

But unfortunately, after changing to 1k and 2k warmup iterations, the problem remains. Actually, looking at the traceplots, the chains converge very quickly, within 100 iterations.

Hmm, cool beans. Well, the next step is to figure out whether the parameterization of your problem actually has multiple equivalent solutions.

You'll still want to keep a long warmup. Stan will usually move from its initial guess to something close to the solution really quickly, but it still needs a long time to adapt its step size and figure out the mass matrix, so that during the sampling stage you're getting nice, healthy, hopefully-independent samples (edit: again, check the manual on this; I'm probably not describing it very precisely).
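
If you want to check what warmup actually settled on, rstan will show you the adapted step size and diagonal (inverse) mass matrix per chain; something along these lines (using your fit object) does it:

#adaptation info (step size and inverse mass matrix) for each chain
cat(get_adaptation_info(fit), sep = "\n")

#average post-warmup step size per chain
sapply(get_sampler_params(fit, inc_warmup = FALSE),
       function(chain) mean(chain[, "stepsize__"]))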

Another fishy thing about the traceplots is the little flat areas. You don't want those; something is going wrong there. Since you're using R, just pop open shinystan (launch_shinystan(fit)) and look around to see if anything seems weird.

What you're looking for is a pairs plot (which you can do from RStan itself) to see if there are strong correlations or banana-like or funnel-like shapes.
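
For example, something like this (the parameter names come from your model; the particular latent elements are just arbitrary picks) should show whether z_beta, z_gamma[2], and latent_beta are trading off against each other:

#pairs plot of the suspect parameters plus a couple of latent elements
pairs(fit, pars = c("z_beta", "z_gamma[2]", "latent_beta",
                    "latent[1]", "latent[2]"))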

I indented and spaced the program so I could read it (how do people work with their programs not indented and not spaced? they’re impossible to read!):

data{
  int<lower=0> n;
  int<lower=0> n_xvar;
  int<lower=0> n_z_l_var;
  int<lower=0> n_z_y_var;
  vector[n] y;
  matrix[n,n_xvar] x;
  vector[n] z;
}
parameters{
  vector[n_xvar] x_beta;
  vector[n_z_l_var] z_gamma;
  real z_beta;
  real latent_beta;
  vector[n] latent;
}
model{
  vector[n] z_residual = z - (z_gamma[1]
                              + latent * z_gamma[2]);
  z ~ normal(z_gamma[1]
             + latent * z_gamma[2],
             1);
  y ~ normal(x * x_beta
             + z_residual * z_beta
             + latent * latent_beta,
             1);
}

Also, the thing commented as likelihood isn’t all the likelihood—you need to marginalize out the latent to get the likelihood.

I don’t see why you’re using a residual with a coefficient here, so I’m guessing there’s probably a more generative formulation of this model. I wouldn’t use latent as a variable name—too generic.

The real problem you're probably running into is the latent * latent_beta term. This can be bad news if they're not both controlled strongly by a prior, as it introduces nasty multiplicative uncertainty (you should see a banana-like relation between latent_beta and each element of latent).
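
One way to see the problem: with no prior on latent, you can rescale latent by any constant c and divide z_gamma[2] and latent_beta by c (or flip the signs of all three) without changing the likelihood at all, so these parameters are only identified up to scale and sign, which matches different chains settling on different values. Here is a minimal sketch of the kind of priors that pin the scale down, written against the reformatted model block above (the specific prior widths are placeholders I made up, not part of the original model):

model{
  vector[n] z_residual = z - (z_gamma[1]
                              + latent * z_gamma[2]);
  // standard-normal prior fixes the scale of the latent variable
  // (and matches how latent was simulated)
  latent ~ normal(0, 1);
  // weakly informative priors on the coefficients; widths are placeholders
  latent_beta ~ normal(0, 5);
  z_beta ~ normal(0, 5);
  z_gamma ~ normal(0, 5);
  x_beta ~ normal(0, 5);
  z ~ normal(z_gamma[1]
             + latent * z_gamma[2],
             1);
  y ~ normal(x * x_beta
             + z_residual * z_beta
             + latent * latent_beta,
             1);
}

Even with these priors, the reflection (latent, z_gamma[2], latent_beta) -> (-latent, -z_gamma[2], -latent_beta) leaves the posterior unchanged, so different chains can still land in mirror-image modes; if the sign is known, constraining one of them (e.g., declaring latent_beta with <lower=0> in the parameters block) removes that symmetry.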