Impute Missing Data / Data Augmentation in Stan

Hey,

I am quite new to Stan and am running into problems fitting the following model:

D_{i}\sim logistic(tinter+\theta^{T} X_{i}+\eta_{j},1),\eta_{j}\sim N(0,\tau^{2})

T_{i}=\textbf{1}_{\lambda^{T} Z_{i}\geq D_{i}}.

Y_{i}(t)\sim \text{ordered logistic}(\alpha_{t}+\beta_{t}^{T}X_{i}+\gamma_{t}D_{i}+u_{tj},c),\quad t=0,1,\quad \alpha_{0}=0

u_{tj}\sim N(0,\tau_{t}^{2})

Y_{i}=Y_{i}(1)T_{i}+Y_{i}(0)(1-T_{i}),i=1,2,\cdots,N,j=1,2\cdots,C

The observed data are (X_{i},T_{i},Z_{i},Y_{i}). Basically, Z_{i} and the unobserved D_{i} (which we can view as a threshold) determine the value of T_{i}\in\{0,1\}, and the outcome also depends on the unobserved D_{i}. In addition, the data are clustered, so I add random effects \eta_{j} (for the T_{i} process) and u_{tj} (for Y_{i}(t)).

The problem is that almost every transition ends with a divergence warning, even with a high target acceptance rate. However, the divergences disappear when I plug in the true D_{i} (the data are simulated). I suspect the issue is that D_{i} needs to be imputed for each observation, but I currently declare it as an unconstrained parameter, so Stan places a default (improper) uniform prior on it, which is not what I want. This model seems similar to data augmentation or the EM algorithm, where we draw the missing value (D_{i}) and maximize the log-likelihood iteratively. Is there a way to accomplish this in Stan?
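One common workaround for divergences caused by a latent parameter with a `T[,]` truncation statement is to reparameterize each latent `d[n]` through the logistic quantile function, so the bound holds by construction. The same idea, expressed outside Stan as inverse-CDF sampling from a truncated logistic, looks like this in Python (the function names here are mine, for illustration only):

```python
import numpy as np

def logistic_cdf(x, mu=0.0, s=1.0):
    """CDF of the logistic(mu, s) distribution."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

def logistic_icdf(p, mu=0.0, s=1.0):
    """Quantile (inverse CDF) of the logistic(mu, s) distribution."""
    return mu + s * np.log(p / (1.0 - p))

def sample_truncated_logistic(mu, s, lower=-np.inf, upper=np.inf, size=1, rng=None):
    """Draw from logistic(mu, s) truncated to (lower, upper) via inverse-CDF sampling."""
    rng = np.random.default_rng() if rng is None else rng
    # map the truncation bounds onto the CDF scale, then draw uniformly between them
    lo, hi = logistic_cdf(lower, mu, s), logistic_cdf(upper, mu, s)
    u = rng.uniform(lo, hi, size=size)
    return logistic_icdf(u, mu, s)

# e.g. the t[n] == 1 branch: D is constrained to D <= lambda'Z
draws = sample_truncated_logistic(mu=0.5, s=1.0, upper=2.0, size=10_000)
```

The Stan analogue is to declare a `(0,1)`-constrained parameter per observation, rescale it into the interval between the CDF values of the truncation bounds, and map it through the logistic quantile function; the sampler then never proposes a `d[n]` outside its truncation region, which often removes the divergences.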

I enclose the raw code FYI, thanks in advance!


data {
int<lower=2> K; //Number of Levels,K=4,5
int<lower=0> N; //Number of Observations
int<lower=1> D; //Dimension of Covariates
int<lower=1> S; //Dimension of IV Splines
int <lower=1> C; //Number of Cluster

int<lower=1,upper=K> yobs[N]; //Response
row_vector[D] X[N];   //Covariates Vector
int<lower=0,upper=1> t[N]; //Treatment Indicator
row_vector[S] Z[N];  // IV Splines Value Vector
int<lower=1,upper=C> G[N]; //Cluster Membership

//Prior Parameters
vector[D] prior_mean;
matrix[D,D] prior_cov;
vector[S] prior_IV_mean;
matrix[S,S] prior_IV_cov;
vector[K] prior_frequency;// Prior for Dirichlet Distribution
//Prior SD for gamma0,gamma1
real<lower=0> vg;
//Prior Scale of Logistic Distribution, default is 1
real<lower=0> logu; //(note: currently unused; the scale is hard-coded to 1 in the model block)
}

parameters {
vector[D] beta0;   //Coefficients for Covariates of Control Response
vector[D] beta1;   //Coefficients for Covariates of Treated Response
vector[D] theta;   //Coefficients for Covariates for Treatment Assignment 
vector[S] lambda; //Coefficients for IV Splines Basis

simplex [K] p; //Probability in each cell for ordinal regression

vector[N] d;       //Threshold

real tinter;  //Intercept in Threshold Model
real d1;      //Intercept in Treated Response \alpha_{1}
real<lower=0> tau;    //SD for RE (Random Effect) in Treatment Assignment
real<lower=0> tau0;   //SD for RE in Control
real<lower=0> tau1;   //SD for RE in Treated

real gamma0;  //Coefficients for Threshold in Control
real gamma1;  //Coefficients for Threshold in Treated

//Non center Random Effect
real et_tilde[C];
real u0_tilde[C];
real u1_tilde[C];


}

transformed parameters{
  //Random Effect
vector[C] et; // RE in Treatment Assignment
vector[C] u0; // RE in Control Response
vector[C] u1; // RE in Treated Response
ordered[K-1] c;    //Cut Points for Ordinal
real sump;

//Calculate Cut Points Based on probability for each cell
c[1]=logit(p[1]);
sump=p[1];
for (k in 2:K-1)
{
 sump=sump+p[k];
 c[k]=logit(sump);
}
//Generating Cluster Random Effect
for (j in 1:C)
{
et[j] = tau*et_tilde[j];  //Random Effect in Threshold
u0[j] = tau0*u0_tilde[j];   //Random Effect in Control Response
u1[j] = tau1*u1_tilde[j];  //Random Effect in Treated Response
}
}

model {
//Prior for Variance of Random Effect
tau~cauchy(0, 5);
tau0~cauchy(0, 5);
tau1~cauchy(0, 5);

//Draw unscaled random effect
et_tilde~normal(0,1);
u0_tilde~normal(0,1);
u1_tilde~normal(0,1);

//Uniform Prior for Intercept,d1,tinter

//Normal Weak Informative Prior for Coefficients
gamma0~ normal(0,vg);
gamma1~ normal(0,vg);
lambda~multi_normal(prior_IV_mean,prior_IV_cov);
theta~multi_normal(prior_mean,prior_cov);
beta0~multi_normal(prior_mean,prior_cov);
beta1~multi_normal(prior_mean,prior_cov);
//Dirichlet Prior for the cell probabilities
p~dirichlet(prior_frequency);

for (n in 1:N)
{

if(t[n]==0)
{
//T_{i} determination process
d[n]~logistic(X[n]*theta+tinter+et[G[n]],1)T[Z[n]*lambda,];
//Outcome
yobs[n] ~ ordered_logistic(X[n] * beta0+d[n]*gamma0+u0[G[n]], c);
//ymiss[n] ~ ordered_logistic(d1+X[n] * beta1+d[n]*gamma1+u1[G[n]], c);
}

else{
//T_{i} determination process
d[n]~logistic(X[n]*theta+tinter+et[G[n]],1)T[,Z[n]*lambda];
//Outcome 
yobs[n] ~ ordered_logistic(d1+X[n] * beta1+d[n]*gamma1+u1[G[n]],c);
//ymiss[n] ~ ordered_logistic(X[n] * beta0+d[n]*gamma0+u0[G[n]],c);
}

}
}

Things like this can be done, although they are not easy. You would probably be better off starting with a simpler example, such as a special case of the model that you ultimately want to estimate. There is an example of the latent variable formulation of a Bernoulli model (albeit in the probit case) at
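As a pointer in the same direction: the latent-variable formulation works because the latent threshold can be integrated out in closed form. Here is a small Python check of that identity for the logistic case used in this model (the constants are made up for illustration):

```python
import numpy as np

# If D ~ logistic(mu, 1) and T = 1{D <= cut}, then marginalizing D
# gives P(T = 1) = inv_logit(cut - mu): an ordinary logit model.
rng = np.random.default_rng(1)

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

mu = 0.3   # plays the role of tinter + theta'X_i (made-up value)
cut = 1.1  # plays the role of lambda'Z_i (made-up value)

# Monte Carlo estimate via inverse-CDF draws from logistic(mu, 1)
u = rng.uniform(size=1_000_000)
d = mu + np.log(u / (1.0 - u))
p_mc = (d <= cut).mean()

# closed form after integrating D out
p_closed = inv_logit(cut - mu)
```

For the treatment equation alone, this marginalization would reduce the model to a plain logit; in the model above D_i cannot be fully marginalized because it also enters the outcome equation, which is why some form of imputation or augmentation is needed.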

Sure, thanks for your reply. I will look over the example.