Ordered probit with data augmentation

Hi,

I’m trying to code an ordered probit, which estimates the latent variable. (I don’t want to create it in generated quantities, as I want to use the latent variable as predictor for another variable in my model in the HMC process.)

The Stan code is the following

data{
  int<lower=2> K;  //number of bins 
  int<lower=0> N;
  int<lower=1> D;
  int<lower=1,upper=K> Y[N];
  row_vector[D] X[N];

  }
parameters{
  vector[D] beta_1;
  vector[N] M_cont;   //latent variable
  ordered[K+1-3] c;   //vector of variable cutoffs (3 cutoffs are fixed for identification)
}
model{
  vector[K] theta;
  
  //likelihood
  for (n in 1:N) {
    real eta;
    eta = X[n] * beta_1;
    if(Y[n]==1) M_cont[n] ~ normal(eta, 1) T[-1000,0]; 
    if(Y[n]==2) M_cont[n] ~ normal(eta, 1) T[0,c[1]]; 
    if((2<Y[n])&&(K>Y[n])) M_cont[n] ~ normal(eta, 1) T[c[Y[n]-2],c[Y[n]-1]]; 
    if(Y[n]==K) M_cont[n] ~ normal(eta, 1) T[c[K-2],1000];
  }
}

My data generating process is as per the example in bayesm package in R, and I initialize the sampler as follows:

#Generating Data
simordprobit=function(X, betas, cutoff){
  z = X%*%betas + rnorm(nobs)   
  y = cut(z, br = cutoff, right=TRUE, include.lowest = TRUE, labels = FALSE)  
  return(list(y = y, X = X, k=(length(cutoff)-1), betas= betas, cutoff=cutoff ))
}
nobs=10 
X=cbind(rep(1,nobs),runif(nobs, min=0, max=5),runif(nobs,min=0, max=5))
k=7
betas=c(0.5, 1, -0.5)       
cutoff=c(-100, 0, 1, 2, 3, 4, 5, 100)
simout=simordprobit(X, betas, cutoff)   
Data=list(X=simout$X,y=simout$y, k=k)

#estimation
R=2000
M_cont_init=ifelse(Data$y==1, -.3,ifelse(Data$y==2, .05,ifelse(Data$y==3,.15,ifelse(Data$y==4,.25,ifelse(Data$y==5,.35,ifelse(Data$y==6,.45,.55))))))
init_f <- function() { list(beta_1= rep(0,3),c=c(.1,.2,.3,.4,.5),M_cont=M_cont_init); }
stanfit_ordprobit = stan(file="Ordprobit_Aug.stan",
                              data = list(K = k, N = nobs, D = dim(Data$X)[2], Y = Data$y, X = Data$X),
                              pars = c("beta_1","c","M_cont"),chains = 1,iter = R,warmup = 100, init = init_f)
                              

However I get the error message:

Rejecting initial value:
** Gradient evaluated at the initial value is not finite.**
** Stan can’t start sampling from this initial value.**
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."

I would appreciate if you could help me pull this off without facing gradient problems!

Thank you!

I believe the problem is that the values of M_cont are extremely unlikely to fall within the relevant cutpoints and thus the truncated densities return negative infinity.

If you are going to do a data augmentation thing in a Stan program, you should reformulate like this

parameters {
  vector[D] beta_1;
  ordered[K+1-3] c;
  vector<lower=0, upper=1> p;
}
transformed parameters {
  vector[N] M_cont;
  for (n in 1:N) {
    if (Y[n] == 1) M_cont[n] = -1000 + p[n] * (0 + 1000);
    else if (Y[n] == 2) M_cont[n] = 0 + p[n] * (c[1] - 0);
    else if (Y[n] == K) M_cont[n] = c[K - 2] + p[n] * (1000 - c[K - 2]);
    else M_cont[n] = c[Y[n]-2] + p[n] * (c[Y[n]-1] - c[Y[n]-2]);
  }
}

Then your model should initialize, but you need a Jacobian correction, which are the logarithms of the term in parentheses that p[n] is multiplied by.