# Rejecting initial value error in fitting an ordered Probit model with 100 ordinal categories as dependent variables

Hi, I am fitting an ordered probit model.
In case of 50 ordinal categories as DV, my model successfully recovers true parameters.
However, in 100 ordinal category case, the model causes an error, saying

Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can’t start sampling from this initial value.

The model can’t recover the true parameters even when I use the true values as initial values.
init=0 does not work also.
(By polr function from MASS package, the true parameters are recovered in 100 ordinal category case.)

How can I treat this situation?
Thank you.

data {
int<lower=2> K;
int<lower=0> N;
int<lower=1> D;
int<lower=1,upper=K> y[N];
row_vector[D] x[N];
}
parameters {
vector[D] beta;
ordered[K-1] c;
}
model {
for (n in 1:N)
y[n] ~ ordered_probit(x[n] * beta, c);
}


My R code for data generation is here.

ntime=1000
nanswer=100 #### Number of ordinal categories

b0=0
b1=0.5
b2=-3
X1=runif(n,0,1)
X2=runif(n,0,1)
X=cbind(X1,X2)
sd.error=1

tau=matrix(0,n,1)
prob=seq(0.01,0.99,by=0.01)  ## prob of each 100 ordinal category
#prob=seq(0.01,0.98,by=0.02) ## prob of each 50 ordinal category

Y=rep(NA,n)
XB=rep(NA,n)

XB=b0+b1*X1+b2*X2

Y.star<-rnorm(n,XB,sd.error)

tau=qnorm(prob,mean=mean(XB),sd=sqrt(var(XB)+sd.error^2)) # threshold parameters

Y[Y.star<tau[1]]<-1

for(i in 2:(nanswer-1) ){

Y[Y.star>=tau[i-1] & Y.star<tau[i]]<-i

}

#library(dplyr)
#Y%>%as_tibble()%>%count(value)%>%print(n=100)

summary(MASS::polr(as.factor(Y) ~ X1 + X2,method="probit")) ## Estimation of true parameters by polr function


To include mathematical notation in your post put LaTeX syntax between two \$ symbols, e.g.,
p(\theta | y) \propto p(\theta) p(y | \theta).

I am pushing well beyond my novice-level comfort with Stan in trying to diagnose this but I think the problem is due to how Stan builds up an ordered vector if no initial values are provided. Looking at section 10.6 of the Stan 2.21 Reference manual, the unconstrained values of the ordered vector, y, are given random values on [-2, 2]. In the constrained vector, x, these values are given by x_1 = y_1 and, when k > 1, x_k = x_{k-1} + exp(y_k) . If you try simulating that for 99 values you will see that the values get to be near 200 near the end of the series. If that is correct, then some of the values of c in the model will be similar.
When calculating the ordered probit, there will be evaluations of Phi((x_n * beta) - c_k). If c_k is 100-ish then this will under flow to 0 according to the Function Reference on the normal_lcdf() function because it supports calculations over [-37.5, 8.25].
That is my very tentative understanding of what is happening. I may be totally wrong. In any case, I used your code to make some data and I ran the following fit successfully.

data=list(y=as.vector(Y),x=X,K=nanswer,N=n,D=2)
INITS <- list(list(c = seq(-7, 7, length.out = 99)))

MODEL <- "
data {
int<lower=2> K;
int<lower=0> N;
int<lower=1> D;
int<lower=1,upper=K> y[N];
row_vector[D] x[N];

}
parameters {
vector[D] beta;
ordered[K-1] c;
}
model {

for (n in 1:N)
y[n] ~ ordered_probit(x[n] * beta, c);
}
"
FIT <- stan(model_code = MODEL, data = data, chains = 1, init = INITS)


It took about 1.5 hours to run and successfully recovered the values of beta. The cut points differ by no more than 0.0015 from those returned by MASS::polr. If I try to run more than one chain my computer bogs down horribly. I hope that is just a resource problem for my little laptop. I cannot justify the initial values; I just made them span a range somewhat larger than what I knew the c values would be.

Hi FJCC,

Sorry for my late response. Following your initial values and varying them to a certain extent, true parameters are recovered. It takes only 1~2 mins if I use 8 cores.

Thank you so much for your solution and kind explanation.