Zero-Inflated Ordered Probit with Correlated Errors

Dear all,

Now I am starting a new thread on a second challenge I am determined to overcome this summer: zero-inflated ordered probit with correlated disturbances/errors. The original cite is at

See Eq.s 10 and 11 on p. 1077.

I checked the Stan page on zero-inflated and hurdle and it doesn’t seem to accommodate correlated errors.

https://mc-stan.org/docs/stan-users-guide/finite-mixtures.html

Anyone knows any template codes for this model? Any help would be greatly appreciated. I am not sure if anyone would read through parts of the equations/likelihoods in the paper. There should be two latent equations, one is

r^*=\mathrm{x}\beta+\varepsilon for the binary probit latent variable equation
and the other is
\tilde{y^*}=\mathrm{z}\gamma+u for the ordered probit latent variable equation.
My guess would be to set up \eta=\mathrm{x}\beta and \psi=\mathrm{z}\gamma and then create an \Omega correlation matrix so that \eta and \psi (transformed parameters) are correlated like how we did in multilevel models with a covariance matrix (my previous thread on varying intercept and vary slopes with covariance?

Jun Xu, PhD
Professor
Department of Sociology
Data Science and Analytics Program
Ball State University

We now extend the model to have \epsilon, u follow a bivariate normal distribution with correlation coefficient r, whilst maintaining the identifying assumption of unit variances.

Could be handled by something like

parameters {
...
vector[2] Z; //(e,u)
real<lower=0> rho;
...
}

model{
...
matrix[2,2] Sigma;
Sigma[1,1] = 1;
Sigma[1,2] = rho;
Sigma[2,1] = rho;
Sigma[2,2] = 1;
Z ~ multi_normal(rep_vector(0,2), Sigma);
...
}

Dear Jon,

Thanks a lot for providing great pointers for initiating this coding process. Looking at your codes, I am wondering how to bring in \mathrm{x}\beta and \mathrm{z}\gamma into the model block. Actually, I was thinking about creating \tilde{y^*} and r^* as parameters, and then create an \Omega matrix for them. But again, this doesn’t make much sense since there are too many parameters to estimate. There is clearly an identification issue; I have more parameters than sample size. I think I am missing something here. I drafted some code blocks, but obviously they don’t make sense. I need to spend some good time on this.

Jun

My understanding is that x\beta and z\gamma are just the covariates (in the original paper, demographic and socioeconomic variables if I’m reading it right) and corresponding parameters. So, you if you have N subjects with each with D measured covariates you could add:

data{
...
array[N] row_vector[D] x;
...
}

parameters {
...
vector[D] beta;
...
}

}

and inside your model block you put x[i]*beta inside the appropriate part of the model definition. Repeat for z\gamma.

Have you looked at the Regression Models section of the manual? It won’t exactly solve your problem because the code needs to be modified for the zero inflation and the correlation portion but it might help with coding the ordered probit part (but will be more complicated than the manual due to the correlation)

Looking a bit more closely at the paper, it seems like equation 11 can be directly translated into Stan code following the convention in the ordered probit example (with the usual difficulty in converting between the two different indexing approaches :) ). I think the \mu_j in the paper is the cutpoint c in the Stan examples.

However, Stan does not have a multi_normal_cdf function – you would have to code what the authors call “the standardised bivariate normal distribution with correlation coefficient l between the two univariate random elements.” yourself in the functions block e.g. Estimating the cumulative probability of a bivariate normal distribution - Cross Validated

Usually, the best thing to do is to think of the model generatively and just write your correlated error model. For example, if you have

r_n = x_n \cdot \beta + \epsilon_n,

you can replace an independent error model where \epsilon_n \sim \textrm{normal}(0, \sigma), for example, with one with correlations, where for instance, the \epsilon_n follow a time-series model or a spatial model or a spatio-temporal model or whatever else might make sense here for the joint distribution on \epsilon. For instance, a first-order random walk prior would take

\epsilon_1 \sim \ \ldots

where \ldots is replaced with some distribution (usually taken to be quite wide if you don’t have info on starting position), and

\epsilon_{n + 1} \sim \textrm{normal}(\epsilon_n, \sigma).

I would only use a correlation matrix as a last resort, as it’s inefficient, doesn’t scale well, and is harder to control for the actual sources of correlation.

The zero inflation will be independent unless you’re also modeling the zero inflation. Then you might have bivariate errors and want to model the correlation between them rather than across the different n as I suggested above. We talk about how to do that in the regression chapter of the User’s Guide.

Also, you might also want to switch to ordinal logit, which is a lot more efficient and only varies a bit in the scaling and tails. This is what German et al. recommend in their regression books.

1 Like

Dear Jon,

Thanks for your suggestions. My challenge is how to link the error terms (\varepsilon and u) with \mathrm{x}\beta and \mathrm{z}\gamma, since the bivariate normal serves as a constraint to estimate \beta and \gamma.

Dear Bob,

Thanks a lot for your very helpful recommendations and suggestions. The challenge is that both r^* and y^* are unobserved latent variables. Usually, without the error correlation, the model can be expressed as a likelihood/probability based on the distribution of \varepsilon and u here in a straightforward manner. I want to work out this example to serve my research and teaching purposes since having a correlation matrix is a more generic model, although the off-diagonal elements may well be constrained to zero empirically. While writing these responses, I feel I may have to write out a Stan function to set up the likelihood? While examining the discussion in Stan’s user guide, it appears that my intuition may be right that I may have to turn r^* and y^* into parameters (see the last three Stan code blocks under the section of Multivariate Probit Regression).

https://mc-stan.org/docs/stan-users-guide/regression.html#multivariate-outcomes

Also, I have another question that may deserve a separate thread. I am kind of curious about using array function instead of specific array data types, such as real, vector, and matrix. For example why sometimes one uses array[M, N] real a and other times matrix[M, N] a? What’s the main difference between real a[N] and vector[N] a. Is it just about computational convenience/compatibility? So for a matrix, we can apply matrix multiplication, whereas array can’t? A vector can also involve in vector/matrix operations, whereas a real numbered data array (real a[N]) can’t since it’s just a collection of numbers, not matrix. But certainly real a[N] can be used in loop or element-wise operations?

Jun

Now after working on this model on and off for a while, I came up with the following Stan codes for a zero-inflated ordered probit without correlated errors,

data {
  int<lower=2> K; // y levels (all levels, including zero)
  int<lower=0> N; // number of cases
  int<lower=1> D; // number of x predictors
  int<lower=1> H; // ADD: number of z predictors
  int<lower=0> y[N]; // y vector
  row_vector[D] x[N]; // x matrix
  row_vector[H] z[N]; // ADD: z matrix for infl mat
}
parameters {
  vector[D] beta; // slopes
  vector[H] gamma; // ADD: slopes for z infl mat
  real gamma0; // ADD: intercept for z infl mat
  ordered[K-2] cut; // thresholds
}
model {
  vector[K] theta;
  for (n in 1:N) {
    real psi;
    psi = gamma0 + z[n] * gamma; 
    real eta;
    eta = x[n] * beta;
    // y = 0, recoded to 1
    theta[1] = (1 - Phi(psi)) + Phi(psi) * Phi(-1 * eta);
    // y = 1 cut[0] = 0 based on Harris and Zhao
    theta[2] = Phi(psi) * (Phi(cut[1] - eta) - Phi(0 - eta));
    theta[3] = Phi(psi) * (Phi(cut[2] - eta) - Phi(cut[1] - eta));
    theta[4] = Phi(psi) * (Phi(cut[3] - eta) - Phi(cut[2] - eta));
    theta[5] = Phi(psi) * (1- Phi(cut[3] - eta));
    // the categorical function only take in y = 1, 2, 3....,N (integers >= 1)
    y[n] ~ categorical(theta);
  }
}

So this is a model for a five-level ordinal response variable. Here is the caveat; the zero value has to be changed, so the original coding for this y is 0, 1, 2, 3, 4. I changed it to 1, 2, 3, 4, 5 for the Stan codes to run through. It appears that the categorical() function only accepts integer values from 1 and onward. Is that accurate? It appears that this model takes two-three days to converge with my data. Of course, I can use loop to simplify theta[3] and theta[4]. Still, I don’t have good ideas about the zero-inflated ordered probit model with correlated errors. Any pointers or suggestions?

Jun

You’ll find Stan to be faster if you don’t use a correlation matrix with zero off-diagonal elements. It winds up being cubic in cost as opposed to linear when you do y ~ multi_normal(zeros, diag_matrix(sigma)) instead of y ~ normal(0, sigma).

To speed up your code, it helps to not do repeated expensive computations. For example, do this:

real Phi_psi = Phi(psi);

and then use Phi_psi where you have Phi(psi). We prefer logistic versions because the cdf is so much more efficient than with profit (it’s just inverse logit). And do the same for all the intermediate values Phi(X - eta). This should help a lot as it’s where most of the computation is at in your code.

Dear Bob,

Thanks a lot for your very useful pointers. I will work through your codes and suggestions and see what happens. I am still learning how to use the Stan language to set up the correlation matrix.

Jun

Jun