Ordinal data with ordered probit and hurdle

I am attempting to model an ordinal pain rating scale in Stan using an ordered probit hurdle model. The pain scale goes from 0 to 10. Most people will rate their pain at a 0, no pain at all. However, if a person is in pain, then I expect the ratings 1 through 10 to reflect cut points along a normally distributed latent variable. There is a section on hurdle Poisson models for count data in the Stan manual, but I could not find a completely worked example of a hurdle ordered probit model.

My Stan model looks something like this:

data {
    int<lower=0> n_obs;
    int<lower=1> K; // K = 10
    array[n_obs] int<lower=0, upper=K> y;
}
parameters {
    ordered[K - 1] cutpoints;
    real eta_hurdle;
    real theta;
}
model {
    for (i_obs in 1:n_obs) {
        if ( y[i_obs] == 0 ) {
            target += std_normal_lccdf(eta_hurdle); // bernoulli_lpmf(0 | Phi(eta_hurdle))
        } else {
            target += std_normal_lcdf(eta_hurdle); // bernoulli_lpmf(1 | Phi(eta_hurdle))
            target += ordered_probit_lpmf(y[i_obs] | theta, cutpoints);
        }
    }
}

Is this correct? Specifically, in the else branch where the hurdle has been cleared, must I somehow truncate the ordered probit to exclude the hurdle?

There are two common ways to deal with discrete distributions with excess zeros. The hurdle model is a simple mixture between a distribution that always generates 0 and one that generates values greater than zero. The zero-inflation model adds probability of generating 0 to an existing probability model. The User’s Guide goes through the Poisson case in detail (scroll down to the hurdle/inflation section or select it from the top-right menu; it won’t let me link directly).

There you’ll see that the basic form of the hurdle model is a Bernoulli probability of generating 0, and then the complement of that probability times the probability of generating a non-zero result, which is your usual ordinal model renormalized to exclude 0 as an outcome.

For the usual hurdle model, that would look as follows.

parameters {
    real<lower=0, upper=1> lambda; // probability of y == 0
    ordered[K - 1] cutpoints;
    real theta;
}
model {
    for (n in 1:n_obs) {
        if (y[n] == 0) {
            target += bernoulli_lpmf(1 | lambda);
        } else {
            target += bernoulli_lpmf(0 | lambda)
                   + ordered_probit_lpmf(y[n] | theta, cutpoints)
                   - log_diff_exp(0, ordered_probit_lpmf(0 | theta, cutpoints));
        }
    }
}

Note that log_diff_exp(u, v) = log(exp(u) - exp(v)), so that log_diff_exp(0, lp) = log1m(exp(lp)), but it’s more stable coded as above.

If you can use ordered_logistic instead of ordered_probit, it’s much more efficient because of the form of the CDF.
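For what it’s worth, here is a minimal sketch of the logistic version (assuming the data block from the first post; as above, lambda is the probability of a zero). Since the ordinal part only generates values 1 through K, its probabilities already sum to 1 over the non-zero outcomes:

```stan
parameters {
    real<lower=0, upper=1> lambda; // probability of y == 0
    ordered[K - 1] cutpoints;
    real theta;
}
model {
    for (n in 1:n_obs) {
        if (y[n] == 0) {
            target += bernoulli_lpmf(1 | lambda);
        } else {
            target += bernoulli_lpmf(0 | lambda)
                   + ordered_logistic_lpmf(y[n] | theta, cutpoints);
        }
    }
}
```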

Thanks for the quick reply! I’m a little confused: ordered_probit_lpmf() doesn’t accept a value of zero. How can I divide by a probability (subtract a log probability) that doesn’t exist?

Edit

Perhaps the idea is to add an additional cutpoint for the hurdle (zero condition) and then shift all the y up by 1 so that y_shifted == 1 corresponds to the hurdle (y_orig == 0)? I guess I’m just struggling, conceptually, to understand the purpose of the added cut point for zero?

data {
    int<lower=0> n_obs;
    int<lower=1> K; // K = 10
    array[n_obs] int<lower=0, upper=K> y;
}
parameters {
    ordered[K] cutpoints;
    real<lower=0, upper=1> lambda; // probability of a non-zero rating
    real theta;
}
model {
    for (i_obs in 1:n_obs) {
        if ( y[i_obs] == 0 ) {
            target += bernoulli_lpmf(0 | lambda);
        } else {
            target += bernoulli_lpmf(1 | lambda)
                   + ordered_probit_lpmf(y[i_obs] + 1 | theta, cutpoints)
                   - log1m_exp(ordered_probit_lpmf(1 | theta, cutpoints));
        }
    }
}

It looks fine to me. You are only estimating K - 1 cutpoints, so that ensures the probabilities of all values 1 to 10 sum to 1. I assume the intention is to add predictors for one or both of eta_hurdle and theta. If not, you could just shift y up by one and estimate K cutpoints; the hurdle component wouldn’t be necessary in that case.

I think the log_diff_exp line is clearer in the Poisson example. In that case, you need to renormalize the Poisson distribution so that it excludes 0, because the model only uses one parameter, \lambda. In your case, the additional cutpoint and eta_hurdle would directly counteract each other, so it would be redundant. In other words, you wouldn’t want to freely estimate K+1 probabilities just to zero out one of them.

Sorry—I’m spending all my time in Python these days and keep forgetting we index from 1 in Stan for categorical outcomes. Just replace the 0 with a 1.

That’s a really good point! The model would only identify that cutpoint through the prior, so it’s best to just get rid of it.

With zero inflation instead of the hurdle, you would use all the cutpoints.
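A minimal sketch of that zero-inflated variant, assuming the outcome is shifted up by one so that ordinal category 1 corresponds to a rating of 0 (the name lambda for the inflation probability is illustrative):

```stan
data {
    int<lower=0> n_obs;
    int<lower=1> K; // K = 10
    array[n_obs] int<lower=0, upper=K> y;
}
parameters {
    ordered[K] cutpoints;          // K cutpoints for K + 1 categories
    real<lower=0, upper=1> lambda; // probability of a structural zero
    real theta;
}
model {
    for (i_obs in 1:n_obs) {
        if (y[i_obs] == 0) {
            // a zero arises either from inflation or from the ordinal model
            target += log_sum_exp(
                bernoulli_lpmf(1 | lambda),
                bernoulli_lpmf(0 | lambda)
                    + ordered_probit_lpmf(1 | theta, cutpoints));
        } else {
            target += bernoulli_lpmf(0 | lambda)
                   + ordered_probit_lpmf(y[i_obs] + 1 | theta, cutpoints);
        }
    }
}
```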

Thanks! Just to make sure I understand, is the model I provided at the top of the thread, without truncation, a valid hurdle model?

Yes, it is a valid hurdle model, where

Pr(y|\eta, \theta, \tau) = \begin{cases} 1 - \Phi(\eta) & \text{if } y = 0 \\ \Phi(\eta) \times oprobit(y | \theta, \tau) & \text{if } y \geq 1 \end{cases}
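If it’s useful for posterior predictive checks, that density can also be simulated directly, e.g. in generated quantities (this assumes a Stan version that provides ordered_probit_rng):

```stan
generated quantities {
    array[n_obs] int y_rep;
    for (i_obs in 1:n_obs) {
        // clear the hurdle with probability Phi(eta_hurdle)
        if (bernoulli_rng(Phi(eta_hurdle)) == 1) {
            y_rep[i_obs] = ordered_probit_rng(theta, cutpoints); // 1..K
        } else {
            y_rep[i_obs] = 0;
        }
    }
}
```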

As a side note: I suspect the reason there isn’t much discussion of hurdle ordinal models is that similar models would usually be specified either as a partial proportional odds model or as an adjacent-category model. Not that your model needs to fall into one of those categories, but they might be helpful as reference points.
