Ordinal data with ordered probit and hurdle

I am attempting to model an ordinal pain rating scale in Stan using an ordered probit hurdle model. The pain scale goes from 0 to 10. Most people will rate their pain at a 0, no pain at all. However, if a person is in pain, then I expect the ratings 1 through 10 to reflect cut points along a normally distributed latent variable. There is a section on hurdle Poisson models for count data in the Stan manual, but I could not find a completely worked example of a hurdle ordered probit model.

My Stan model looks something like this:

data {
    int<lower=0> n_obs;
    int<lower=1> K; // K = 10
    array[n_obs] int<lower=0, upper=K> y;
}
parameters {
    ordered[K - 1] cutpoints;
    real eta_hurdle;
    real theta;
}
model {
    for (i_obs in 1:n_obs) {
        if ( y[i_obs] == 0 ) {
            target += std_normal_lccdf(eta_hurdle); // bernoulli_lpmf(0 | Phi(eta_hurdle))
        } else {
            target += std_normal_lcdf(eta_hurdle); // bernoulli_lpmf(1 | Phi(eta_hurdle))
            target += ordered_probit_lpmf(y[i_obs] | theta, cutpoints);
        }
    }
}

Is this correct? Specifically, in the else branch where the hurdle has been cleared, must I somehow truncate the ordered probit to exclude the hurdle?

There are two common ways to deal with discrete distributions with excess zeros. The hurdle model is a simple mixture between a distribution that always generates 0 and one that generates values greater than zero. The zero-inflation model adds probability of generating 0 to an existing probability model. The User’s Guide goes through the Poisson case in detail (scroll down to the hurdle/inflation section or select it from the top-right menu; it won’t let me link directly).

There you’ll see that the basic form of the hurdle model is a Bernoulli probability of generating 0, and then the complement of that probability times the probability of generating a non-zero result, which is your usual ordinal model renormalized to exclude 0 as an outcome.

For the usual hurdle model, that would look as follows.

parameters {
    real<lower=0, upper=1> lambda; // probability of y == 0
    ordered[K - 1] cutpoints;
    real theta;
}
model {
    for (n in 1:n_obs) {
        if (y[n] == 0) {
            target += bernoulli_lpmf(1 | lambda);
        } else {
            target += bernoulli_lpmf(0 | lambda)
                   + ordered_probit_lpmf(y[n] | theta, cutpoints)
                   - log_diff_exp(0, ordered_probit_lpmf(0 | theta, cutpoints));
        }
    }
}

Note that log_diff_exp(u, v) = log(exp(u) - exp(v)), so that log_diff_exp(0, lp) = log1m(exp(lp)), but it’s more stable coded as above.

If you can use ordered_logistic instead of ordered_probit, it’s much more efficient because of the form of the CDF.
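For what it’s worth, here is a minimal sketch of the logistic version (assuming the data block from the first post; as above, lambda is the probability of a zero). Since the ordinal part only generates values 1 through K, its probabilities already sum to 1 over the non-zero outcomes:

```stan
parameters {
    real<lower=0, upper=1> lambda; // probability of y == 0
    ordered[K - 1] cutpoints;
    real theta;
}
model {
    for (n in 1:n_obs) {
        if (y[n] == 0) {
            target += bernoulli_lpmf(1 | lambda);
        } else {
            target += bernoulli_lpmf(0 | lambda)
                   + ordered_logistic_lpmf(y[n] | theta, cutpoints);
        }
    }
}
```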

Thanks for the quick reply! I’m a little confused: ordered_probit_lpmf() doesn’t accept a value of zero. How can I divide by a probability (subtract a log probability) that doesn’t exist?

Edit

Perhaps the idea is to add an additional cutpoint for the hurdle (zero condition) and then shift all the y up by 1 so that y_shifted == 1 corresponds to the hurdle (y_orig == 0)? I guess I’m just struggling, conceptually, to understand the purpose of the added cut point for zero?

data {
    int<lower=0> n_obs;
    int<lower=1> K; // K = 10
    array[n_obs] int<lower=0, upper=K> y;
}
parameters {
    ordered[K] cutpoints;
    real<lower=0, upper=1> lambda; // probability of a non-zero rating
    real theta;
}
model {
    for (i_obs in 1:n_obs) {
        if ( y[i_obs] == 0 ) {
            target += bernoulli_lpmf(0 | lambda);
        } else {
            target += bernoulli_lpmf(1 | lambda)
                   + ordered_probit_lpmf(y[i_obs] + 1 | theta, cutpoints)
                   - log1m_exp(ordered_probit_lpmf(1 | theta, cutpoints));
        }
    }
}

It looks fine to me. You are only estimating K - 1 cutpoints, so that ensures the probabilities of all values 1 to 10 sum to 1. I assume the intention is to add predictors for one or both of eta_hurdle and theta. If not, you could just shift y up by one and estimate K cutpoints; the hurdle component wouldn’t be necessary in that case.

I think the log_diff_exp line is clearer in the Poisson example. In that case, you need to renormalize the Poisson distribution so that it excludes 0, because the model only uses one parameter, \lambda. In your case, the additional cutpoint and eta_hurdle would directly counteract each other, so it would be redundant. In other words, you wouldn’t want to freely estimate K+1 probabilities just to zero out one of them.

Sorry—I’m spending all my time in Python these days and keep forgetting we index from 1 in Stan for categorical outcomes. Just replace the 0 with a 1.

That’s a really good point! The model would only identify that cutpoint through the prior, so it’s best to just get rid of it.

With zero inflation instead of the hurdle, you would use all the cutpoints.
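A minimal sketch of that zero-inflated variant, assuming the outcome is shifted up by one so that ordinal category 1 corresponds to a rating of 0 (the name lambda for the inflation probability is illustrative):

```stan
data {
    int<lower=0> n_obs;
    int<lower=1> K; // K = 10
    array[n_obs] int<lower=0, upper=K> y;
}
parameters {
    ordered[K] cutpoints;          // K cutpoints for K + 1 categories
    real<lower=0, upper=1> lambda; // probability of a structural zero
    real theta;
}
model {
    for (i_obs in 1:n_obs) {
        if (y[i_obs] == 0) {
            // a zero arises either from inflation or from the ordinal model
            target += log_sum_exp(
                bernoulli_lpmf(1 | lambda),
                bernoulli_lpmf(0 | lambda)
                    + ordered_probit_lpmf(1 | theta, cutpoints));
        } else {
            target += bernoulli_lpmf(0 | lambda)
                   + ordered_probit_lpmf(y[i_obs] + 1 | theta, cutpoints);
        }
    }
}
```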

Thanks! Just to make sure I understand, is the model I provided at the top of the thread, without truncation, a valid hurdle model?

Yes, it is a valid hurdle model, where

Pr(y|\eta, \theta, \tau) = \begin{cases} 1 - \Phi(\eta) & \text{if } y = 0 \\ \Phi(\eta) \times oprobit(y | \theta, \tau) & \text{if } y \geq 1 \end{cases}
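If it’s useful for posterior predictive checks, that density can also be simulated directly, e.g. in generated quantities (this assumes a Stan version that provides ordered_probit_rng):

```stan
generated quantities {
    array[n_obs] int y_rep;
    for (i_obs in 1:n_obs) {
        // clear the hurdle with probability Phi(eta_hurdle)
        if (bernoulli_rng(Phi(eta_hurdle)) == 1) {
            y_rep[i_obs] = ordered_probit_rng(theta, cutpoints); // 1..K
        } else {
            y_rep[i_obs] = 0;
        }
    }
}
```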

As a side note: I suspect the reason there isn’t much discussion of hurdle ordinal models is that similar models would usually be specified either as a partial proportional odds model or as an adjacent-category model. Not that your model needs to fall into one of those categories, but they might be helpful as reference points.
