Proper prior for symmetric ordinal regression thresholds

I am attempting to specify an ordinal regression (cumulative logistic / proportional odds) model. Under the latent variable formulation:

y_i = k if y_i^\star \in (c_{k-1}, c_k)

where y_i^{\star} = \beta x_i + \epsilon_i is the latent variable (with \epsilon_i \sim \mathrm{Logistic}(0,1)) and

-\infty = c_0 < c_1 < \cdots < c_{K-1} < c_K = \infty are the thresholds.

In my situation, I think it is reasonable to assume that the thresholds are symmetric about the middle category. For example, if the number of categories K = 7, then c_1 = c_3 - \delta_2, c_2 = c_3 - \delta_1, c_5 = c_4 + \delta_1, c_6 = c_4 + \delta_2, and we need to estimate c_3 < c_4 and 0 < \delta_1 < \delta_2.

With no constraints on the thresholds except that they are ordered, a proper prior can be obtained by putting a Dirichlet prior on the probabilities of falling into each category when x = 0, i.e.:

transformed parameters {
  ordered[K-1] thresholds = logit(cumulative_sum(catprobs[1:(K-1)]));
}

model {
  catprobs ~ dirichlet(rep_vector(1, K));
  for (n in 1:N_obs) {
    y[n] ~ ordered_logistic(ystar[n], thresholds);
  }
}

This is the approach advocated in Michael Betancourt’s ordinal regression case study, and I believe by rstanarm::stan_polr.

brms allows equidistant thresholds (\delta_1 = \delta_2 = \delta) with the cumulative family, but as far as I can tell, places an improper uniform prior on \delta > 0.

Any ideas on a prior I can use on the thresholds that ensures they are symmetric?

I notice you are using a different formulation than Mike in the case study (he has thresholds as parameters) and I am not sure it is equivalent (but it also roughly makes sense to me). Mike’s approach is IMHO more flexible in that it doesn’t actually assume any structure of the thresholds, so you can have c_1,...c_4 as parameters, then compute c_5,...c_7 (as those then uniquely determined), compute the jacobian adjustment from c_1, ... c_7, and apply the Dirichlet prior, exactly as in the case study.

In theory you also need the Jacobian adjustment for computing c_5, ... ,c_7 from c_1, ..., c_4, but since this is a linear operation, the derivatives are constant and the whole Jacobian is constant and can be ignored.

Hope that’s not too much mumbo jumbo and actually makes some sense (and please double check my reasoning, I am not an expert on this).

Good luck with the model!

1 Like

Thanks, I think I understand.

It seems like then it makes a lot more sense to use Michael’s approach in this scenario, because the data don’t inform p_5, p_6, p_7 (i.e. those parameters that are drawn from the Dirichlet prior but then ignored in calculating the implied cut points).

1 Like