# Proper prior for symmetric ordinal regression thresholds

I am attempting to specify an ordinal regression (cumulative logistic / proportional odds) model. Under the latent variable formulation:

y_i = k if y_i^\star \in (c_{k-1}, c_k)

where y_i^{\star} = \beta x_i + \epsilon_i is the latent variable (with \epsilon_i \sim \mathrm{Logistic}(0,1)) and

-\infty = c_0 < c_1 < \cdots < c_{K-1} < c_K = \infty are the thresholds.

In my situation, I think it is reasonable to assume that the thresholds are symmetric about the middle category. For example, if the number of categories K = 7, then c_1 = c_3 - \delta_2, c_2 = c_3 - \delta_1, c_5 = c_4 + \delta_1, c_6 = c_4 + \delta_2, and we need to estimate c_3 < c_4 and 0 < \delta_1 < \delta_2.
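To make the symmetry constraints concrete, here is a small NumPy check (the particular values of c_3, c_4, \delta_1, \delta_2 are made up for illustration):

```python
import numpy as np

# Hypothetical values for the free parameters: c_3 < c_4 and 0 < delta_1 < delta_2.
c3, c4 = -0.5, 0.5
d1, d2 = 1.0, 2.5

# Build the six thresholds for K = 7 categories from the symmetry constraints.
c = np.array([c3 - d2, c3 - d1, c3, c4, c4 + d1, c4 + d2])

# The constraints guarantee a strictly increasing threshold vector...
assert np.all(np.diff(c) > 0)

# ...that is symmetric about the midpoint (c3 + c4) / 2.
mid = (c3 + c4) / 2
assert np.allclose(c + c[::-1], 2 * mid)
```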

With no constraints on the thresholds except that they are ordered, a proper prior can be obtained by putting a Dirichlet prior on the probabilities of falling into each category when x = 0, i.e.:

```stan
parameters {
  simplex[K] catprobs;  // category probabilities when x = 0
  // ... regression coefficients, etc.
}
transformed parameters {
  ordered[K - 1] thresholds = logit(cumulative_sum(catprobs[1:(K - 1)]));
}
model {
  catprobs ~ dirichlet(rep_vector(1, K));
  for (n in 1:N_obs) {
    y[n] ~ ordered_logistic(ystar[n], thresholds);  // ystar[n] is the linear predictor
  }
}
```
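As a sanity check on the `logit(cumulative_sum(...))` construction, here is a NumPy/SciPy sketch (seed and K are arbitrary) showing that the logistic CDF evaluated at the implied thresholds recovers the Dirichlet-drawn category probabilities at x = 0:

```python
import numpy as np
from scipy.special import logit, expit

K = 7
rng = np.random.default_rng(0)
# A draw from Dirichlet(1, ..., 1): category probabilities at x = 0.
catprobs = rng.dirichlet(np.ones(K))

# Thresholds are the logit of the cumulative probabilities, as in the Stan snippet.
thresholds = logit(np.cumsum(catprobs[:-1]))

# They come out ordered automatically...
assert np.all(np.diff(thresholds) > 0)

# ...and P(y = k | ystar = 0) = F(c_k) - F(c_{k-1}) with F the logistic CDF (expit)
# recovers the original simplex.
cdf = np.concatenate(([0.0], expit(thresholds), [1.0]))
assert np.allclose(np.diff(cdf), catprobs)
```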


This is the approach advocated in Michael Betancourt's ordinal regression case study, and I believe also by rstanarm::stan_polr.

brms allows equidistant thresholds (\delta_1 = \delta_2 = \delta) with the cumulative family but, as far as I can tell, it places an improper uniform prior on \delta > 0.

Any ideas on a prior I can use on the thresholds that ensures they are symmetric?

I notice you are using a different formulation than Mike in the case study (he has the thresholds themselves as parameters), and I am not sure it is equivalent (but it also roughly makes sense to me). Mike's approach is IMHO more flexible in that it doesn't actually assume any structure on the thresholds: you can have c_1, ..., c_4 as parameters, then compute c_5 and c_6 (as those are then uniquely determined by the symmetry), compute the Jacobian adjustment from c_1, ..., c_6, and apply the Dirichlet prior, exactly as in the case study.

In theory you also need a Jacobian adjustment for computing c_5 and c_6 from c_1, ..., c_4, but since this is a linear operation, the derivatives are constant, so the whole Jacobian is constant and can be ignored.
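To illustrate, here is a small NumPy sketch of the mirrored construction for the K = 7 example above (six thresholds; the helper names and test points are mine), with a numerical check that the Jacobian of the map from the free thresholds to the mirrored ones is indeed constant:

```python
import numpy as np

# Mirrored thresholds: the last two are reflections of c_2 and c_1
# about the midpoint (c_3 + c_4) / 2.
def all_thresholds(free):
    c1, c2, c3, c4 = free
    m2 = c3 + c4  # twice the midpoint
    return np.array([c1, c2, c3, c4, m2 - c2, m2 - c1])

# Finite-difference Jacobian of the mirrored pair w.r.t. the free parameters.
def jac(free, eps=1e-6):
    base = all_thresholds(free)[4:]
    J = np.empty((2, 4))
    for j in range(4):
        bumped = np.array(free, dtype=float)
        bumped[j] += eps
        J[:, j] = (all_thresholds(bumped)[4:] - base) / eps
    return J

# The map is linear, so the Jacobian is the same at any two points
# and can be dropped from the log density.
J1 = jac([-3.0, -1.5, -0.5, 0.5])
J2 = jac([-2.0, -1.0, 0.0, 1.0])
assert np.allclose(J1, J2)
```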

Hope that's not too much mumbo jumbo and actually makes some sense (and please double check my reasoning, I am not an expert on this).

Good luck with the model!


Thanks, I think I understand.

It seems like it then makes a lot more sense to use Michael's approach in this scenario, because the data don't inform p_5, p_6, p_7 (i.e. those components of the Dirichlet draw that are ignored when calculating the implied cut points).
