What are the units of coefficients in an ordered-logistic model?

Following Matti Vuorre and Paul Bürkner’s excellent tutorial paper, I am running a simple ordered-logistic model with brms:

brm(confidence ~ x1 * x2 + (1 + x1 * x2 | participant),
    data = data,
    family = cumulative('logit'),
    prior = prior(normal(0, 1), class = 'b') +
            prior(normal(0, 1), class = 'sd') +
            prior(lkj(2), class = 'cor'))

I’m not sure about the units of the coefficients estimated by this model. I know the units of the predictors x1 and x2, and I understand that the coefficients for these predictors are the change in the latent standard normal variable corresponding to a unit change in x1 and x2. So does it make sense to talk about the units of the coefficients? Or does it make more sense to standardize x1 and x2 and report standardized coefficients?

Thanks!

Hi @yaniva 👋

Yes, these can be a bit tricky to interpret because they are not on the (possibly numeric) scale of confidence. You are quite right that they are on a latent scale, but since you used a logit link function, they are on the logit scale: the latent variable is assumed to follow a standard logistic distribution rather than a standard normal one. The only difference between those two is that the logistic distribution has “fatter tails” than the normal.
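If you want to see the difference for yourself, here’s a minimal sketch (not from the original post) that overlays the two densities in base R:

```r
# Compare the standard normal density with the standard logistic density:
# very similar shape, but the logistic has heavier tails.
curve(dnorm(x), from = -5, to = 5, ylab = "density")
curve(dlogis(x), add = TRUE, lty = 2)   # logistic, dashed
legend("topright", legend = c("normal", "logistic"), lty = c(1, 2))
```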

For practical interpretation, this difference doesn’t matter. In either case, it does make sense to talk about the coefficients, and you shouldn’t do any standardization: both link functions already fix the latent variable’s mean at 0 and its scale at 1.

So what does, e.g., a coefficient of 0.1 for x1 mean? You can think of it as a one-unit increase in x1 shifting the entire latent distribution 0.1 units to the right. As a consequence, the probabilities with which responses fall on one or the other side of each threshold shift such that greater response values become more likely.
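Here’s a small sketch of that mechanism (the thresholds and the four response categories are made up for illustration, not taken from your model):

```r
# How a latent shift of 0.1 changes category probabilities in a
# cumulative-logit model with hypothetical thresholds.
thresholds <- c(-1, 0, 1)             # cutpoints separating 4 response categories
category_probs <- function(eta, thr) {
  cum <- plogis(thr - eta)            # P(response <= k) on the logistic latent scale
  diff(c(0, cum, 1))                  # convert cumulative probabilities to category probabilities
}
category_probs(eta = 0,   thr = thresholds)  # probabilities when the predictor contributes 0
category_probs(eta = 0.1, thr = thresholds)  # after shifting the latent mean by 0.1 (top category gains)
```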

Does that make any sense? If you’re familiar with the standard normal or logistic distribution, probably yes. But consumers of these analyses often are not. In those cases, you might consider calculating probabilities for the different response categories e.g. when x1 = 0 and when x1 = 1. You could then report e.g. that

“probability(response = 4 “really confident” given x1 = 0)” = 0.5, 95% CI = [0.4, 0.6], and “probability(response = 4 “really confident” given x1 = 1)” = 0.55 [0.45, 0.65], and the difference was 0.05 [0.025, 0.075] (made-up numbers!). That is, participants were on average 5 percentage points more likely to use the highest response category when x1 was 1 versus when it was 0. Overall, the model indicated that the (assumed to be logistic-distributed) latent variable “confidence” increased by 0.1 units per unit increase in x1.
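In brms you can get those category probabilities directly from the fitted model. A hedged sketch (the object name `fit`, the value x2 = 0, and the assumption of at least 4 response categories are mine, not from your model):

```r
# posterior_epred() on an ordinal brms model returns, per posterior draw,
# one probability per response category for each row of newdata.
newdata <- data.frame(x1 = c(0, 1), x2 = 0)
probs <- posterior_epred(fit, newdata = newdata, re_formula = NA)  # draws x rows x categories
p4_x1_0 <- probs[, 1, 4]   # P(response = 4 | x1 = 0), one value per draw
p4_x1_1 <- probs[, 2, 4]   # P(response = 4 | x1 = 1)
quantile(p4_x1_1 - p4_x1_0, c(.025, .5, .975))  # posterior median and 95% CI of the difference
```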

Does that help?


Yes it does! Thanks for the detailed explanation.

Converting to probability of response is very useful.

And I didn’t know that a logistic distribution is used with a logit link - thanks for the explanation! If I had wanted a normal latent distribution, should I have chosen a probit link?

Thanks again!

Glad you found my response helpful 😊

Yes, if you want to assume a normal latent variable, you can use cumulative('probit') or cumulative('probit_approx'), with the latter sometimes being a little faster. The logit link function is usually better behaved, though, and samples faster, so unless there’s a specific reason to use a probit, I recommend going with the logit. The probit link sometimes requires setting brm(..., init = 0) to sample, so you can try that if it doesn’t seem to work.
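For concreteness, here is the probit variant of your original call as a sketch (same data, predictors, and priors as above; the `init = 0` line is the workaround mentioned, which you may not need):

```r
# Same model with a normal latent variable instead of a logistic one.
brm(confidence ~ x1 * x2 + (1 + x1 * x2 | participant),
    data = data,
    family = cumulative('probit'),
    prior = prior(normal(0, 1), class = 'b') +
            prior(normal(0, 1), class = 'sd') +
            prior(lkj(2), class = 'cor'),
    init = 0)  # sometimes needed for the probit link to start sampling
```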

If you want to look at different interpretations of the effect sizes in these models, this is a really nice and detailed blog post about that topic by @Solomon. (I am not sure if it applies to your specific model though.)

(If you think the answer above answered your question, feel free to mark it as the solution [there should be a little button somewhere]. We’re all just fishing for karma aren’t we 😉).
