Ordinal regression to model components failure

Imagine you have some mechanical components to stress test. Depending of an impact force, they may get damaged or brake. Let’s say that damage is measured on a categorical 1–5 scale, where 1 is a minor damage and 5 is a broken component.

damage \sim force

and data could look like

test	force	damage
a	    10	    2
b	    13	    2
c	    16	    3
d	    25	    4
...

What I am interested in is “What is the probability that an impact force will lead to a damage greater than 3”. This means that I am interested all damages above 3 including 3, 4, and 5. I could do a multinomial logistic regression but would an ordinal regression model be more appropriate? From the the ordinal regression tutorial (https://psyarxiv.com/x8swp/), it sounds like I could use a Sequential model (SM).

brm(damage ∼ force, family = sratio(), ...)

Is it possible to use brms to model a “cumulative” outcome directly (damage larger than 3 rather than damage equal to 3)? Or is it better to model each damage level (3 or 4 or 5) and then find the probability of damage larger than 3 in post-processing?

Would you be able to help?

Hello @Jack_Caster, if I’m understanding you correctly the way I would approach this personally is with family = cumulative as an ordered response variable (with a latent continuous predictor). Then from fitted() on the model, I believe the default behaviour is to return predicted probabilities for each of the possible response levels.

The cumulative link model has a direct meaning in terms of the predicted probability of some category or any lower-ranked category. In practice the way I’d probably approach this would be to just sum the predicted probabilities of categories 4 and 5, for each sample, as would come from fitted(). I think this will be the simplest pathway to get there.

2 Likes

An ordinal model would be more appropriate but it might make no practical difference. If you estimate a multinomial model, it is possible you might get something like

PR(\text{2 vs 1}) = logit^{-1}(1 \times force) \\ PR(\text{3 vs 1}) = logit^{-1}(2 \times force) \\ PR(\text{4 vs 1}) = logit^{-1}(0 \times force) \\ PR(\text{5 vs 1}) = logit^{-1}(-2 \times force)

That is, greater force might be associated with a higher probability of levels 2 and 3 compared to 1, no increased probability of 4 compared to 1, and a decrease in probability for level 5 compared to 1. This could lead to odd or counter-intuitive predictions (see attached script). More concretely, it does not assume that an increase in force is always associated with an increase in average damage. This is demonstrated in the plot below, which uses the equations above.

Of course, if you have enough data and it truly shows that damage levels 2 through 5 all increase relative to level 1 as force increases, then the multinomial model may perform just fine.

Standard ordinal models assume that the coefficients are the same across all comparisons. This ensures that an increase in force will always be associated with an increase in damage (assuming the coefficient is positive). This is demonstrated in the plot below, where the cumulative logistic regression equation is y = 0 + 0.5 \times force and the thresholds are [-1.5, -0.5, 0.5, 1.5].

The choice between cumulative, sequential, and adjacent leads to different assumptions. As you point out, the cumulative specification makes it the easiest to calculate your target quantity (the probability that damage is 3 or above). Interpretation-wise, it reduces to a logistic regression model where you split the data at 3. I’m less familiar with sequential and adjacent models. I think they are more flexible than cumulative models with the consequence that they can not be reduced as easily to a binary model for your target quantity. But they may have other appealing features you might be interested in.

ordinal_example.R (3.1 KB)

2 Likes

@AWoodward and @simonbrauer you provided some excellent feedback. Thank you! I am happy that you sort of confirmed my initial ideas and at the same time I have something to learn more about. I will experiment with the cumulative family first (the hyp and the data indicate that an increase in force leads to greater damage and I am interested in the cut off thresholds). Then, I can try with other families. Thank you again

I have another question. Currently I have a model that predicts failure degree based on applied stress force (I got much help also in Incorporate data distribution into ordinal regression - Interfaces / brms - The Stan Forums (mc-stan.org)). Now, I would like to estimate the opposite. For example, as depicted by the red annotation in the graph, “What is the range of forces that is associated with a certain probability of damage=3?”

Is this possible in postprocessing without the need of refitting a model force ~ damage?