Ordinal Regression Predicted Label Evaluation

Hi All,

I’m new to Bayesian modeling so forgive me if I’ve missed this concept in my reading.

I’m trying to build an ordinal regression model for the following problem. I have a set of clinical variables that, per a set of guidelines, physicians are supposed to use to classify the severity of a disease (mild, mild to moderate, moderate, moderate to severe, severe), along with the actual label they assigned to each patient. I’ve modeled this in brms as a category-specific adjacent-category model.

# Adjacent-category ordinal model; cs() makes each predictor's effect category-specific.
model.acat.1 <- brm(
    formula = severity_label ~ cs(var_1) + cs(var_2) + cs(var_3) + cs(var_4) + cs(var_5),
    family = acat(link = "logit"),
    data = df.model.cc,
    prior = prior(normal(0, 4), class = "b"),
    chains = 4,
    cores = 4,
    inits = "0",
    backend = "cmdstanr",
    threads = threading(threads = 12),
    file = "model_fits/continuous_ordinal_acat"
    )

My goal is to compare the predicted labels from this model to the physician-assigned labels in order to see, relative to the guidelines: (a) for a given label category, which variable(s) physicians might “lean on” the most when assigning a severity label; (b) how well this model predicts the physician-assigned label; and (c) if the model performs well, use it to label a set of unlabeled reports for later analysis.

What I’m stuck on is how to assess (b). I could use WAIC or LOO, but I’m not really looking for out-of-sample model comparison. What I’m after is something like a “crosstabs” table: for a given pair, say mild (predicted) and mild (labeled), what proportion of the draws align correctly? My initial hypothesis is that some label pairs have wider uncertainty than others, so overall measures of model performance aren’t helpful; I need label-pair-specific comparisons.

Is there another method out there that I’m missing?

Thank you!


Not an expert on this, but since nobody answered, I will give it a try.

I think what you describe can be achieved fairly easily by examining posterior predictions (via posterior_predict). You can group the predictions however you like, so, say, plotting a heatmap of “assigned label” vs. “posterior probability of each label” is straightforward (if not, please ask!). You make the predictions, group them by the actual observed outcome and/or the predicted outcome, and compute any statistics you want per group (accuracy, entropy, …). A rough sketch of what I mean is below.
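Here is a minimal sketch of the per-draw crosstab idea, assuming your fitted object is model.acat.1, your data frame is df.model.cc, and severity_label is an ordered factor (names taken from your post; adjust as needed):

# posterior_predict() for an ordinal brms model returns a draws x observations
# matrix of sampled category indices (1 = mild, ..., 5 = severe)
library(brms)
library(dplyr)
library(tidyr)
library(ggplot2)

pp <- posterior_predict(model.acat.1)

# Long format: one row per (draw, observation) pair
pred_long <- as.data.frame(pp) |>
  mutate(draw = row_number()) |>
  pivot_longer(-draw, names_to = "obs", values_to = "predicted") |>
  mutate(obs = as.integer(sub("V", "", obs)),
         observed = as.integer(df.model.cc$severity_label[obs]))

# "Crosstab": for each observed label, the proportion of draws predicting
# each label; the diagonal is the per-category agreement rate
agreement <- pred_long |>
  count(observed, predicted) |>
  group_by(observed) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

# Heatmap of observed vs. predicted proportions
ggplot(agreement, aes(x = factor(predicted), y = factor(observed), fill = prop)) +
  geom_tile() +
  geom_text(aes(label = scales::percent(prop, accuracy = 1))) +
  labs(x = "Predicted label (posterior draws)", y = "Physician-assigned label")

The row for each observed label then shows exactly the label-pair-specific proportions you described, with the off-diagonal cells telling you which categories the model confuses.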

Sidenote: not sure if it is relevant, but hopefully it provides some inspiration. I once worked on a much simpler model that matched assignments of NYHA score by clinicians against guidelines based on ergospirometry (I think, hope I got the term right). One thing I was able to get from the model was what the ergospirometry thresholds would have to be to match the scores assigned by clinicians as well as possible, so that might also be a relevant thing to look at (though your model is much more complex, so I’m not sure this is easily attainable).

In that project I also made a plot showing the thresholds according to the guidelines (thick blue line) and the mean of the model’s posterior predictions (the mean is somewhat problematic for ordinal models, but it provides a nice summary here).
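If a similar mean summary would be useful for your model, a small sketch reusing the pred_long data frame from the example above (again just an assumption about your setup, not a definitive recipe):

# Mean predicted category per observation (over posterior draws),
# grouped by the physician-assigned label
mean_pred <- pred_long |>
  group_by(obs, observed) |>
  summarise(mean_predicted = mean(predicted), .groups = "drop")

ggplot(mean_pred, aes(x = factor(observed), y = mean_predicted)) +
  geom_jitter(width = 0.1, alpha = 0.3) +
  labs(x = "Physician-assigned label",
       y = "Mean predicted category (over posterior draws)")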

Best of luck with your model!
