I am experimenting with some ordinal regression models and I have been following the great advice that I was given in another thread (Ordinal regression to model components failure). I think I got good results, but I realized that the ordinal regression does not take into account how the data for each category is distributed. Is it possible to include this information into the model?
I simulated a dataset. We have some components that are tested under different force levels and we report the damage (1, 2, 3). For example, we stress a component with a force = 40 and it may report low (small fractures), medium, or high damage (total failure). The dataset is not balanced as we have more components that break with low damage. Within each damage level, the data is normally distributed—this the info I would like to incorporate into my model.
library('tidyverse')
library('brms')
library('tidybayes')
SEED <- 20220510
NCAT <- 3 # number of damage levels (it will be the ordinal category)
set.seed(SEED)
d <- tibble(
mean = seq(35, 40, length.out=NCAT),
sd = rep(2, times = NCAT),
n = seq(50, 20, length.out=NCAT),
damage = factor(seq(1, NCAT), ordered=T)) %>%
rowwise() %>%
mutate(force = list(rnorm(n = n, mean = mean, sd = sd))) %>%
unnest(cols = c(force))
Then, I model it with a cumulative family
m_ordinal_same <- brm(bf(damage ~ 1 + force, center=T),
data = d,
family = cumulative(logit),
seed = SEED)
and obtain
Family: cumulative
Links: mu = logit; disc = identity
Formula: damage ~ 1 + force
Data: d (Number of observations: 105)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] 41.26 6.16 30.27 54.52 1.00 1744 1833
Intercept[2] 44.76 6.59 33.09 58.90 1.00 1610 1710
force 1.13 0.17 0.82 1.49 1.00 1658 1779
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
As you see from the histograms, the data is sparse for the high damage category (3) but all data comes from a normal distribution with the mean shifted in each category. Would it be good to include this info? Maybe this is not needed? Or, maybe the ordinal regression is not the right model in this case?