Ordinal regression: add threshold intercepts to BLUPs for quantitative genetics?

Hi All,

I’m working on a quantitative genetics project where I need to model the fungal disease severity of a clonally propagated plant which has been collected as ordinal data to create BLUPs. Each clone was grown in a field with known row and column position, and has a replicate code and an individual code which I use as a grouping. In quantitative genetics it is common to add add the y-intercept to the random effects to report the BLUPs in measurement scale. I realize this is somewhat pointless to do with an ordinal traits, as the modeling process yields BLUPs that vary continuously. A BLUP of 1.5 is somewhat meaningless when compared to the ordinal scale which is based on the perceived amount of disease, separated by the thresholds. Nevertheless, it seems useful to the reader to present BLUPs in measurement scale where zero = no disease and values greater than zero represent some amount of disease.

My question is, can I sum the K threshold intercepts and add them to each random effect to shift the distribution to start at zero? Is this approach statistically sound? Are there any QG people here who can comment whether I need to do this or even if I should?

Here’s a model I’ve fit to the data.
ind_code = subject id (random effect)
d1 = ordinal disease trait
row = row position in garden (fixed effect)
col = column position garden (fixed effect)

> fit.c
 Family: cumulative
  Links: mu = logit; disc = identity
Formula: d1 ~ row + col + (1 | ind_code)
   Data: dis1 (Number of observations: 447)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects:
~ind_code (Number of levels: 282)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     3.24      0.46     2.45     4.20 1.01      649     1396

Population-Level Effects:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -0.87      0.48    -1.80     0.06 1.00     1880     2657
Intercept[2]     0.38      0.48    -0.51     1.31 1.00     1848     2615
Intercept[3]     1.16      0.49     0.26     2.12 1.00     1799     2585
Intercept[4]     2.44      0.54     1.44     3.52 1.01     1417     2501
row             -0.07      0.02    -0.12    -0.02 1.00     2285     2519
col             -0.04      0.01    -0.07    -0.01 1.00     2374     2724

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

If I take the four threshold BLUPs, sum them, and add them to the random effects, the resulting distribution starts at (well, nearly) zero.

# Sum the latent threshold intercepts
Y_t = -0.87 + 0.38 + 1.16 + 2.44

# Add them to the BLUPs to shift to zero
disease_severity = Y_t + ranef(fit.c)$ind_code[,1,1]

> summary(disease_severity)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.2112  1.0603  1.7899  3.1036  4.7797  9.4650

> hist(disease_severity, plot=F)
$breaks
[1] -1  0  1  2  3  4  5  6  7  8  9 10
$counts
[1]  2 62 81 10 25 43 20  4 17  9  9
$density
[1] 0.007092199 0.219858156 0.287234043 0.035460993 0.088652482 0.152482270
[7] 0.070921986 0.014184397 0.060283688 0.031914894 0.031914894
$mids
[1] -0.5  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5
$xname
[1] "ranef(fit.c)$ind_code[, 1, 1] + Y_t"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"

Thanks for your help and thanks for making Bayesian modeling more approachable with RStan and brms.

Karl

1 Like

Hi, sorry that your question slipped through. Did you manage to move forward in the meantime?

I think the biggest problem is that the ordinal model doesn’t easily yield a true “0” - it assumes there is some probability of disease always. What can be extracted relatively easily are quantities like “Probability observed ordinal value > 1” or “Mean of observed ordinal values”. Which you should be able to get easily via posterior predictions (posterior_predict or posterior_linpred, depending on use case) and can serve as a simple summary.

Best of luck with you model!