# Where's the error term in ordinal regression?

Hello, I hope this is a simple question and I just can’t quite find the answer.

Tldr, where is the error term in an ordinal regression?

I’m fitting a model to calculate broad-sense heritability (H2) from data of half-sib trees that are resistant and susceptible to a pathogen. The trees were scored with an ordinal scale and I have row and column position of the tree in the common garden, plus the identity of their mother tree (family in the model). Here’s the formula of my brms model:

bcd_resitance ~ row + col + (1 | family)


After fitting that model I have several variance components to work with. I won’t print all of the random effects, but here’s a header on the top few VC’s.

> mod44_bcd %>% posterior_samples() %>% head
b_Intercept[1] b_Intercept[2] b_Intercept[3] b_Intercept[4]        b_row
1      -5.007602      -4.797064      -3.248319      -1.656305 -0.002897173
2      -8.040523      -6.377912      -3.448861      -2.374894 -0.015584834
3      -3.456674      -3.238523      -2.391038      -1.328695  0.029783290
4      -3.642539      -3.433781      -2.600983      -1.818058 -0.007478924
5      -9.437411      -7.422390      -6.177879      -3.315334 -0.002940578
6      -6.305978      -5.835490      -4.105455      -2.770113 -0.034043116
b_col sd_family__Intercept disc r_family[1253,Intercept]
1  0.029001988            0.4255657    1                0.2118421
2 -0.003614989            0.3274051    1                0.3215167
3  0.023525005            0.7393705    1               -1.3868402
4  0.022420330            0.5221942    1               -0.8185199
5 -0.022134636            1.0987287    1                0.9537750
6  0.021305094            0.4196056    1               -0.4141462


To calculate H2, use the variance component that represents genotypic variance (Vg = sd_family__Intercept^2) and create a vector of the total phenotypic variance (Vp).

Vg = sd_family__Intercept^2
Vp = sd_family__Intercept^2 + b_row^2 + b_col^2 + sigma^2
H2 = Vg / Vp


In my case, I can’t account for the total variance as I don’t know where the error variance is in my posterior samples. Typically in a guassian model the sigma term is calculated without having to specify it (at least in brms). In my ordinal regression, I can’t find the error term. Can someone help me understand how to account for the total variance from the model, so I can use it in a ratio statistic? What is the error term in ordinal regression and where is it in the posterior samples?

Thanks so much everyone for the help!

Here’s some more details on my model and how the stancode translates it:

> print(mod44_bcd)
Family: cumulative
Links: mu = logit; disc = identity
Formula: bcd_resitance ~ row + col + (1 | family)
Data: filter(dat, plot == 44) (Number of observations: 270)
Draws: 2 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup draws = 4000

Group-Level Effects:
~family (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.60      0.43     0.02     1.58 1.00      883     1455

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -9.44      3.07   -16.94    -4.96 1.00      845      451
Intercept[2]    -7.00      2.04   -11.82    -4.10 1.00     1300     1018
Intercept[3]    -4.14      0.88    -6.04    -2.55 1.00     1779     1562
Intercept[4]    -2.34      0.77    -3.97    -0.97 1.00     1847     1823
row             -0.01      0.04    -0.08     0.07 1.00     2661     2350
col              0.01      0.03    -0.04     0.06 1.00     2518     1731

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).


Stancode:

> stancode(mod44_bcd)
// generated with brms 2.19.0
functions {
/* cumulative-logit log-PDF for a single response
* Args:
*   y: response category
*   mu: latent mean parameter
*   disc: discrimination parameter
*   thres: ordinal thresholds
* Returns:
*   a scalar to be added to the log posterior
*/
real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
int nthres = num_elements(thres);
if (y == 1) {
return log_inv_logit(disc * (thres[1] - mu));
} else if (y == nthres + 1) {
return log1m_inv_logit(disc * (thres[nthres] - mu));
} else {
return log_diff_exp(
log_inv_logit(disc * (thres[y] - mu)),
log_inv_logit(disc * (thres[y - 1] - mu))
);
}
}
/* cumulative-logit log-PDF for a single response and merged thresholds
* Args:
*   y: response category
*   mu: latent mean parameter
*   disc: discrimination parameter
*   thres: vector of merged ordinal thresholds
*   j: start and end index for the applid threshold within 'thres'
* Returns:
*   a scalar to be added to the log posterior
*/
real cumulative_logit_merged_lpmf(int y, real mu, real disc, vector thres, int[] j) {
return cumulative_logit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
}
/* ordered-logistic log-PDF for a single response and merged thresholds
* Args:
*   y: response category
*   mu: latent mean parameter
*   thres: vector of merged ordinal thresholds
*   j: start and end index for the applid threshold within 'thres'
* Returns:
*   a scalar to be added to the log posterior
*/
real ordered_logistic_merged_lpmf(int y, real mu, vector thres, int[] j) {
return ordered_logistic_lpmf(y | mu, thres[j[1]:j[2]]);
}
}
data {
int<lower=1> N;  // total number of observations
int Y[N];  // response variable
int<lower=2> nthres;  // number of thresholds
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int Kc = K;
matrix[N, Kc] Xc;  // centered version of X
vector[Kc] means_X;  // column means of X before centering
for (i in 1:K) {
means_X[i] = mean(X[, i]);
Xc[, i] = X[, i] - means_X[i];
}
}
parameters {
vector[Kc] b;  // population-level effects
ordered[nthres] Intercept;  // temporary thresholds for centered predictors
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
vector[N_1] z_1[M_1];  // standardized group-level effects
}
transformed parameters {
real disc = 1;  // discrimination parameters
vector[N_1] r_1_1;  // actual group-level effects
real lprior = 0;  // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Xc * b;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
for (n in 1:N) {
target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
}
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
}
generated quantities {
// compute actual thresholds
vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
}


In logit regressions, and their extensions to cumulative models, the linear predictor predicts the mean of a latent variable on the logit (log-odds) scale. This latent variable is assumed, by convention, to have a logistic distribution with a standard deviation of exactly 1 (as shown by the Family Specific Parameters for “disc”). So, as the effects grow larger in size, the amount of error relative to the effects’ sizes decreases. For your sigma term, you can use pi/sqrt(3) because that is the SD for a normal distribution that is most similar to a logistic distribution with an SD of 1 <edit: you would need to multiply all your effects by that amount too, but this scaling cancels out given you are calculating a ratio, so you could just use sigma = 1 and leave all the effects as is>. Alternatively, you could use a probit link in your model, and then the latent scale is normally distributed with an SD of 1, so you would have sigma = 1.

2 Likes

Tiny clarification: the scale parameter s is constrained to 1, where \sigma = \frac{s \pi}{\sqrt{3}} (Wikipedia), meaning the standard deviation of a standard logistic regression model is \frac{\pi}{\sqrt{3}}. Scaling the logistic regression parameters by \frac{\sqrt{3}}{\pi} makes them comparable to a probit model with normal distribution of \sigma = 1.

From Long 1997 “Regression Models for Categorical and Limited Dependent Variables” page 48.

Agreed that a probit link makes this all much easier and more congenial, since you will be assuming that the variances of both the distribution of family intercepts and residual variance of the latent variable are equal to 1.

2 Likes