Where's the error term in ordinal regression?

Hello, I hope this is a simple question and I just can’t quite find the answer.

Tldr, where is the error term in an ordinal regression?

I’m fitting a model to calculate broad-sense heritability (H2) from data of half-sib trees that are resistant and susceptible to a pathogen. The trees were scored with an ordinal scale and I have row and column position of the tree in the common garden, plus the identity of their mother tree (family in the model). Here’s the formula of my brms model:

bcd_resitance ~ row + col + (1 | family)

After fitting that model I have several variance components to work with. I won’t print all of the random effects, but here’s a header on the top few VC’s.

> mod44_bcd %>% posterior_samples() %>% head
  b_Intercept[1] b_Intercept[2] b_Intercept[3] b_Intercept[4]        b_row
1      -5.007602      -4.797064      -3.248319      -1.656305 -0.002897173
2      -8.040523      -6.377912      -3.448861      -2.374894 -0.015584834
3      -3.456674      -3.238523      -2.391038      -1.328695  0.029783290
4      -3.642539      -3.433781      -2.600983      -1.818058 -0.007478924
5      -9.437411      -7.422390      -6.177879      -3.315334 -0.002940578
6      -6.305978      -5.835490      -4.105455      -2.770113 -0.034043116
         b_col sd_family__Intercept disc r_family[1253,Intercept]
1  0.029001988            0.4255657    1                0.2118421
2 -0.003614989            0.3274051    1                0.3215167
3  0.023525005            0.7393705    1               -1.3868402
4  0.022420330            0.5221942    1               -0.8185199
5 -0.022134636            1.0987287    1                0.9537750
6  0.021305094            0.4196056    1               -0.4141462

To calculate H2, use the variance component that represents genotypic variance (Vg = sd_family__Intercept^2) and create a vector of the total phenotypic variance (Vp).

Vg = sd_family__Intercept^2
Vp = sd_family__Intercept^2 + b_row^2 + b_col^2 + sigma^2
H2 = Vg / Vp 

In my case, I can’t account for the total variance as I don’t know where the error variance is in my posterior samples. Typically in a guassian model the sigma term is calculated without having to specify it (at least in brms). In my ordinal regression, I can’t find the error term. Can someone help me understand how to account for the total variance from the model, so I can use it in a ratio statistic? What is the error term in ordinal regression and where is it in the posterior samples?

Thanks so much everyone for the help!

Here’s some more details on my model and how the stancode translates it:

> print(mod44_bcd)
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: bcd_resitance ~ row + col + (1 | family) 
   Data: filter(dat, plot == 44) (Number of observations: 270) 
  Draws: 2 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~family (Number of levels: 100) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.60      0.43     0.02     1.58 1.00      883     1455

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -9.44      3.07   -16.94    -4.96 1.00      845      451
Intercept[2]    -7.00      2.04   -11.82    -4.10 1.00     1300     1018
Intercept[3]    -4.14      0.88    -6.04    -2.55 1.00     1779     1562
Intercept[4]    -2.34      0.77    -3.97    -0.97 1.00     1847     1823
row             -0.01      0.04    -0.08     0.07 1.00     2661     2350
col              0.01      0.03    -0.04     0.06 1.00     2518     1731

Family Specific Parameters: 
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Stancode:

> stancode(mod44_bcd)
// generated with brms 2.19.0
functions {
  /* cumulative-logit log-PDF for a single response
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: ordinal thresholds
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
     int nthres = num_elements(thres);
     if (y == 1) {
       return log_inv_logit(disc * (thres[1] - mu));
     } else if (y == nthres + 1) {
       return log1m_inv_logit(disc * (thres[nthres] - mu));
     } else {
       return log_diff_exp(
         log_inv_logit(disc * (thres[y] - mu)), 
         log_inv_logit(disc * (thres[y - 1] - mu))
       );
     }
   }
  /* cumulative-logit log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applid threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real cumulative_logit_merged_lpmf(int y, real mu, real disc, vector thres, int[] j) {
     return cumulative_logit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
   }
  /* ordered-logistic log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applid threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real ordered_logistic_merged_lpmf(int y, real mu, vector thres, int[] j) {
     return ordered_logistic_lpmf(y | mu, thres[j[1]:j[2]]);
   }
}
data {
  int<lower=1> N;  // total number of observations
  int Y[N];  // response variable
  int<lower=2> nthres;  // number of thresholds
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K;
  matrix[N, Kc] Xc;  // centered version of X
  vector[Kc] means_X;  // column means of X before centering
  for (i in 1:K) {
    means_X[i] = mean(X[, i]);
    Xc[, i] = X[, i] - means_X[i];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  ordered[nthres] Intercept;  // temporary thresholds for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
}
transformed parameters {
  real disc = 1;  // discrimination parameters
  vector[N_1] r_1_1;  // actual group-level effects
  real lprior = 0;  // prior contributions to the log posterior
  r_1_1 = (sd_1[1] * (z_1[1]));
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Xc * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
    }
    for (n in 1:N) {
      target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
    }
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
}
generated quantities {
  // compute actual thresholds
  vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
}

In logit regressions, and their extensions to cumulative models, the linear predictor predicts the mean of a latent variable on the logit (log-odds) scale. This latent variable is assumed, by convention, to have a logistic distribution with a standard deviation of exactly 1 (as shown by the Family Specific Parameters for “disc”). So, as the effects grow larger in size, the amount of error relative to the effects’ sizes decreases. For your sigma term, you can use pi/sqrt(3) because that is the SD for a normal distribution that is most similar to a logistic distribution with an SD of 1 <edit: you would need to multiply all your effects by that amount too, but this scaling cancels out given you are calculating a ratio, so you could just use sigma = 1 and leave all the effects as is>. Alternatively, you could use a probit link in your model, and then the latent scale is normally distributed with an SD of 1, so you would have sigma = 1.

2 Likes

Tiny clarification: the scale parameter s is constrained to 1, where \sigma = \frac{s \pi}{\sqrt{3}} (Wikipedia), meaning the standard deviation of a standard logistic regression model is \frac{\pi}{\sqrt{3}}. Scaling the logistic regression parameters by \frac{\sqrt{3}}{\pi} makes them comparable to a probit model with normal distribution of \sigma = 1.

From Long 1997 “Regression Models for Categorical and Limited Dependent Variables” page 48.
image

Agreed that a probit link makes this all much easier and more congenial, since you will be assuming that the variances of both the distribution of family intercepts and residual variance of the latent variable are equal to 1.

2 Likes