Hello, I hope this is a simple question and I just can’t quite find the answer.
Tldr, where is the error term in an ordinal regression?
I’m fitting a model to calculate broad-sense heritability (H2) from data of half-sib trees that are resistant and susceptible to a pathogen. The trees were scored with an ordinal scale and I have row and column position of the tree in the common garden, plus the identity of their mother tree (family
in the model). Here’s the formula of my brms model:
bcd_resitance ~ row + col + (1 | family)
After fitting that model I have several variance components to work with. I won’t print all of the random effects, but here’s a header on the top few VC’s.
> mod44_bcd %>% posterior_samples() %>% head
b_Intercept[1] b_Intercept[2] b_Intercept[3] b_Intercept[4] b_row
1 -5.007602 -4.797064 -3.248319 -1.656305 -0.002897173
2 -8.040523 -6.377912 -3.448861 -2.374894 -0.015584834
3 -3.456674 -3.238523 -2.391038 -1.328695 0.029783290
4 -3.642539 -3.433781 -2.600983 -1.818058 -0.007478924
5 -9.437411 -7.422390 -6.177879 -3.315334 -0.002940578
6 -6.305978 -5.835490 -4.105455 -2.770113 -0.034043116
b_col sd_family__Intercept disc r_family[1253,Intercept]
1 0.029001988 0.4255657 1 0.2118421
2 -0.003614989 0.3274051 1 0.3215167
3 0.023525005 0.7393705 1 -1.3868402
4 0.022420330 0.5221942 1 -0.8185199
5 -0.022134636 1.0987287 1 0.9537750
6 0.021305094 0.4196056 1 -0.4141462
To calculate H2, use the variance component that represents genotypic variance (Vg = sd_family__Intercept^2) and create a vector of the total phenotypic variance (Vp).
Vg = sd_family__Intercept^2
Vp = sd_family__Intercept^2 + b_row^2 + b_col^2 + sigma^2
H2 = Vg / Vp
In my case, I can’t account for the total variance as I don’t know where the error variance is in my posterior samples. Typically in a guassian model the sigma term is calculated without having to specify it (at least in brms). In my ordinal regression, I can’t find the error term. Can someone help me understand how to account for the total variance from the model, so I can use it in a ratio statistic? What is the error term in ordinal regression and where is it in the posterior samples?
Thanks so much everyone for the help!
Here’s some more details on my model and how the stancode translates it:
> print(mod44_bcd)
Family: cumulative
Links: mu = logit; disc = identity
Formula: bcd_resitance ~ row + col + (1 | family)
Data: filter(dat, plot == 44) (Number of observations: 270)
Draws: 2 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~family (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.60 0.43 0.02 1.58 1.00 883 1455
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -9.44 3.07 -16.94 -4.96 1.00 845 451
Intercept[2] -7.00 2.04 -11.82 -4.10 1.00 1300 1018
Intercept[3] -4.14 0.88 -6.04 -2.55 1.00 1779 1562
Intercept[4] -2.34 0.77 -3.97 -0.97 1.00 1847 1823
row -0.01 0.04 -0.08 0.07 1.00 2661 2350
col 0.01 0.03 -0.04 0.06 1.00 2518 1731
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Stancode:
> stancode(mod44_bcd)
// generated with brms 2.19.0
functions {
/* cumulative-logit log-PDF for a single response
* Args:
* y: response category
* mu: latent mean parameter
* disc: discrimination parameter
* thres: ordinal thresholds
* Returns:
* a scalar to be added to the log posterior
*/
real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
int nthres = num_elements(thres);
if (y == 1) {
return log_inv_logit(disc * (thres[1] - mu));
} else if (y == nthres + 1) {
return log1m_inv_logit(disc * (thres[nthres] - mu));
} else {
return log_diff_exp(
log_inv_logit(disc * (thres[y] - mu)),
log_inv_logit(disc * (thres[y - 1] - mu))
);
}
}
/* cumulative-logit log-PDF for a single response and merged thresholds
* Args:
* y: response category
* mu: latent mean parameter
* disc: discrimination parameter
* thres: vector of merged ordinal thresholds
* j: start and end index for the applid threshold within 'thres'
* Returns:
* a scalar to be added to the log posterior
*/
real cumulative_logit_merged_lpmf(int y, real mu, real disc, vector thres, int[] j) {
return cumulative_logit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
}
/* ordered-logistic log-PDF for a single response and merged thresholds
* Args:
* y: response category
* mu: latent mean parameter
* thres: vector of merged ordinal thresholds
* j: start and end index for the applid threshold within 'thres'
* Returns:
* a scalar to be added to the log posterior
*/
real ordered_logistic_merged_lpmf(int y, real mu, vector thres, int[] j) {
return ordered_logistic_lpmf(y | mu, thres[j[1]:j[2]]);
}
}
data {
int<lower=1> N; // total number of observations
int Y[N]; // response variable
int<lower=2> nthres; // number of thresholds
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K;
matrix[N, Kc] Xc; // centered version of X
vector[Kc] means_X; // column means of X before centering
for (i in 1:K) {
means_X[i] = mean(X[, i]);
Xc[, i] = X[, i] - means_X[i];
}
}
parameters {
vector[Kc] b; // population-level effects
ordered[nthres] Intercept; // temporary thresholds for centered predictors
vector<lower=0>[M_1] sd_1; // group-level standard deviations
vector[N_1] z_1[M_1]; // standardized group-level effects
}
transformed parameters {
real disc = 1; // discrimination parameters
vector[N_1] r_1_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Xc * b;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
for (n in 1:N) {
target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
}
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
}
generated quantities {
// compute actual thresholds
vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
}