Several helpful replies to posts state that in an ordinal probit regression fit with brms, the latent dependent variable \tilde{Y} is standard normal and that beta coefficients are thus estimated on that scale (e.g., #23104/2, #34432/3, #34432/5, #29425/2). @paul.buerkner and @matti’s excellent (2019) tutorial also interprets beta coefficient estimates as the number of SD units on the latent scale that are associated with a one-unit change in the predictor (p. 86).
My question is whether this means the beta coefficients are \tilde{Y}-standardized, by which I mean they are scaled to the square root of the total estimated variance of latent \tilde{Y}. By total estimated variance of \tilde{Y}, I mean the sum of the explained variance and residual variance (or, given that the residual variance is fixed to 1 for model identification, the explained variance + 1).
My impression is that they are not \tilde{Y}-standardized and that, as a result, beta coefficients from models with different predictors of the same DV are not comparable. The reason is that in probit regression the total estimated variance of latent \tilde{Y}, which is entirely dependent on the explained variance (because residual variance is fixed), decreases if a predictor from the data-generating model is omitted (Breen et al., 2018; Karlson et al., 2012; Mood, 2010). This is in contrast to linear regression, where the total variance of an observed DV can be computed from the data.
I ran the following simulation to try to determine whether the coefficients are \tilde{Y}-standardized, and I’d love any feedback on this (I could be missing something). In addition to @paul.buerkner and @matti, I’m tagging @Solomon and @andymilne, who have replied to related posts and may find this interesting. Full code: simulation.R (3.2 KB)
First, simulate latent \tilde{Y} generated from a linear combination of a focal predictor X, an “omitted predictor” Z (will be included later in a full brms
model that is “true” but will be omitted from a reduced model), and standard normal error. To illustrate the situation even when X and Z are completely uncorrelated, first regress Z on X and use X-residualized Z, which I’ll call Z_{resid}.
n <- 3000
x <- rnorm(n)
z <- rnorm(n)
z_resid <- residuals(lm(z ~ x))
error_term <- rnorm(n)
latent_y <- 0.5 * x + 1.5 * z_resid + error_term
Next, transform latent \tilde{Y} into observed ordinal Y (with 4 categories) and create data frame.
obs_y <- as.ordered(cut(latent_y,
breaks = quantile(latent_y, probs = 0:4 / 4),
include.lowest = TRUE, labels = FALSE))
dat <- data.frame(latent_y, obs_y, x, z_resid)
Run reduced model omitting Z_{resid} (Model 1) and full model including both predictors (Model 2).
model1 <- brm(obs_y ~ x, family = cumulative("probit"), data = dat)
model2 <- brm(obs_y ~ x + z_resid, family = cumulative("probit"), data = dat)
Notice that in the reduced model (Model 1), b_x (0.30) is smaller than the value used to generate the data (0.5), whereas the full model (Model 2) yields a closer estimate (b_x = 0.53).
summary(model1)
summary(model2)
(By contrast, notice that linear regressions of latent \tilde{Y} on X yield estimates of b_x [0.52] close to the data-generating value [0.5] regardless of whether Z_{resid} is omitted or included.)
summary(lm(latent_y ~ x, data = dat))
summary(lm(latent_y ~ x + z_resid, data = dat))
Now for both probit models (code shown only for Model 1, but steps for Model 2 are identical) compute the explained variance of latent \tilde{Y}. Next, because the residual variance is fixed at 1, add 1 to compute the total estimated variance of latent \tilde{Y}.
# Compute explained variance of latent Y for each posterior draw
linpred_model1 <- posterior_linpred(model1)
explained_var_model1 <- apply(linpred_model1, 1, var)
# Compute total variance of latent Y (residual variance is fixed at 1)
total_var_model1 <- mean(explained_var_model1) + 1
Notice that the total estimated variance in latent \tilde{Y} is lower in probit Model 1 (1.09) than Model 2 (3.32). Even though Z_{resid} is uncorrelated with X, because it helps explain latent \tilde{Y} (after all, it was part of the data-generating model), omitting it from the probit model lowers the explained variance of latent \tilde{Y} and in turn the total estimated variance. Notice also that the total estimated variance in Model 2 is close to the total variance of latent \tilde{Y} computed from the simulated data.
round(total_var_model1, 2) == 1.09
round(total_var_model2, 2) == 3.32
round(var(dat$latent_y), 2) == 3.42
Now, if the b_x estimates in the probit models were already \tilde{Y}-standardized (i.e., if they already indicated the number of SD units in latent \tilde{Y} associated with a one-unit increase in X), then it seems the b_x estimates would have been the same in Models 1 and 2, because even though Model 1’s omitting Z_{resid} yielded a lower total estimated variance of latent \tilde{Y}, standardizing the b_x estimates to the square root of this variance would have accounted for the difference. But as we saw in the output above, the b_x estimates differ between Models 1 (0.30) and 2 (0.53). The implication is that the b_x estimates are not comparable between Models 1 and 2 as they are in linear regression. Only after \tilde{Y}-standardizing the b_x estimates will the values be the same.
To do so, take the square root of the total estimated variance of latent \tilde{Y} to compute the total estimated SD of latent \tilde{Y}. Use this SD to \tilde{Y}-standardize b_x, obtaining b_{x_{std}}.
total_sd_model1 <- sqrt(total_var_model1)
b_x_model1 <- fixef(model1)["x", "Estimate"]
b_x_std_model1 <- b_x_model1/total_sd_model1
Whereas the b_x estimates for Models 1 and 2 differ (0.30 and 0.53, respectively), after scaling them to the total estimated SD of latent \tilde{Y}, the resulting b_{x_{std}} estimates are the same (0.29).
round(b_x_model1, 2) == 0.30
round(b_x_model2, 2) == 0.53
round(b_x_std_model1, 2) == 0.29
round(b_x_std_model2, 2) == 0.29
Does this seem correct, or am I missing something? My initial interpretation of statements that the beta coefficients represented the number of SD units on the latent scale that are associated with a one-unit change in the predictor was that the coefficients were already \tilde{Y}-standardized, but this does not seem to be the case. Thank you in advance for your help!