Are betas latent-Y-standardized in ordinal probit "brms" models?

Several helpful replies in other threads state that, in an ordinal probit regression fit with brms, the latent dependent variable \tilde{Y} is standard normal and that the beta coefficients are therefore estimated on that scale (e.g., #23104/2, #34432/3, #34432/5, #29425/2). @paul.buerkner and @matti's excellent (2019) tutorial likewise interprets a beta coefficient estimate as the number of SD units on the latent scale associated with a one-unit change in the predictor (p. 86).

My question is whether this means the beta coefficients are \tilde{Y}-standardized, by which I mean they are scaled to the square root of the total estimated variance of latent \tilde{Y}. By total estimated variance of \tilde{Y}, I mean the sum of the explained variance and residual variance (or, given that the residual variance is fixed to 1 for model identification, the explained variance + 1).

My impression is that they are not \tilde{Y}-standardized and that, as a result, beta coefficients from models with different predictors of the same DV are not comparable. The reason is that in probit regression the total estimated variance of latent \tilde{Y}, which depends entirely on the explained variance (because the residual variance is fixed), decreases when a predictor from the data-generating model is omitted (Breen et al., 2018; Karlson et al., 2012; Mood, 2010). This is in contrast to linear regression, where the total variance of an observed DV can be computed directly from the data.
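
To make the mechanism concrete, here is a sketch of the standard attenuation argument from those papers, written in this post's notation (and assuming Var(Z_{resid}) \approx 1, as in the simulation below): if the true latent model is \tilde{Y} = 0.5 X + 1.5 Z_{resid} + \epsilon with \epsilon \sim N(0, 1), then omitting Z_{resid} folds its contribution into the error term, whose variance becomes 1.5^2 \cdot Var(Z_{resid}) + 1 = 3.25. Because the probit model re-fixes the residual variance to 1, the whole latent scale is implicitly divided by \sqrt{3.25} \approx 1.80, so the reduced model should recover roughly 0.5 / 1.80 \approx 0.28 instead of 0.5.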

I ran the following simulation to try to determine whether the coefficients are \tilde{Y}-standardized, and I’d love any feedback on this (I could be missing something). In addition to @paul.buerkner and @matti, I’m tagging @Solomon and @andymilne, who have replied to related posts and may find this interesting. Full code: simulation.R (3.2 KB)

First, simulate latent \tilde{Y} generated from a linear combination of a focal predictor X, an "omitted predictor" Z (it will be included in the full, "true" brms model later but left out of a reduced model), and standard normal error. To illustrate the situation even when X and Z are completely uncorrelated, first regress Z on X and use the X-residualized Z, which I'll call Z_{resid}.

# Simulate predictors; residualizing Z on X makes Z_resid uncorrelated with X
n <- 3000
x <- rnorm(n)
z <- rnorm(n)
z_resid <- residuals(lm(z ~ x))
error_term <- rnorm(n)

# Latent Y from the data-generating ("true") model
latent_y <- 0.5 * x + 1.5 * z_resid + error_term

Next, transform latent \tilde{Y} into observed ordinal Y (with 4 categories) and create data frame.

# Cut latent Y at its quartiles to create a 4-category ordinal outcome
obs_y <- as.ordered(cut(latent_y, 
                        breaks = quantile(latent_y, probs = 0:4 / 4), 
                        include.lowest = TRUE, labels = FALSE))

dat <- data.frame(latent_y, obs_y, x, z_resid)

Run reduced model omitting Z_{resid} (Model 1) and full model including both predictors (Model 2).

library(brms)

model1 <- brm(obs_y ~ x,           family = cumulative("probit"), data = dat)
model2 <- brm(obs_y ~ x + z_resid, family = cumulative("probit"), data = dat)

Notice that in the reduced model (Model 1), b_x (0.30) is smaller than the value used to generate the data (0.5), whereas the full model (Model 2) yields a closer estimate (b_x = 0.53).

summary(model1)
summary(model2)


(By contrast, notice that linear regressions of latent \tilde{Y} on X yield estimates of b_x [0.52] close to the data-generating value [0.5] regardless of whether Z_{resid} is omitted or included.)

summary(lm(latent_y ~ x,           data = dat))
summary(lm(latent_y ~ x + z_resid, data = dat))

[Screenshot: summary output of the reduced linear model]

Now, for both probit models, compute the explained variance of latent \tilde{Y} (code shown for Model 1; the analogous Model 2 steps are sketched after it). Next, because the residual variance is fixed at 1, add 1 to obtain the total estimated variance of latent \tilde{Y}.

# Compute explained variance of latent Y for each posterior draw
linpred_model1 <- posterior_linpred(model1)
explained_var_model1 <- apply(linpred_model1, 1, var)

# Compute total variance of latent Y (residual variance is fixed at 1)
total_var_model1 <- mean(explained_var_model1) + 1
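
For reference, here is a sketch of the analogous Model 2 computations (they simply mirror the Model 1 code above and produce the total_var_model2 used in the results below):

linpred_model2 <- posterior_linpred(model2)
explained_var_model2 <- apply(linpred_model2, 1, var)
total_var_model2 <- mean(explained_var_model2) + 1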

Notice that the total estimated variance in latent \tilde{Y} is lower in probit Model 1 (1.09) than Model 2 (3.32). Even though Z_{resid} is uncorrelated with X, because it helps explain latent \tilde{Y} (after all, it was part of the data-generating model), omitting it from the probit model lowers the explained variance of latent \tilde{Y} and in turn the total estimated variance. Notice also that the total estimated variance in Model 2 is close to the total variance of latent \tilde{Y} computed from the simulated data.

round(total_var_model1, 2)  == 1.09
round(total_var_model2, 2)  == 3.32

round(var(dat$latent_y), 2) == 3.42

Now, if the b_x estimates in the probit models were already \tilde{Y}-standardized (i.e., if they already indicated the number of SD units in latent \tilde{Y} associated with a one-unit increase in X), then the b_x estimates should have been the same in Models 1 and 2: even though omitting Z_{resid} in Model 1 yielded a lower total estimated variance of latent \tilde{Y}, standardizing the b_x estimates to the square root of that variance would have accounted for the difference. But as the output above shows, the b_x estimates differ between Models 1 (0.30) and 2 (0.53). The implication is that the b_x estimates are not comparable between Models 1 and 2 the way they are in linear regression. Only after \tilde{Y}-standardizing the b_x estimates are the values the same.

To do so, take the square root of the total estimated variance of latent \tilde{Y} to compute the total estimated SD of latent \tilde{Y}. Use this SD to \tilde{Y}-standardize b_x, obtaining b_{x_{std}}.

# Total estimated SD of latent Y
total_sd_model1 <- sqrt(total_var_model1)

# Y-standardize b_x by dividing by that SD
b_x_model1 <- fixef(model1)["x", "Estimate"]
b_x_std_model1 <- b_x_model1 / total_sd_model1
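
And the corresponding Model 2 quantities, again a sketch mirroring the Model 1 code:

total_sd_model2 <- sqrt(total_var_model2)

b_x_model2 <- fixef(model2)["x", "Estimate"]
b_x_std_model2 <- b_x_model2 / total_sd_model2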

Whereas the b_x estimates for Models 1 and 2 differ (0.30 and 0.53, respectively), after scaling them to the total estimated SD of latent \tilde{Y}, the resulting b_{x_{std}} estimates are the same (0.29).

round(b_x_model1, 2)     == 0.30
round(b_x_model2, 2)     == 0.53

round(b_x_std_model1, 2) == 0.29
round(b_x_std_model2, 2) == 0.29

Does this seem correct, or am I missing something? I initially read statements that the beta coefficients represent the number of SD units on the latent scale associated with a one-unit change in the predictor as implying that the coefficients are already \tilde{Y}-standardized, but that does not seem to be the case. Thank you in advance for your help!

Yes, coefficients in probit models can't be compared across models or populations and are, in a sense, "biased": they will be lower than expected if your model is missing any determinant of the dependent variable. This is in fact true for any model using a logit or probit link function (including the very popular binary logistic regression). I think Mood's paper that you've linked to explains it very well. It's also not a brms- or Stan-specific issue; that's just how these models work.

IMHO, the problem with standardizing the coefficients by default is that you are changing the units. If you used logistic regression and standardized its coefficients, you could no longer say "a unit change in x is associated with an XY increase in (log) odds"; you would have to say "a unit change in x is associated with an XY increase in standardized (log) odds". And no one has an intuitive understanding of standardized (log) odds (or probits). Notice, for example, that your standardized coefficients, while comparable, are not the same as the coefficient used to simulate the data (0.29 vs. 0.5).
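
(A quick back-of-the-envelope check of that last point, using the data-generating values from the simulation: the total latent variance is roughly 0.5^2 \cdot Var(X) + 1.5^2 \cdot Var(Z_{resid}) + 1 \approx 3.5, so the \tilde{Y}-standardized version of the true coefficient is about 0.5 / \sqrt{3.5} \approx 0.27, close to the \approx 0.29 estimates once sampling error is taken into account.)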

My preferred solution is to report average marginal effects (AMEs) on the probability scale. There is an amazing package for R and Python called marginaleffects that makes this easy. AMEs on the probability scale don't suffer from noncollapsibility, and they are in units most people are familiar with. The only disadvantage is that if the relationship is highly nonlinear, an AME may be a poor descriptor.
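
For the full probit model above, something along these lines should work (a minimal sketch, assuming a reasonably recent version of marginaleffects with brms support):

library(marginaleffects)

# Average marginal effect of x on the probability of each response category,
# averaged over the observed data and summarized across posterior draws
avg_slopes(model2, variables = "x")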
