How to calculate estimated error for monotonic effects?

Hi everyone, I need some help. I have a simple model and I want to be able to calculate the estimated error for monotonic coefficients. Here is my example:

library("brms")

income_options <- c("below_20", "20_to_40", "40_to_100", "greater_100")
income <- factor(sample(income_options, 100, TRUE),
                 levels = income_options, ordered = TRUE)
mean_ls <- c(30, 60, 70, 75)
ls <- mean_ls[income] + rnorm(100, sd = 7)
dat <- data.frame(income, ls)
fit1 <- brm(ls ~ 1 + mo(income), data = dat)
dat$pred <- fitted(fit1)

Here is the posterior_summary() output:

                      Estimate  Est.Error          Q2.5        Q97.5
b_Intercept         30.3849000 1.33510065   27.77938194   33.0276504
bsp_moincome        14.1119454 0.60042426   12.95246803   15.2755253
sigma                6.0846812 0.45255326    5.25143077    7.0541269
simo_moincome1[1]    0.6873463 0.03251676    0.62288940    0.7524714
simo_moincome1[2]    0.2595453 0.03722957    0.18389726    0.3304227
simo_moincome1[3]    0.0531084 0.03246610    0.00360139    0.1249914
lprior              -8.0978209 0.10938685   -8.31065097   -7.8836692
lp__              -333.2370354 1.75089262 -337.45131061 -330.9131288

I use the fitted() function to get the exact estimates, but that’s not the point. When I execute the last line of code, it adds 4 columns to my data frame: “Estimate”, “Est.Error”, “Q2.5” and “Q97.5”. For the category “below_20”, Est.Error is 1.33510065, which is the error of the intercept. For the other categories the estimated error is something else, for example 1.174854 for “greater_100”, and I don’t know how it’s computed. Is there a mathematical formula to calculate the estimated error for monotonic effects?

The estimated error is the standard deviation of the posterior draws (i.e. of the sample from the posterior distribution of the monotonic effects).

Here is how to estimate the four summaries (Estimate, Est.Error, Q2.5 and Q97.5) for each monotonic effect in turn.

# Extract the posterior draws of the linear predictor for the level "40_to_100".
posterior_draws <- posterior_linpred(
  fit1,
  newdata = data.frame(
    income = factor("40_to_100", levels = income_options, ordered = TRUE)
  )
)
# Calculate the summaries.
c(
  Estimate = mean(posterior_draws),
  Est.Error = sd(posterior_draws),
  Q2.5 = quantile(posterior_draws, 0.025),
  Q97.5 = quantile(posterior_draws, 0.975)
)
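
To reproduce all four rows that fitted(fit1) adds (one per income level), the same idea works in one go. Here is a minimal sketch, reusing the income_options vector from your example:

# Sketch: draws of the linear predictor for all four income levels
# (rows = posterior draws, columns = income levels).
newdata <- data.frame(
  income = factor(income_options, levels = income_options, ordered = TRUE)
)
draws <- posterior_linpred(fit1, newdata = newdata)
# One row of summaries per income level.
summ <- t(apply(draws, 2, function(x) {
  c(Estimate = mean(x), Est.Error = sd(x), quantile(x, c(0.025, 0.975)))
}))
rownames(summ) <- income_options
summ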

It might also be interesting to know how to compute the monotonic effects from the model parameters that appear in summary(fit1). This is explained by Eq. (3) in this paper: https://osf.io/preprints/psyarxiv/9qkhj.

library("dplyr")

D <- 3 # number of income levels - 1

as_draws_df(fit1) %>%
  # Rename the parameters as in Eq. (3)
  rename(
    b = bsp_moincome,
    zeta1 = `simo_moincome1[1]`,
    zeta2 = `simo_moincome1[2]`,
    zeta3 = `simo_moincome1[3]`
  ) %>%
  mutate(
    mo_effect0 = b_Intercept,
    mo_effect1 = b_Intercept + b * D * (zeta1),
    mo_effect2 = b_Intercept + b * D * (zeta1 + zeta2),
    mo_effect3 = b_Intercept + b * D * (zeta1 + zeta2 + zeta3)
  )
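
If you also want the four summaries (Estimate, Est.Error, Q2.5 and Q97.5) for these monotonic effects, you can summarise the mo_effect columns across the draws. A minimal sketch, assuming the result of the pipe above has been stored in an object, say mo_draws (a name used here only for illustration):

# mo_draws is assumed to hold the data frame returned by the pipe above,
# i.e. the posterior draws plus the mo_effect0, ..., mo_effect3 columns.
mo_draws %>%
  summarise(
    across(
      starts_with("mo_effect"),
      list(
        Estimate = mean,
        Est.Error = sd,
        Q2.5 = \(x) quantile(x, 0.025),
        Q97.5 = \(x) quantile(x, 0.975)
      )
    )
  )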

Thank you for your reply! It helps me understand how to calculate the standard deviation in this case. Do you have any idea how to calculate the standard deviation in a case with more than one predictor?

income_options <- c("below_20", "20_to_40", "40_to_100", "greater_100")
income <- factor(sample(income_options, 100, TRUE),
                 levels = income_options, ordered = TRUE)
mean_ls <- c(30, 60, 70, 75)
ls <- mean_ls[income] + rnorm(100, sd = 7)
dat <- data.frame(income, ls)
dat$age <- rnorm(100, mean = 40, sd = 10)

fit1 <- brm(ls ~ 0 + mo(income) + age, data = dat)
summary(fit1)
dat$pred <- fitted(fit1)

In this case, the total estimated error is some combination of the age standard deviation and the income standard deviation. I am interested in how to calculate the standard deviation of the income coefficient using your second code block, where you compute the monotonic effect from the simplex probabilities, the coefficient b and the number of categories minus one. Is there a way to calculate the standard deviation of that monotonic effect?

Do you mean the std. errors for mo_effect0, …, mo_effect3 (the monotonic effects) in the code above?

Yes, exactly!

Once you add age in the model, the predicted outcome for the same income level changes with age; the standard error changes too, as you’ve observed. You could report the estimates at age = 0. This doesn’t seem meaningful though; it may be better to report a table of the predicted ls across the 4 income levels at, say, ages 20, 40 and 60. A plot would do nicely too.

And even though it’s helpful to know the formulas, with a complex model it’s better to avoid doing calculations by hand. You can use the brms::conditional_effects() function:

# The output is a plot.
conditional_effects(
  fit1,
  effects = "income",
  conditions = data.frame(age = c(20, 40, 60))
)

Another option is the marginaleffects package:

library("marginaleffects")

# The output is a plot in a different style.
plot_predictions(fit1,
  condition = list("income", age = c(20, 40, 60))
)

# By default the summary statistics are Estimate, Q2.5 and Q97.5.
preds <- predictions(
  fit1,
  newdata = datagrid(
    age = c(20, 40, 60),
    income = income_options
  )
)
preds

# But it's straightforward to compute the std. errors too from the posterior draws.
library("dplyr")

preds %>%
  posterior_draws() %>%
  group_by(age, income) %>%
  summarise(
    across(
      draw,
      list(
        Estimate = \(x) mean(x),
        Est.Error = \(x) sd(x),
        Q2.5 = \(x) quantile(x, 0.025),
        Q97.5 = \(x) quantile(x, 0.975)
      )
    )
  )

So whatever the quantity of interest, the approach is: generate posterior draws from the fitted model and calculate the mean, std. error, etc.

Thank you for the answer, but I feel like there is some misunderstanding here. What you’re describing is the standard deviation of a prediction, not of the coefficient. In a simple linear model, when I get the summary of a fit, every coefficient comes with an Estimate (mean), Est.Error (standard deviation) and a 95% interval. Using the code you provided in your first answer I can calculate the coefficient for every level of income when it’s monotonic. I am not really interested in the standard deviation and 95% CI of the predicted result, but rather in the standard deviation of the coefficient. Does that make sense?

Yes, it does make sense. I apologize if my explanation wasn’t clear; I’ll give it another try with less code and more words.

Although you are not interested in the predictions themselves, I think it helps to start there. Here are the predictions I got for the 4 income categories at ages 20, 40, 60.

#>   age income      draw_Estimate draw_Est.Error draw_Q2.5 draw_Q97.5
#>    20 below_20             13.4          0.893      11.7       15.2
#>    20 20_to_40             47.0          1.99       43.1       50.8
#>    20 40_to_100            55.4          2.23       51.0       59.8
#>    20 greater_100          60.9          2.24       56.5       65.4
#>    40 below_20             26.9          1.79       23.4       30.4
#>    40 20_to_40             60.4          1.81       56.9       63.9
#>    40 40_to_100            68.9          2.00       64.9       72.6
#>    40 greater_100          74.3          2.10       70.3       78.4
#>    60 below_20             40.3          2.68       35.1       45.7
#>    60 20_to_40             73.9          2.05       70.0       77.9
#>    60 40_to_100            82.3          2.15       78.1       86.4
#>    60 greater_100          87.8          2.32       83.3       92.3

I can also estimate the difference between any two income levels at ages 20, 40, 60. (For brevity I compute the three comparisons between a lower income level and the next higher income level.)

#>   age comparison                value_Estimate value_Est.Error value_Q2.5 value_Q97.5
#>    20 20_to_40 - below_20              33.5             2.50     28.5          38.4
#>    20 40_to_100 - 20_to_40              8.46            2.65      3.49         13.7
#>    20 greater_100 - 40_to_100           5.43            2.69      0.772        11.1
#>    40 20_to_40 - below_20              33.5             2.50     28.5          38.4
#>    40 40_to_100 - 20_to_40              8.46            2.65      3.49         13.7
#>    40 greater_100 - 40_to_100           5.43            2.69      0.772        11.1
#>    60 20_to_40 - below_20              33.5             2.50     28.5          38.4
#>    60 40_to_100 - 20_to_40              8.46            2.65      3.49         13.7
#>    60 greater_100 - 40_to_100           5.43            2.69      0.772        11.1
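
One way to produce a table like this is with marginaleffects::comparisons() and sequential contrasts; here is a minimal sketch (not necessarily the exact call behind the numbers above):

library("marginaleffects")
library("dplyr")

# Contrasts between adjacent income levels, evaluated at ages 20, 40 and 60.
cmp <- comparisons(
  fit1,
  variables = list(income = "sequential"),
  newdata = datagrid(age = c(20, 40, 60))
)

# Summarise the posterior draws of each contrast.
cmp %>%
  posterior_draws() %>%
  group_by(age, contrast) %>%
  summarise(
    Estimate = mean(draw),
    Est.Error = sd(draw),
    Q2.5 = quantile(draw, 0.025),
    Q97.5 = quantile(draw, 0.975)
  )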

Hopefully this makes it clear what’s constant in the model and what changes with age. If you want to report the (relative) difference between two persons/households in different income brackets, then this difference is constant across age and you can report the estimate and its standard error. If you want to report the outcome for a person/household in a specific income bracket, then this depends on age and you can’t ignore the age variable.

So it depends on what you mean by “coefficients”. The relative effect of income doesn’t change with age because there is no interaction between income and age in the model. However, in absolute terms (rather than as a comparison between two income levels), age and income together determine the expected outcome.
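
And if you prefer to compute those relative income effects (and their std. errors) by hand, the same Eq. (3) formula applies. A minimal sketch for the second model, ls ~ 0 + mo(income) + age, which has no b_Intercept term:

library("brms")
library("dplyr")

D <- 3 # number of income levels - 1

# Relative effect of each income level vs. "below_20"; these do not depend
# on age because there is no income-age interaction in the model.
as_draws_df(fit1) %>%
  transmute(
    effect_20_to_40 = bsp_moincome * D * `simo_moincome1[1]`,
    effect_40_to_100 = bsp_moincome * D *
      (`simo_moincome1[1]` + `simo_moincome1[2]`),
    effect_greater_100 = bsp_moincome * D *
      (`simo_moincome1[1]` + `simo_moincome1[2]` + `simo_moincome1[3]`)
  ) %>%
  summarise(
    across(
      everything(),
      list(
        Estimate = mean,
        Est.Error = sd,
        Q2.5 = \(x) quantile(x, 0.025),
        Q97.5 = \(x) quantile(x, 0.975)
      )
    )
  )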