Choice of regularizing prior for the cumulative probit

I have a situation where I have a 7-item Likert scale, where each respondent answers 6 questions on the same subject (TAM).
Essentially the same setup as in this question: Partial pooling on ordinal response (cumulative probit)

Sample data:

df <- data.frame(
  id=rep(c("A", "B", "C", "D", "E", "F", "G"), each=6),
  gender=c(rep("male", 6*4), rep("female", 6*3)),
  name=rep(c("Q1", "Q2", "Q3", "Q4", "Q5", "Q6"), times=7),
  response=c(rep(6,6), # resp.1, male
             c(rep(7,4), rep(6,2)), # resp.2, male
             rep(7,6), # resp.3, male
             c(7, rep(6,4), 5), # resp.4, male
             rep(7, 6), #resp.5, female
             rep(6, 6), #resp.6, female
             c(7, 6, 6, rep(5,3)) # resp.7, female
             )) |>
  mutate(ordered=factor(response, levels=1:7, ordered=TRUE),
         revorder=factor(8-response, levels=1:7, ordered=TRUE))
df |> group_by(id, gender, ordered) |> tally()

I noted that brms sampled more efficiently (meaning, no divergent transitions) if I reversed the scales (note that my data set only contains 5, 6, and 7). This is why is use the reverse order ( revorder ) in my formulas below.

I make heavy use of @Solomon s utiltiy function to plot the posterior mean (taken from his excellent blog Notes on the Bayesian cumulative probit | A. Solomon Kurz ):

plot_M2_posterior_mean <- function(m, title, sample_mean) {
  m_male <- m$data |> filter(gender == 'male') |> sample_mean()  |> pull()
  m_female  <- m$data |> filter(gender == 'female') |> sample_mean() |> pull()
  # note, the magrittr pipe %>% is needed, because native |> pipe does not handle dot
  as_draws_df(m) %>%
    select(.draw, starts_with("b_")) %>%
    set_names(".draw", str_c("tau[", 1:6, "]"), "beta[1]") %>%
    # insert another copy of the data below
    bind_rows(., .) %>%
    # add the two values for the dummy variable gendermale
    mutate(gendermale = rep(0:1, each = n() / 2)) %>%
    # compute the p_k values conditional on the gender dummy
    mutate(p1 = pnorm(`tau[1]`, mean = 0 + gendermale * `beta[1]`),
           p2 = pnorm(`tau[2]`, mean = 0 + gendermale * `beta[1]`) - pnorm(`tau[1]`, mean = 0 + gendermale * `beta[1]`),
           p3 = pnorm(`tau[3]`, mean = 0 + gendermale * `beta[1]`) - pnorm(`tau[2]`, mean = 0 + gendermale * `beta[1]`),
           p4 = pnorm(`tau[4]`, mean = 0 + gendermale * `beta[1]`) - pnorm(`tau[3]`, mean = 0 + gendermale * `beta[1]`),
           p5 = pnorm(`tau[5]`, mean = 0 + gendermale * `beta[1]`) - pnorm(`tau[4]`, mean = 0 + gendermale * `beta[1]`),
           p6 = pnorm(`tau[6]`, mean = 0 + gendermale * `beta[1]`) - pnorm(`tau[5]`, mean = 0 + gendermale * `beta[1]`),
           p7 = 1 - pnorm(`tau[6]`, mean = 0 + gendermale * `beta[1]`)) %>%
    # wrangle
    pivot_longer(starts_with("p"), values_to = "p") %>%
    mutate(response = str_extract(name, "\\d") %>% as.double()) %>%
    # compute p_k * k
    mutate(`p * response` = p * response) %>% 
    # sum those values within each posterior draw, by the gendermale dummy
    group_by(.draw, gendermale) %>% 
    summarise(mean_response = sum(`p * response`)) %>% 
    mutate(gender = ifelse(gendermale == 0, "female", "male")) %>% 
    
    # the trick with and without fct_rev() helps order the axes, colors, and legend labels
    ggplot(aes(x = mean_response, y = fct_rev(gender), fill = gender)) +
    stat_halfeye(.width = .95) +
    geom_vline(xintercept = m_male, linetype = 2, color = "blue3") +
    geom_vline(xintercept = m_female, linetype = 2, color = "red3") +
    scale_fill_manual(NULL, values = c(alpha("red3", 0.5), alpha("blue3", 0.5))) +
    labs(title = paste0("The posterior for the mean of the rating values for ", title, ", by gender"),
         subtitle = paste("The dashed vertical lines mark off the sample means, by gender\nFormula:", m$formula),
         x = expression(mu[response]),
         y = NULL) +
    theme(axis.text.y = element_text(hjust = 0)) 
}

When I use the formula and priors:

formula <- revorder | thres(6) ~ 1 + gender + (1 | id)
priors1 <- c(prior(normal(-1.068, 1), class = Intercept, coef = 1),
            prior(normal(-0.566, 1), class = Intercept, coef = 2),
            prior(normal(-0.180, 1), class = Intercept, coef = 3),
            prior(normal( 0.180, 1), class = Intercept, coef = 4),
            prior(normal( 0.566, 1), class = Intercept, coef = 5),
            prior(normal( 1.068, 1), class = Intercept, coef = 6),
            prior(normal(0, 1), class = b),
            prior(exponential(1), class = sd))
M3 <- brm(
  data = df |> select(revorder, gender, id),
  family = cumulative(probit),
  formula=formula,
  prior = priors1,
  drop_unused_levels = F,
  backend="cmdstanr",
  cores = 4,
  seed = 1
)

I get the following summary:

 Family: cumulative 
  Links: mu = probit; disc = identity 
Formula: revorder | thres(6) ~ 1 + gender + (1 | id) 
   Data: select(df, revorder, gender, id) (Number of observations: 42) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~id (Number of levels: 7) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.88      1.00     1.34     5.19 1.00     1189     1698

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -2.36      0.77    -3.92    -0.88 1.00     1819     2037
Intercept[2]    -0.31      0.72    -1.74     1.04 1.00     1979     2037
Intercept[3]     0.60      0.69    -0.78     1.94 1.00     2324     2697
Intercept[4]     0.94      0.69    -0.42     2.25 1.00     2533     2872
Intercept[5]     1.37      0.72    -0.04     2.82 1.00     2810     3447
Intercept[6]     2.07      0.85     0.49     3.81 1.00     3237     3237
gendermale      -0.07      0.88    -1.81     1.65 1.00     1842     2146

and the following posterior mean (based on the function above):

Note that the regularized mean is quite close to 3, and far away from the sample mean.

When I choose a different prior for the sd, such as the narrower exp(5), I get closer to the sample mean:

priors5 <- c(prior(normal(-1.068, 1), class = Intercept, coef = 1),
            prior(normal(-0.566, 1), class = Intercept, coef = 2),
            prior(normal(-0.180, 1), class = Intercept, coef = 3),
            prior(normal( 0.180, 1), class = Intercept, coef = 4),
            prior(normal( 0.566, 1), class = Intercept, coef = 5),
            prior(normal( 1.068, 1), class = Intercept, coef = 6),
            prior(normal(0, 1), class = b),
            prior(exponential(5), class = sd))
M3exp5 <- brm(
  data = df |> select(revorder, gender, id),
  family = cumulative(probit),
  formula=formula,
  prior = priors5,
  drop_unused_levels = F,
  backend="cmdstanr",
  cores = 4,
  seed = 1
)

Here, we are closer to the sample mean (and we could get even closer, by choosing an even narrower prior, e.g., exp(10)

The real a-ha moment came when I simulated 40 new respondents, that are extremely positive (all ranking 1 on the Likert)

This is of course easy to simulate by adding fake data:

fakedata <- rbind(df |> select(revorder, gender, id), 
                  data.frame(revorder=1, # rave reviews 
                             gender=rep(c("female", "male"), each=120),
                             id=rep(c("100", "101", "102", "103", "104", "105", "106", "107", "108", "109",
                                      "1100", "1101", "1102", "1103", "1104", "1105", "1106", "1107", "1108", "1109",
                                      "200", "201", "202", "203", "204", "205", "206", "207", "208", "209",
                                      "2100", "2101", "2102", "2103", "2104", "2105", "2106", "2107", "2108", "2109"), each=6) # 40 new respondents
                             )
                  )
fakeM3exp1 <- brm(
  data = fakedata,
  family = cumulative(probit),
  formula=formula,
  prior = priors1,
  drop_unused_levels = F,
  backend="cmdstanr",
  cores = 4,
  seed = 1
)

Plotting the posterior mean shows that the expected value is still around 3:

Using the exp(5) (or, even more so, exp(10) priors make the data align considerably better to the sample mean:

The reason for this behaviour is presumably because of the “long tail” of the exp(1) distribution, and the fact that the probit has a relatively narrow span.

But how do I know which prior to choose? I find choosing priors that influence standard deviations much harder to intuit than others (being the newbie I still am, unfortunately…)
In this case, I know, by design, that no matter how many respondents ( id in the data) I get, each will only generate 6 answers. I suspect this is the reason the exp(1) prior fails (it is too strong, dragging the posterior mean too much towards the center of the scale)

Although I only tried this with the cumulative probit, I suspect the similar issue will arise with the logit.

EDIT: Found another indication that the exp(1) prior is indeed too strong for this use case (remember, every respondent, which we pool on, will only ever give 6 responses)

I decided to try out the varying variance example from @Solomon also.
Though only as a fixed effect, so my model became this:

formula <- bf(revorder | thres(6) ~ 1 + gender + (1 | id)) +
           lf(disc                ~ 0 + gender, cmc = FALSE)
priors1 <- c(prior(normal(-1.068, 1), class = Intercept, coef = 1),
            prior(normal(-0.566, 1), class = Intercept, coef = 2),
            prior(normal(-0.180, 1), class = Intercept, coef = 3),
            prior(normal( 0.180, 1), class = Intercept, coef = 4),
            prior(normal( 0.566, 1), class = Intercept, coef = 5),
            prior(normal( 1.068, 1), class = Intercept, coef = 6),
            prior(normal(0, 1), class = b),
            prior(normal(0, log(2)/2), class = b, dpar=disc),
            prior(exponential(1), class = sd)
            )
M4_ppc <- brm(
  data = df,
  family = cumulative(probit),
  formula=formula,
  prior = priors1,
  drop_unused_levels = F,
  sample_prior = "only",
  backend="cmdstanr",
  cores = 4,
  seed = 1,
  control = list(adapt_delta=0.99)
)

PPC looks very good:

Based on my understanding on pnorm and disc, I modified the plot function to incorporate the varying sd:

plot_M4_posterior_mean <- function(m, title, sample_mean) {
  m_male <- m$data |> filter(gender == 'male') |> sample_mean()  |> pull()
  m_female  <- m$data |> filter(gender == 'female') |> sample_mean() |> pull()
  # note, the magrittr pipe %>% is needed, because native |> pipe does not handle dot
  as_draws_df(m) %>%
    select(.draw, starts_with("b_")) %>%
    set_names(".draw", str_c("tau[", 1:6, "]"), "beta[1]", "b_disc[1]") %>%
    # insert another copy of the data below
    bind_rows(., .) %>%
    # add the two values for the dummy variable gendermale
    mutate(gendermale = rep(0:1, each = n() / 2)) %>%
    # compute the p_k values conditional on the gender dummy
    mutate(p1 = pnorm(`tau[1]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)),
           p2 = pnorm(`tau[2]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)) - pnorm(`tau[1]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)),
           p3 = pnorm(`tau[3]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)) - pnorm(`tau[2]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)),
           p4 = pnorm(`tau[4]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)) - pnorm(`tau[3]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)),
           p5 = pnorm(`tau[5]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)) - pnorm(`tau[4]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)),
           p6 = pnorm(`tau[6]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)) - pnorm(`tau[5]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`)),
           p7 = 1 - pnorm(`tau[6]`, mean = 0 + gendermale * `beta[1]`, sd = 1/exp(0 + gendermale * `b_disc[1]`))) %>%
    # wrangle
    pivot_longer(starts_with("p"), values_to = "p") %>%
    mutate(response = str_extract(name, "\\d") %>% as.double()) %>%
    # compute p_k * k
    mutate(`p * response` = p * response) %>% 
    # sum those values within each posterior draw, by the gendermale dummy
    group_by(.draw, gendermale) %>% 
    summarise(mean_response = sum(`p * response`)) %>% 
    mutate(gender = ifelse(gendermale == 0, "female", "male")) %>% 
    
    # the trick with and without fct_rev() helps order the axes, colors, and legend labels
    ggplot(aes(x = mean_response, y = fct_rev(gender), fill = gender)) +
    stat_halfeye(.width = .95) +
    geom_vline(xintercept = m_female, linetype = 2, color = "red3") +
    geom_vline(xintercept = m_male, linetype = 2, color = "blue3") +
    scale_fill_manual(NULL, values = c(alpha("red3", 0.5), alpha("blue3", 0.5))) +
    labs(title = paste0("The posterior for the mean of the rating values for ", title, ", by gender"),
         subtitle = paste("The dashed vertical lines mark off the sample means, by gender\nFormula:", m$formula),
         x = expression(mu[response]),
         y = NULL) +
    theme_bw() 
}

Running and plotting this model (with the small data set from the top of this post), works OK, on the surface:

M4 <- brm(
  data = df,
  family = cumulative(probit),
  formula=formula,
  prior = priors1,
  drop_unused_levels = F,
  backend="cmdstanr",
  cores = 4,
  seed = 1,
  control = list(adapt_delta=0.99)
)

But the killer really comes when I wanted to check what happens when I get 40 rave females (all giving 1 in rating), so let’s create some fake data and see how the model reacts:

fakedata <- rbind(df |> select(revorder, gender, id), 
                  data.frame(revorder=1, # rave reviews 
                             gender=rep(c("female", "female"), each=120),
                             id=rep(c("100", "101", "102", "103", "104", "105", "106", "107", "108", "109",
                                      "1100", "1101", "1102", "1103", "1104", "1105", "1106", "1107", "1108", "1109",
                                      "200", "201", "202", "203", "204", "205", "206", "207", "208", "209",
                                      "2100", "2101", "2102", "2103", "2104", "2105", "2106", "2107", "2108", "2109"), each=6) # 40 new respondents
                             )
                  )

Because I only affect one of the categories (and very much so), we would expect that category to “narrow down”, and shrink towards the category mean.
But instead, the model just goes wild:

If I instead choose the narrower exp(5) prior, I get the desired behaviour (of sorts):

priors5 <- c(prior(normal(-1.068, 1), class = Intercept, coef = 1),
            prior(normal(-0.566, 1), class = Intercept, coef = 2),
            prior(normal(-0.180, 1), class = Intercept, coef = 3),
            prior(normal( 0.180, 1), class = Intercept, coef = 4),
            prior(normal( 0.566, 1), class = Intercept, coef = 5),
            prior(normal( 1.068, 1), class = Intercept, coef = 6),
            prior(normal(0, 1), class = b),
            prior(normal(0, log(2)/2), class = b, dpar=disc),
            prior(exponential(5), class = sd)
            )

This, to me, is evidence that unlike “just beta priors”, when you deal with choosing sd priors (and I strongly suspect also cor priors), you’d be better off by actually testing your priors with a couple of “fake data sets”, to see what effects your priors have on your model when faced with synthetic (and therefore predictable) data.

Also @Solomon , I would highly appreciate if you could review my adaptation of your plot function for the posterior mean ( plot_M4_posterior_mean ). I am not totally at home in R yet, and I have never worked with disc before.

Please also provide the following information in addition to your question:

  • Operating System: 22.04.1-Ubuntu
  • brms Version: Package brms version 2.22.0

Aren’t your intercept priors allowing the intercepts to be out of order?

I’m still quite new to Stan and Bayesian modeling in general, but I was following the pattern laid out in a blog post by @Solomon : Notes on the Bayesian cumulative probit | A. Solomon Kurz

Unfortunately, brms does not yet support a more proper Dirichlet prior on the cutpoints:

And I have not yet gotten this workaround to work, so I’m resorting to the separated normal distributions:

Most of my data sets sample correctly, even in the face of outliers (where I had to resort to reloo to figure out that one problematic estimated Pareto-k-value was in fact not an issue). But this is conditional on me using exp(5) for the sd.
I have given up on exp(1).

I would suggest that you expand your data simulation so that it matches your model rather than being hard-coded. That should help you identify where the model estimates are deviating from your expectations beyond just the mean.

A few things that stick out to me. Your priors on the thresholds are centered such that a latent value of 0 would correspond to an observed value of 4. Those priors are probably driving the behavior you’re seeing, whereas the choice of prior on level 2 intercepts is just trying to compromise for that rigidity.

Also, you’re estimating a hierarchical model with only 7 level 2 units. 4 of the 7 have no variance in observations. This probably explains the strong influence of the priors.

1 Like

You might try the rmsb package’s blrm function which is dedicated to ordinal logistic models and uses the Dirichlet prior for the multinomial probabilities induced by the intercepts. This achieves ordering of intercepts automatically and speeds sampling.