Brms iformative prior for bionomial regression

brophyj · August 6, 2023, 10:18pm

I’m doing a simple beta-binomial model ands it works well with the default priors but I can’t figure out the syntax for informative priors.
I have a simple 2 arm randomized trial with 14 successes in 70 patients compared to 10 in 62 in the control arm.

The following code works well

pacman::p_load(brms)
data_bin2 <- data.frame(N = c(62,70), y = c(10,14), grp2 = as.factor(c(0,1)))
f = bf(y | trials(N) ~ 0 + grp2)

get_prior(formula = f,
          data = data_bin2,
          family = binomial(link = "identity"))
m <- brm(
  formula = f,
  data = data_bin2,
  family = binomial(link = "logit"),
  chains = 4, warmup = 1000, iter = 2000, seed = 123,
  refresh = 0
)

But if I try to include an informative prior of 25 in 122 for control group 0 and 44 in 94 for treated group

m2 <- brm(
  formula = f,
  data = data_bin2,
  family = binomial(link = "logit"),
  prior = c(prior(beta(25,122), class = b, coef = "grp20"),
            prior(beta(44,94),class = b, coef = "grp21")),
  chains = 4, warmup = 1000, iter = 2000, seed = 123,
  refresh = 0
)

I get error messages saying I need the lb, ub arguments but when I include them I get an error saying this doesn’t work with the coef argument.
Suggestions?

Operating System: Mac Ventura 13.4.1
brms Version: 2.19.0

Solomon · August 7, 2023, 2:05pm

Do you want to use the logit link, or would you prefer to use the identity link?

brophyj · August 7, 2023, 3:17pm

Ideally both. While I prefer the risk differences from the identity link, the convention in cardiology is to report multiplicative effects (logistic).
En passant, I have really enjoyed your publications adapting books to the R (tidyverse) universe.

Solomon · August 7, 2023, 3:50pm

Thanks for your kind words.

So, at the moment your m2 model has the parameters on the log-odds scale, but it looks like you’re trying to specify the priors on the identity scale. You’ll need to choose one or the other. If you want to keep the model on the log-odds scale, I recommend switching out your beta priors to priors using unbounded continuous distributions, such as with the normal distribution. If you instead strongly prefer to express your priors with the beta distribution, I recommend switching to the identity link for the model.

brophyj · August 7, 2023, 4:41pm

That is helpful realizing I’m mixing model ad prior forms. But the devil is in the details regarding how to correct this error.
For the identity link, I tired

m3 <- brm(
  formula = f,
  data = data_bin2,
  family = binomial(link = "identity"),
  prior = c(prior(beta(25,122), class = b, coef = "grp20"),
            prior(beta(44,94),class = b, coef = "grp21")),
  chains = 4, warmup = 1000, iter = 2000, seed = 123,
  refresh = 0
)

but this generated the following error

Error: The following priors do not correspond to any model parameter: 
b_grp20 ~ beta(25, 122)
b_grp21 ~ beta(44, 94)
Function 'get_prior' might be helpful to you.

get_prior() doesn’t help since the model hasn’t been created.

For the logistic link I’m really quite lost at how to proceed. In a vary convoluted manner, I have used the prior data to give a prior for an intercept (the above models didn’t use an intercept) and then used the default prior for the treatment.

combined_logodds <- log((25/122)/(44/94))
combined_logodds_sd <- sqrt( (1/25) + (1/122) + (1/44) + (1/94)) 

m4 <- stan_glm(y/N ~ grp2, family = binomial(), data = data_bin2, # iter=26000, chains=4, warmup=1000,
                             prior_intercept = normal(combined_logodds, combined_logodds_sd),
                             weights = N, seed = 123, refresh = 0)

This actually runs but I’m really unsure what exactly I’m getting and feel there must be a more direct approach.
Thanks again for your help.

Solomon · August 7, 2023, 4:54pm

For your m3, execute this:

get_prior(
  formula = f,
  data = data_bin2,
  family = binomial(link = "identity")
)

brophyj · August 7, 2023, 5:29pm

get_prior(
  formula = f,
  data = data_bin2,
  family = binomial(link = "identity")
)
 prior class coef group resp dpar nlpar lb ub       source
 (flat)     b                                       default
 (flat)     b grp2                             (vectorized)

which led me to try

m3 <- brm(
  formula = f,
  data = data_bin2,
  family = binomial(link = "identity"),
  prior = c(prior(beta(25,122), class = b),
            prior(beta(44,94),class = b, coef = "grp2")),
  chains = 4, warmup = 1000, iter = 2000, seed = 123,
  refresh = 0
)

which generated a bunch of errors, my basic problem remains how to put an indformative prior on each of the two possible treatments.

also any thoughts about the logistic model

sorry to take up your time but I really do appreciate your help.

Solomon · August 7, 2023, 5:41pm

Following some of your code from earlier, this works perfectly on my computer:

data_bin2 <- data.frame(N = c(62,70), y = c(10,14), grp2 = as.factor(c(0,1)))

f = bf(y | trials(N) ~ 0 + grp2)

m3 <- brm(
  formula = f,
  data = data_bin2,
  family = binomial(link = "identity"),
  prior = c(prior(beta(25,122), class = b, coef = "grp20"),
            prior(beta(44,94),class = b, coef = "grp21")),
  chains = 4, warmup = 1000, iter = 2000, seed = 123,
  refresh = 0
)

print(m3)

 Family: binomial 
  Links: mu = identity 
Formula: y | trials(N) ~ 0 + grp2 
   Data: data_bin2 (Number of observations: 2) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
grp20     0.17      0.03     0.12     0.22 1.00     3805     2869
grp21     0.28      0.03     0.22     0.34 1.00     3863     2734

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

brophyj · August 7, 2023, 5:51pm

That is so weird. I ran that same code earlier and got

m3 <- brm(
+   formula = f,
+   data = data_bin2,
+   family = binomial(link = "identity"),
+   prior = c(prior(beta(25,122), class = b, coef = "grp20"),
+             prior(beta(44,94),class = b, coef = "grp21")),
+   chains = 4, warmup = 1000, iter = 2000, seed = 123,
+   refresh = 0
+ )
Error: The following priors do not correspond to any model parameter: 
b_grp20 ~ beta(25, 122)
b_grp21 ~ beta(44, 94)
Function 'get_prior' might be helpful to you.

When I re-run it now it works and I get the same results you do. It does come with a long list of warnings but I guess i can safely(?) ignore them.

Thoughts on the syntax for the priors for the logistic model?

Thanks

Solomon · August 7, 2023, 6:08pm

My bet is the warnings you got had to do with the lack of lower and upper boundaries for the priors. If you got no other important warnings and the print() output returned good-looking results by way of the chain diagnostics, I wouldn’t worry much. However, here’s a way to set lower and upper boundaries for your priors:

m3 <- brm(
  formula = f,
  data = data_bin2,
  family = binomial(link = "identity"),
  prior = c(prior(beta(1, 1), class = b, lb = 0, ub = 1),
            prior(beta(25,122), class = b, coef = "grp20"),
            prior(beta(44,94),class = b, coef = "grp21")),
  chains = 4, warmup = 1000, iter = 2000, seed = 123,
  refresh = 0
)

The difficulty is you cannot use the lb and ub arguments within the prior() function when you’re using the coef argument. So the work-around is to set a global prior of class = b that excludes the coef argument, but includes the lb and ub arguments. You’ll probably get this warning:

Warning: The global prior 'beta(1, 1)' of class 'b' will not be used in the model as all related coefficients have individual priors already. If you did not set those priors yourself, then maybe brms has assigned default priors.

If so, it’s okay. The beta(1, 1) part won’t actually get applied to any of your model parameters, BUT the lower and upper boundaries will get extended to the other two priors. You can confirm that by executing this:

m3$prior

which should return this:

         prior class  coef group resp dpar nlpar lb ub source
    beta(1, 1)     b                              0  1   user
 beta(25, 122)     b grp20                        0  1   user
  beta(44, 94)     b grp21                        0  1   user

This confirms the boundaries were extended properly to your other priors.

brophyj · August 7, 2023, 6:47pm

Cool, that works without all the warnings!
Thanks

So I should probably call it a day but I would also like to understand how to use the informative priors in the logit model in bms
Using rstanarm the following works

combined_logodds <- log((25/122)/(44/94))
combined_logodds_sd <- sqrt( (1/25) + (1/122) + (1/44) + (1/94)) 

m4 <- stan_glm(y/N ~ grp2, family = binomial(), data = data_bin2, # iter=26000, chains=4, warmup=1000,
                             prior_intercept = normal(combined_logodds, combined_logodds_sd),
                             weights = N, seed = 123, refresh = 0)

but wonder if this is correct and really want to do this with brms.

Suggestions?

brophyj · August 8, 2023, 1:34pm

While I have got this working using an identity link (+/- informative priors), and can make it work with a logistic link with default priors

pacman::p_load(brms, tidyverse, tidybayes)
data_bin2 <- data.frame(N = c(62,70), y = c(10,14), grp2 = as.factor(c("cCPR","eCPR"))) 
f = bf(y | trials(N) ~ 0 + grp2)

get_prior(formula = f,
          data = data_bin2,
          family = binomial(link = "logit"))


m_logit <- brm(
  formula = f,
  data = data_bin2,
  family = binomial(link = "logit"),
  chains = 4, warmup = 1000, iter = 2000, seed = 123,
  refresh = 0
)

prior_summary(m_logit)
print(m_logit)

I’m still stuck on how to use the logit link with informative priors. As a recall, my priors are 25 successes in 122 trials for grp20 and 44 in 94 for grp21. I get that this prior data needs to entered on a logit scale but don’t know

how to do this
what is the brm syntax to enter these priors once #1 has been done

Solomon · August 9, 2023, 3:30pm

I’ll answer this question in two parts. In the first part, I’ll answer your basic question directly. In the second part, I’ll suggest an alternative workflow.

Part 1: How to do this

I’m not aware of a simple way to convert beta priors to normal priors on the log-odds scale (though see this link for some ggdist wizardry). But a simple iterative way to get there is to:

a) plot the beta priors of interest and compute their basic descriptive statistics,
b) simulate a large number of draws from some normal distribution,
c) convert those normal draws to the log-odds scale with the plogis() function,
d) plot those transformed draws,
e) take the basic descriptive statistics from those transformed draws,
f) compare the results of your simulation with their target beta distribution, and finally
h) iterate by adjusting the mean and/or sd arguments in rnorm() until the results are what you want.

Here’s what this looks like in practice.

a) plot the beta priors and compute sample stats.

Taking your beta(25, 122) prior as an example, here’s how to plot the prior with helper functions from ggdist.

# load packages
library(tidyverse)
library(brms)
library(ggdist)

# plot
prior(beta(25, 122)) %>% 
  parse_dist() %>% 

  ggplot(aes(xdist = .dist_obj)) +
  stat_halfeye(point_interval = mean_qi, .width = .95) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "Beta(25, 122)")

Looks great, doesn’t it? I’m a huge fan of that ggdist and its handy parse_dist() and stat_halfeye() functions. Here’s how to compute the population mean and variance of that beta prior using its two parameters.

# beta parameters
a <- 25
b <- 122

# mean
a / (a + b)

# variance
a * b / ((a + b)^2 * (a + b + 1))

[1] 0.170068
[1] 0.0009536817

That overall shape and those descriptive statistics are our targets. [You could focus on other sample statistics, like quantiles or whatever. Means and variances are just a fine place to start.]

b) simulate a large number of normal draws.

The reason we’re simulating draws from the normal distribution is because it’s a good default distribution for \beta priors in logistic regression models. You could use some other distribution, like a small-\nu Student-t distribution, if you wanted. But anyway, pick some distribution and start simulating. Since we’ve chosen the normal distribution, we simulate with rnorm(). Note the values I’ve set for the mean and sd arguments. Pick different ones and see what happens

# you want some large number of draws
n_draw <- 1e6

# set your seed to make it reproducible
set.seed(1)

# simulate
my_simulation <- tibble(n = rnorm(n = n_draw, mean = -1.5, sd = 0.25))

# what have we done?
glimpse(my_simulation)

Rows: 1,000,000
Columns: 1
$ n <dbl> -1.656613, -1.454089, -1.708907, -1.101180, -1.417623, -1.705117, -1.378143, -1.315419, -1.356055, -1.…

Now we have 1{,}000{,}000 draws to play with.

c) convert those draws with `plogis()` .

Next we use the plogis() function to put those draws on the log-odds scale. We’ll save the results in a new column called p.

my_simulation <- my_simulation %>%
  mutate(p = plogis(n)) 

# what have we done?
glimpse(my_simulation)

Rows: 1,000,000
Columns: 2
$ n <dbl> -1.656613, -1.454089, -1.708907, -1.101180, -1.417623, -1.705117, -1.378143, -1.315419, -1.356055, -1.…
$ p <dbl> 0.1602171, 0.1893730, 0.1533055, 0.2495189, 0.1950345, 0.1537981, 0.2013074, 0.2115815, 0.2048823, 0.1…

The n column has the values on their original scale, and the p column has the values after transforming them with plogis().

d) plot those transformed draws.

Here’s the plot.

my_simulation %>% 
  ggplot(aes(x = p)) +
  stat_halfeye(point_interval = mean_qi, .width = .95) +
  scale_y_continuous(NULL, breaks = NULL) +
  coord_cartesian(xlim = c(0, 1)) +
  labs(title = expression(logit^{-1}*"[Normal(-1.5, 0.25)]"))

e) more descriptive statistics.

Now compute the sample mean and variance for our transformed simulated draws.

my_simulation %>% 
  summarise(mean = mean(p),     # the target mean is 0.170068
            variance = var(p))  # the target variance is 0.0009536817

# A tibble: 1 × 2
   mean variance
  <dbl>    <dbl>
1 0.185  0.00142

f) compare the results.

If you compare the plot and sample statistics of our simulation, you’ll see they’re pretty close to their beta targets, not perfectly there. If this is close enough for you, move forward with the model and your new priors. But…

h) iterate.

If you’d like to get a little closer to the target beta shape and descriptive statistics, try changing the mean and/or sd arguments within rnorm() way back in step b) and just keep iterating until you’re satisfied. Note, there’s no guarantee you’re going to find the normal distribution that perfectly replicates the original beta distribution after that plogis() transformation. So it goes…

Part 2: Should one do this?

Maybe. I do stuff like this sometimes, even as recently as last week. However, I’m not so sure this is actually the best method for your use case. If your goal is just to learn how to work with brm() models, this is a great practice. But if your goal is to analyze real-world data with a scientifically justified model for peer review, I would not try to fit the model two ways, one with the identity link and the other with the conventional logit link. Rather, I would just fit the model with the priors you believe in, which appears to be the one with the identity link. If your colleagues prefer odds ratios, you can just work with the posterior draws to transform the posterior probabilities into an odds ratio metric.

Let’s say your identity-link model is still called m3. Here’s how to pull the posterior draws with as_draws_df(). We’ll save the results as a data frame named draws.

draws <- as_draws_df(m3)

# what?
head(draws)

# A draws_df: 6 iterations, 1 chains, and 4 variables
  b_grp20 b_grp21 lprior lp__
1    0.16    0.33    4.8 -2.1
2    0.16    0.22    1.3 -2.9
3    0.16    0.35    4.6 -3.1
4    0.15    0.36    4.3 -3.9
5    0.19    0.31    4.5 -2.2
6    0.16    0.34    4.7 -2.6
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

The head() function returned the first 6 rows, but the full draws data frame has 4,000 rows by default. The posterior draws for the probabilities of your two groups are in the b_grp20 and b_grp21 columns. In the next step, we’ll give them simpler names, p0 and p1, drop the unneeded columns, and hand compute the odds ratio.

draws <- draws %>% 
  # rename and drop the unneeded columns
  transmute(p0 = b_grp20,
            p1 = b_grp21) %>% 
  # compute the OR
  mutate(or = (p1 / (1 - p1)) / (p0 / (1 - p0)))

# what have we done?
head(draws)

# A tibble: 6 × 3
     p0    p1    or
  <dbl> <dbl> <dbl>
1 0.163 0.326  2.48
2 0.160 0.218  1.46
3 0.164 0.346  2.68
4 0.151 0.356  3.09
5 0.195 0.314  1.89
6 0.161 0.338  2.66

Now you can summarize or display your odds ratio as desired. For example, here it is in a plot.

draws %>% 
  ggplot(aes(x = or)) +
  stat_halfeye(point_interval = mean_qi, .width = .95) +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("My odds ratio")

No second model needed. All you need to know is how to convert your posterior probabilities to the metric your colleagues prefer.

jsocolar · August 9, 2023, 4:29pm

Just for completeness, you could put the beta prior on the logit and add the appropriate Jacobian adjustment which in this case should just be the logistic(0, 1) probability density function I think. In brms you could do this by passing the logistic(0, 1) part via the prior argument and then incrementing the beta part via stanvars. But I don’t think this is actually a useful nudge for the problem that @brophyj is working through.

Solomon · August 9, 2023, 4:37pm

I like completeness. @jsocolar, would you mind showing exactly how to do this in code? I don’t think I could follow your instructions as is.

jsocolar · August 9, 2023, 5:01pm

Sure! Suppose you want a beta prior on the identity-scale intercept, but your link function is a logit. Then in your call to brm, assuming that you are using 0 + Intercept at the beginning of your model formula, you would add

set_prior('logistic(0, 1)', class = 'b', coef = 'Intercept') to the argument to prior. This is the Jacobian adjustment.

Then you’d include the stanvars argument with

stanvars =  stanvar(
  scode = "target += beta_lpdf(inv_logit(b[1]) | a, b);",
  block = "model"
)

where a and b are the desired parameters of the beta distribution you wish to use as a prior. b[1] is the internal variable name for the intercept given that the formula begins with 0 + Intercept.

I haven’t tested this and I don’t guarantee it works, but something along these lines should be right.

jsocolar · August 9, 2023, 5:11pm

To be clear, this isn’t a normal prior, it is the exact beta prior, but expressed on the log-odds scale such that it yields a beta pushforward on the identity scale.

Solomon · August 9, 2023, 6:21pm

🤯

Solomon · August 9, 2023, 6:45pm

I can confirm this works. Here’s how I pulled it off:

# save your stanvar() code as an external object for clarity
stanvars <- stanvar(
  scode = "target += beta_lpdf(inv_logit(b[1]) | 25, 122);",
  block = "model") +
  stanvar(
    scode = "target += beta_lpdf(inv_logit(b[2]) | 44, 94);",
    block = "model")

# fit the model
m3b <- brm(
  formula = f,
  data = data_bin2,
  # by default, the binomial family will use the logit link
  family = binomial(),
  # note this line
  prior = set_prior('logistic(0, 1)', class = 'b'),
  # this is where we insert our stanvars object
  stanvars = stanvars,
  chains = 4, warmup = 1000, iter = 2000, seed = 123
)

jd_c · August 9, 2023, 7:15pm

Cool!! I did not realize one could do this in brms.

Topic		Replies	Views
Prior specification for beta distribution in brms brms	20	5688	February 13, 2020
Scale of prior for non-linear beta regression brms fitting-issues	12	1613	November 3, 2020
Priors on mixing distributions - probability vs log-odds Modeling brms	1	796	May 24, 2022
Logit transformation of parameters in non-linear binomial model Modeling specification , brms	14	1353	September 5, 2023
Priors for a novice General	27	3905	October 25, 2019