Choosing & specifying priors, and weighting?

Hi everyone,

I’m modelling the effects of protective characteristics on a deadly disease. I’m investigating whether characteristic 1 and characteristic 2 lower the risk of developing the disease altogether, whether they delay the disease’s onset, and whether they affect the number of months survived after diagnosis. In approximately 5-10% of cases, this disease has a genetic cause.

My questions are these: if I expect a characteristic to have an inverse relationship with the outcome and I want the majority of the prior to be negative, is normal(-1,1) or cauchy(-1,1) the right way to go to model my expectation of a negative effect? I ran my models this way, thinking it was correct, but when I produced figures comparing my priors and posteriors, the priors are still centered on 0.
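For reference, both candidate priors already place most of their mass below zero, which can be checked directly in base R:

```r
# Prior mass below zero for each candidate prior
pnorm(0, mean = -1, sd = 1)           # normal(-1, 1): ~0.84 of mass is negative
pcauchy(0, location = -1, scale = 1)  # cauchy(-1, 1): exactly 0.75 of mass is negative
```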

We are using HC (healthy controls) as the reference category, so we’re placing priors on the ALS patients. Here are my thoughts for this model:

  1. CR: based on my previous work on CR’s effect, and what we know from Alzheimer’s disease, I expect
    that higher CR is associated with a lower risk of developing ALS; consequently, I would set a slightly
    negative prior (centred at -1), with a standard deviation (SD) of 1;

  2. PAD: based on our previous work (Hermann et al., 2022), I would expect a higher PAD to be associated with a higher risk of developing ALS, and I will integrate this expectation into the model in the form of a positive prior with a large SD (centred at 1, SD of 2);

  3. Age: I do not expect a major effect of age in this sample, so I center the prior on zero, with an SD of 1;

  4. Sex: We know that men have a higher risk of developing ALS, so I will set a prior reflecting this.

This is the code for the logistic regression of developing the disease, including the code to produce the graphs:

Model1_priors <- brm(data=df,
              family = bernoulli("logit"),
              ALSvsHC~1+T1_PAD+CognitiveReserve+T1_chronAge+Sex,
              prior=c(set_prior("cauchy(-1,1)", class="b", coef="CognitiveReserve"),
                      set_prior("normal(1,2)", class="b", coef="T1_PAD"),
                      set_prior("normal(0,1)", class="b", coef="T1_chronAge"),
                      set_prior("normal(1,0.5)", class="b", coef="Sexmale")),
              sample_prior = "only", #sampling the prior to compare prior to posterior distribution
              iter=10000,
              chains=4,
              save_pars = save_pars(all = TRUE)) #if we wish to calculate the BF, we must save all parameters

Model1_posteriors <- brm(data=df,
              family = bernoulli("logit"),
              ALSvsHC~1+T1_PAD+CognitiveReserve+T1_chronAge+Sex,
              prior=c(set_prior("normal(-1,1)", class="b", coef="CognitiveReserve"),
                      set_prior("normal(1,2)", class="b", coef="T1_PAD"),
                      set_prior("normal(0,1)", class="b", coef="T1_chronAge"),
                      3
                      set_prior("normal(1,0.5)", class="b", coef="Sexmale")),
              iter=10000,
              chains=4,
              save_pars = save_pars(all = TRUE))

cmp <- list(
    "Prior" = avg_comparisons(Model1_priors),
    "Posterior" = avg_comparisons(Model1_posteriors))

draws <- lapply(names(cmp), \(x) transform(posteriordraws(cmp[[x]]), Label = x))
draws <- do.call("rbind", draws)

ggplot(draws, aes(x = draw, color = Label)) +
    xlim(c(-1, 1)) +
    geom_density() +
    facet_wrap(~term + contrast, scales = "free")

Am I thinking about this correctly? Should I make any changes? And since genetics affect the probability of the disease, should I add a variable reflecting the information of “was tested, and positive for gene 1”, “was tested, and positive for gene 2”, “was tested and negative” and “was not tested”? If so, how could I best incorporate this information in my models?
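For concreteness, the variable I have in mind would be a four-level factor, something like this (GeneticStatus and the level names are hypothetical; the data frame here is only illustrative):

```r
# Hypothetical genetic-testing variable, with "not tested" as the reference level
df <- data.frame(GeneticStatus = c("not_tested", "gene1_positive",
                                   "tested_negative", "gene2_positive"))
df$GeneticStatus <- factor(df$GeneticStatus,
                           levels = c("not_tested", "tested_negative",
                                      "gene1_positive", "gene2_positive"))
levels(df$GeneticStatus)
# It would then enter the model formula as an extra predictor, e.g.:
# ALSvsHC ~ 1 + T1_PAD + CognitiveReserve + T1_chronAge + Sex + GeneticStatus
```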

Thank you in advance everyone!

Concerning:

if I expect a characteristic to have an inverse relationship with the outcome and I want the majority of the prior to be negative, is normal(-1,1) or cauchy(-1,1) the right way to go to model my expectation of a negative effect? I ran my models this way, thinking it was correct, but when I produced figures comparing my priors and posteriors, the priors are still centered on 0.

Your prior seems correctly specified. Are you sure the priors are really centered on 0 in the prior-only fit? The Cauchy distribution has very long tails, so it’s difficult to see from a plot where exactly it is centered. E.g., if you just run this in base R:

hist(rcauchy(10000,-1,1), breaks=60)

you get a histogram centered near -1, with the long Cauchy tails clearly visible.

Check the estimates for Model1_priors, they should reflect the center more precisely.
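The same issue shows up with raw draws: with Cauchy tails, the sample mean is unstable, while the median recovers the center reliably (a base-R sketch):

```r
set.seed(123)
draws <- rcauchy(10000, location = -1, scale = 1)
median(draws)  # close to -1; the median is a stable estimate of the center
mean(draws)    # can land far from -1, because the Cauchy mean is undefined
```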

I also notice that your prior is inconsistent between the Model1_priors and Model1_posteriors:

set_prior("cauchy(-1,1)", class="b", coef="CognitiveReserve")
set_prior("normal(-1,1)", class="b", coef="CognitiveReserve")

Also, you have a stray 3 in the prior for the posterior model:

prior=c(set_prior("normal(-1,1)", class="b", coef="CognitiveReserve"),
set_prior("normal(1,2)", class="b", coef="T1_PAD"),
set_prior("normal(0,1)", class="b", coef="T1_chronAge"),
3  # ????? this should give an error 'unexpected symbol in: ...'
set_prior("normal(1,0.5)", class="b", coef="Sexmale"))

You can always double check if your priors are as expected, by making them more extreme just for testing, e.g.

prior=c(set_prior("cauchy(-1,0.01)", class="b", coef="CognitiveReserve"),
set_prior("normal(1,0.01)", class="b", coef="T1_PAD"),
set_prior("normal(0,0.01)", class="b", coef="T1_chronAge"),
set_prior("normal(1,0.01)", class="b", coef="Sexmale"))

or

prior=c(set_prior("cauchy(-100,1)", class="b", coef="CognitiveReserve"),
set_prior("normal(100,2)", class="b", coef="T1_PAD"),
set_prior("normal(0,1)", class="b", coef="T1_chronAge"),
set_prior("normal(100,0.5)", class="b", coef="Sexmale"))

just so that you can visualize and confirm.


Thank you very much for your reply!

your prior is inconsistent between the Model1_priors and Model1_posteriors

Thanks for noticing, I’ve fixed it. Both models now run as:

Model1_priors <- brm(data=df2,
              family = bernoulli("logit"),
              ALSvsHC~1+T1_PAD+CognitiveReserve+T1_chronAge+Sex,
              prior=c(set_prior("cauchy(-1,1)", class="b", coef="CognitiveReserve"),
                      set_prior("normal(1,2)", class="b", coef="T1_PAD"),
                      set_prior("normal(0,1)", class="b", coef="T1_chronAge"),
                      set_prior("normal(1,0.5)", class="b", coef="Sexmale")),
              sample_prior = "only", #sampling the prior to compare prior to posterior distribution
              iter=10000,
              chains=4,
              save_pars = save_pars(all = TRUE)) #if we wish to calculate the BF, we must save all parameters

Model1_posteriors <- brm(data=df2,
              family = bernoulli("logit"),
              ALSvsHC~1+T1_PAD+CognitiveReserve+T1_chronAge+Sex,
              prior=c(set_prior("cauchy(-1,1)", class="b", coef="CognitiveReserve"),
               set_prior("normal(1,2)", class="b", coef="T1_PAD"),
               set_prior("normal(0,1)", class="b", coef="T1_chronAge"),
               set_prior("normal(1,0.5)", class="b", coef="Sexmale")),
              iter=10000,
              chains=4,
              save_pars = save_pars(all = TRUE))

Next, I addressed this suggestion:

Check the estimates for Model1_priors

using this piece of code:

cmp <- list(
    "Prior" = avg_comparisons(Model1_priors),
    "Posterior" = avg_comparisons(Model1_posteriors))

draws <- lapply(names(cmp), \(x) transform(posteriordraws(cmp[[x]]), Label = x))
draws <- do.call("rbind", draws)

modelsummary(cmp,
       output = "markdown",
       statistic = "conf.int",
       fmt = 3,
       gof_map = NA)

to receive a table comparing the estimates of both Model1_priors and Model1_posteriors, as suggested above. The table comes out like this, suggesting that the priors are not where they are meant to be. get_prior() and validate_prior() for these models both report the priors to be as specified, so I’m unsure what may be happening here.

| Term | Prior | Posterior |
|------|-------|-----------|
| CognitiveReserve | -0.072 [-0.396, 0.389] | -0.075 [-0.159, 0.010] |
| Sex | 0.069 [0.000, 0.245] | 0.022 [-0.099, 0.138] |
| T1_PAD | 0.053 [-0.251, 0.335] | 0.030 [-0.053, 0.100] |
| T1_chronAge | 0.000 [-0.180, 0.183] | -0.021 [-0.106, 0.052] |

also you have a random 3 in the prior for the posterior model:

I’m really unsure how that got there; it wasn’t in my original code when I checked just now. Perhaps a copy-paste error.

I also picked up on this suggestion:

You can always double check if your priors are as expected, by making them more extreme just for testing, e.g.

This gave me the attached graphs: the top one uses the more “extreme” priors and the bottom one uses the priors as specified above.


What could be the reason for the priors not being where I specified them?

I don’t know what the functions avg_comparisons, posteriordraws, and modelsummary do; are these from another package? Do they apply any transformations to the parameters? That might explain things. Can you just give the regular summary and plots from brms?

summary(Model1_priors)
plot(Model1_priors, ask = FALSE)
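If those functions summarize effects on the probability scale rather than the log-odds scale (an assumption on my part; I’m not familiar with that package), your priors on the logit coefficients would be pushed through the inverse-logit and would no longer look centered at the values you set. A quick base-R illustration:

```r
# A slope of -1 on the log-odds scale corresponds to a much smaller
# change on the probability scale, e.g. around a baseline log-odds of 0:
plogis(0 - 1) - plogis(0)  # about -0.23, not -1
```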

Thanks for the superfast reply!

I’ve attached the requested images for you:


Well, there you go. From the brms summary you can see that you have sampled from your priors as expected.

So whatever the other package code is doing is responsible, but I am not familiar with what those functions do. In any case, your brms models run just as expected, so look into what the other package is doing to your draws.


Thank you very much! I’ll need to look into those functions a little more then. They’re from the marginaleffects package. Happy weekend to you!
