Priors for a novice

Please read @paul.buerkner’s vignette here. In particular, the first paragraph after the section “Setting Prior Distributions” explains the Dirichlet prior well, IMHO.


Thank you. I did read the vignette but am unsure whether I understood correctly. That’s why I was hoping someone could check my understanding.

Pia,

I think @richard_mcelreath says something like this in the new edition of the book:

When we have the same value for each response, e.g., {2,2,2,2,2} or {3,3,3,3,3} for a 5-point Likert scale, the prior is symmetric: in expectation, all response probabilities are equal. BUT the larger the common value (i.e., {3,3,3,3,3} in this case), the more the prior concentrates around equal probabilities, i.e., the stronger our prior belief that the probabilities are all the same. (Only {1,1,1,1,1} is flat over all probability vectors.)

You can try it out and simulate it, e.g., just change rep(5,5) to rep(1,5) or rep(3,5):

library(gtools)

# Draw 10 probability vectors from a symmetric Dirichlet and plot each one
delta <- rdirichlet(10, alpha = rep(5, 5))
plot(NULL, xlim = c(1, 5), ylim = c(0, 0.4))
for (i in 1:nrow(delta))
  lines(1:5, delta[i, ], type = "b")


Oh I see! Thank you, that is very helpful.

Lastly, any ideas regarding the regression coefficients for a cumulative probit model?

Sorry, but I’ve always plotted my priors to see what they imply in reality. I have a hard time picturing this in my head. Perhaps @Max_Mantei can answer this question, i.e., given this model

as.ordered(y) ~ 1 + x + mo(z) with family = cumulative(link = "probit")

can the regression coefficients be interpreted as the difference in z-score associated with each one-unit difference in the predictor variable? So, say the coefficient for x is .15: would this mean that a one-unit increase in x is associated with an increase of .15 standard deviations in y? Both my continuous and my monotonic predictor are on a scale from 0 to 4. Would it thus make sense to use the same prior, or is there a difference between priors for continuous and monotonic effects? And does a prior of N(0,1) sound reasonable, similar to when using a standardised outcome?


You can also try something like this in R:

install.packages("Ternary")
library(Ternary)
library(gtools)  # for rdirichlet()

# One ternary panel per concentration value k
par(mfrow = c(1, 3), mar = c(0, 0, 0, 0))
for (k in c(.1, 1, 10)) {
  TernaryPlot(grid.lines = NA)
  TernaryPoints(rdirichlet(5000, c(1, 1, 1) * k), pch = 16,
                col = adjustcolor("black", alpha.f = .2))
  title(paste(rep(k, 3), collapse = ", "))
}

Maybe Michael Betancourt’s case study on Ordinal Regression can give a bit more intuition about what’s going on here (it doesn’t discuss regression coefficients, though).

So, I think the description you quoted makes sense. The linear predictor determines where you are on the “z-score” line (being a bit sloppy with terms here; as an economist I like to think of it as a latent utility); with centered predictors you’re at 0 in expectation. With a probit link, the latent normal distribution is centered at the value of the linear predictor, with a (fixed) sd of 1. The cut points then tell you which outcome (or choice) the “z-score” (utility) implies: you calculate the proportion of the normal distribution that falls into the “buckets” prescribed by the cut points…

Maybe a little toy example…

# Category probabilities for a 3-category cumulative probit,
# given latent mean z and two cut points
pp <- function(z, cuts) {
  p <- c("P(K=1)" = pnorm(cuts[1], z, 1),
         "P(K=2)" = pnorm(cuts[2], z, 1) - pnorm(cuts[1], z, 1),
         "P(K=3)" = 1 - pnorm(cuts[2], z, 1))
  return(p)
}

cuts <- c(-0.4, 0.4) # some (fixed) cut points

Now you can probe different values and see the implied probabilities for the categories (given the cut points).

> pp(-0.5, cuts)
   P(K=1)    P(K=2)    P(K=3) 
0.5398278 0.2761120 0.1840601 
> pp(1, cuts)
    P(K=1)     P(K=2)     P(K=3) 
0.08075666 0.19349646 0.72574688 
...

Now, probing what happens if the regression coefficient is 0.15 and possible x values range from 0 to 4:

prior_beta <- 0.15

for(x in 0:4) cat("X =", x, "\n", pp(prior_beta*x, cuts), "\n")

…which results in

X = 0 
 0.3445783 0.3108435 0.3445783 
X = 1 
 0.2911597 0.3075466 0.4012937 
X = 2 
 0.2419637 0.2978642 0.4601722 
X = 3 
 0.1976625 0.2823987 0.5199388 
X = 4 
 0.1586553 0.262085 0.5792597 

Trying a way too big regression coefficient…

prior_beta <- 2

for(x in 0:4) cat("X =", x, "\n", pp(prior_beta*x, cuts), "\n")

results in:

X = 0 
 0.3445783 0.3108435 0.3445783 
X = 1 
 0.008197536 0.04660176 0.9452007 
X = 2 
 5.412544e-06 0.000153696 0.9998409 
X = 3 
 7.768848e-11 1.06399e-08 1 
X = 4 
 2.232393e-17 1.478421e-14 1 

This can give you a sense of scaling for the regression coefficients (you can also play around with the cut points).
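
For example, to play around with the cut points, you could reuse the pp() function above with wider (hypothetical) values and see how the same coefficient spreads the probabilities:

wide_cuts <- c(-1.5, 1.5)  # hypothetical cut points, wider than c(-0.4, 0.4) above
pp(0.15 * 4, wide_cuts)    # largest predictor value, same coefficient as before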

This approach is really ad hoc, but it can give a sense of how the regression coefficients translate to probabilities. I think in brms you can do proper prior predictive checks and then plot the results. I guess a N(0,1) prior on the regression coefficients in this case is weakly informative, but you could probably make it even tighter.
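
For instance, a minimal sketch of such a prior predictive check in brms, using sample_prior = "only" (assuming a hypothetical data frame d with columns y, x, and z matching the model above):

library(brms)

# Sample from the prior only, ignoring the likelihood
fit_prior <- brm(
  as.ordered(y) ~ 1 + x + mo(z),
  data = d,  # hypothetical data frame
  family = cumulative(link = "probit"),
  prior = set_prior("normal(0, 1)", class = "b"),
  sample_prior = "only"
)

# Plot the implied distribution over the outcome categories
pp_check(fit_prior, type = "bars")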


Thank you very much for this detailed explanation!