Priors for a novice

Please read @paul.buerkner’s vignette here. In particular, the first paragraph after the section “Setting Prior Distributions” explains the Dirichlet prior well, IMHO.


Thank you. I did read the vignette but am unsure whether I understood correctly. That’s why I was hoping someone could check my understanding.

Pia,

I think @richard_mcelreath says something like this in the new edition of the book:

When we have the same value for each response, e.g., {2,2,2,2,2} or {3,3,3,3,3} for a 5-point Likert scale, the prior is symmetric: in expectation, all response probabilities are equal. BUT the larger the common value (i.e., {3,3,3,3,3} in this case), the more the prior concentrates around equal probabilities, i.e., the stronger our prior belief that the probabilities are all the same. (Only {1,1,1,1,1} is flat over all probability vectors.)

You can try it out and simulate it, e.g., just change rep(5,5) to rep(1,5) or rep(3,5):

library(gtools)

# Draw 10 probability vectors from a symmetric Dirichlet and plot each one
delta <- rdirichlet(10, alpha = rep(5, 5))
plot(NULL, xlim = c(1, 5), ylim = c(0, 0.4))
for (i in 1:nrow(delta))
  lines(1:5, delta[i, ], type = "b")


Oh I see! Thank you, that is very helpful.

Lastly, any ideas regarding the regression coefficients for a cumulative probit model?

Sorry, but I’ve always plotted my priors to see what they imply in reality. I have a hard time picturing this in my head. Perhaps @Max_Mantei can answer this question, i.e., given this model

as.ordered(y) ~ 1 + x + mo(z) with family = cumulative(link = "probit")

can the regression coefficients be interpreted as the difference in z-score associated with each one-unit difference in the predictor variable? So, say the coefficient for x is .15: would this mean that a one-unit increase in x is associated with an increase of .15 standard deviations in y? Both my continuous and my monotonic predictor are on a scale from 0 to 4. Would it thus make sense to use the same prior, or is there a difference between priors for continuous and monotonic effects? And does a prior of N(0,1) sound reasonable, similar to when using a standardised outcome?


You can also try something like this in R:

install.packages("Ternary")
library(Ternary)
library(gtools)  # for rdirichlet()

# One ternary panel per concentration value k
par(mfrow = c(1, 3), mar = c(0, 0, 0, 0))
for (k in c(.1, 1, 10)) {
  TernaryPlot(grid.lines = NA)
  TernaryPoints(rdirichlet(5000, c(1, 1, 1) * k), pch = 16,
                col = adjustcolor("black", alpha.f = .2))
  title(paste(rep(k, 3), collapse = ", "))
}

Maybe Michael Betancourt’s case study on Ordinal Regression can give a bit more intuition about what’s going on here (it doesn’t discuss regression coefficients, though).

So, I think the description you quoted makes sense. The linear predictor determines where you are on the “z-score” line (being a bit sloppy with terms here; as an economist I like to think of it as a latent utility); with centered predictors you’re at 0 in expectation. With a probit link, the latent normal distribution is centered at the value of the linear predictor, with a (fixed) sd of 1. The cut points then tell you which outcome (or choice) the “z-score” (utility) implies: you calculate the proportion of the normal distribution that falls into the “buckets” prescribed by the cut points…

Maybe a little toy example…

# Category probabilities for a 3-category cumulative probit,
# given latent mean z and two cut points
pp <- function(z, cuts) {
  p <- c("P(K=1)" = pnorm(cuts[1], z, 1),
         "P(K=2)" = pnorm(cuts[2], z, 1) - pnorm(cuts[1], z, 1),
         "P(K=3)" = 1 - pnorm(cuts[2], z, 1))
  return(p)
}

cuts <- c(-0.4, 0.4) # some (fixed) cut points

Now you can probe different values and see the implied probabilities for the categories (given the cut points).

> pp(-0.5, cuts)
   P(K=1)    P(K=2)    P(K=3) 
0.5398278 0.2761120 0.1840601 
> pp(1, cuts)
    P(K=1)     P(K=2)     P(K=3) 
0.08075666 0.19349646 0.72574688 
...

Now, probing what happens if the regression coefficient is 0.15 and possible x values range from 0 to 4:

prior_beta <- 0.15

for(x in 0:4) cat("X =", x, "\n", pp(prior_beta*x, cuts), "\n")

…which results in

X = 0 
 0.3445783 0.3108435 0.3445783 
X = 1 
 0.2911597 0.3075466 0.4012937 
X = 2 
 0.2419637 0.2978642 0.4601722 
X = 3 
 0.1976625 0.2823987 0.5199388 
X = 4 
 0.1586553 0.262085 0.5792597 

Trying a way too big regression coefficient…

prior_beta <- 2

for(x in 0:4) cat("X =", x, "\n", pp(prior_beta*x, cuts), "\n")

results in:

X = 0 
 0.3445783 0.3108435 0.3445783 
X = 1 
 0.008197536 0.04660176 0.9452007 
X = 2 
 5.412544e-06 0.000153696 0.9998409 
X = 3 
 7.768848e-11 1.06399e-08 1 
X = 4 
 2.232393e-17 1.478421e-14 1 

This can give you a sense of scaling for the regression coefficients (you can also play around with the cut points).
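
For example, to play around with the cut points, you could reuse the pp() function above with wider (hypothetical) values and see how the same coefficient spreads the probabilities:

wide_cuts <- c(-1.5, 1.5)  # hypothetical cut points, wider than c(-0.4, 0.4) above
pp(0.15 * 4, wide_cuts)    # largest predictor value, same coefficient as before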

This approach is really ad hoc, but it can give a sense of how the regression coefficients translate to probabilities. I think in brms you can do proper prior predictive checks and then plot the results. I guess a N(0,1) prior on the regression coefficients in this case is weakly informative, but you could probably make it even tighter.
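
For instance, a minimal sketch of such a prior predictive check in brms, using sample_prior = "only" (assuming a hypothetical data frame d with columns y, x, and z matching the model above):

library(brms)

# Sample from the prior only, ignoring the likelihood
fit_prior <- brm(
  as.ordered(y) ~ 1 + x + mo(z),
  data = d,  # hypothetical data frame
  family = cumulative(link = "probit"),
  prior = set_prior("normal(0, 1)", class = "b"),
  sample_prior = "only"
)

# Plot the implied distribution over the outcome categories
pp_check(fit_prior, type = "bars")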


Thank you very much for this detailed explanation!