I have a simple mock dataset and a logistic regression model in which two predictor categories exhibit quasi-complete separation. What I want to do is visualize how the resulting infinite MLE is "tamed" by the prior. Below is a minimal example. I happen to use brms, but the choice of interface should be irrelevant here.
n <- c(96, 48, 24, 12, 6, 3)
x <- factor(LETTERS[1:length(n)])
p <- 0.75
y <- n * c(0.5, p, 1 - p, p, 0, 1)  # categories E (0/6) and F (3/3) sit at the boundary -> separation
d <- data.frame(y = unlist(sapply(1:length(y), function(i) c(rep(1, y[i]), rep(0, n[i] - y[i])))),
                x = rep(x, times = n))
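As a sanity check on the premise (my addition, not part of the original workflow), fitting these data by plain maximum likelihood shows the separation directly: `glm()` warns about fitted probabilities of 0 or 1, and the estimates for the separated categories drift toward ±∞, appearing as huge coefficients.

```r
# Rebuild the same mock data and fit by unpenalized maximum likelihood
n <- c(96, 48, 24, 12, 6, 3)
x <- factor(LETTERS[1:length(n)])
p <- 0.75
y <- n * c(0.5, p, 1 - p, p, 0, 1)
d <- data.frame(y = unlist(sapply(1:length(y), function(i) c(rep(1, y[i]), rep(0, n[i] - y[i])))),
                x = rep(x, times = n))

fit_ml <- glm(y ~ x, family = binomial, data = d)
coef(fit_ml)[c("xE", "xF")]  # separated cells: estimates diverge toward -Inf / +Inf
```

The reported magnitudes are arbitrary (they depend on where the IRLS iterations stop), which is exactly the pathology the prior is meant to tame.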
library(brms)

Bayfit <- brm(y ~ x, family = bernoulli, data = d,
              prior = prior(normal(0, 2.5), class = Intercept) +
                prior(normal(0, 2.5), class = b),
              seed = 3, warmup = 1000, iter = 3500)
Post <- as_draws_df(Bayfit)  # posterior_samples() is deprecated in recent brms
With the model fit and the posterior draws stored, we can now plot the likelihood (whose MLE is infinite), the prior, and the posterior:
# Likelihood of 0 successes in 6 trials, with x on the logit scale
curve(dbinom(0, 6, plogis(x)), -10, 10, lwd = 2, col = "blue",
      xlab = expression(beta[4]), ylab = "P")
curve(dnorm(x, 0, 2.5), add = TRUE, col = "red", lwd = 2)  # N(0, 2.5) prior
lines(density(Post$b_xE), lwd = 2)                         # posterior density of b_xE
legend(4.5, 0.9, legend = c("Prior", "Likelihood", "Posterior"),
       col = c("red", "blue", "black"), lwd = 2, cex = 0.8)
While I’m confident that this illustration is “directionally correct”, I have an uneasy feeling about the blue (likelihood) curve. AFAIK the likelihood, unlike the prior and the posterior, is NOT a PDF in the parameter and need not integrate to 1. The graph makes this plain: the likelihood has by far the largest area under the curve. How big of a problem is this? Is there some simple constant that could or should be applied to the likelihood to make it more commensurate with the prior and the posterior? I think the present graph works alright for the purpose of illustrating the action of the prior on the posterior, but it would still suck if it contained some glaring mathematical idiocy. Does it?
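For what it’s worth, here is one sketch of a rescaling (my own construction, not a standard recipe). Note that no single normalizing constant exists over the whole real line: the curve dbinom(0, 6, plogis(b)) tends to 1 as b → −∞, so its integral over (−∞, ∞) diverges. What one *can* do is rescale it to unit area over the plotted window, which at least puts it on the same vertical scale as the two densities:

```r
# Likelihood of 0 successes in 6 trials, as a function of the logit-scale probability
lik <- function(b) dbinom(0, 6, plogis(b))

# No global normalizing constant exists (lik(b) -> 1 as b -> -Inf),
# but we can rescale to unit area over the plotted window [-10, 10]
Z <- integrate(lik, -10, 10)$value
curve(lik(x) / Z, -10, 10, lwd = 2, col = "blue",
      xlab = expression(beta[4]), ylab = "Density (window-normalized)")
```

The rescaling changes only the vertical scale, not the shape, and the choice of window is arbitrary; whether this is the right thing to show is precisely the substantive question above.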