Why doesn't sampling fail when optimizing does?

jeffpollock9 · February 12, 2020, 10:57am

Hi,

As far as I understand, finding the posterior mode via optimization usually doesn’t work for hierarchical models since the objective function is unbounded due to the contribution from the prior, e.g.:

# parameters
effects <- c(0.0, 0.0)
effects_sd <- 0.0

# prior lpdf
effects_prior_lpdf <- sum(dnorm(effects, 0.0, effects_sd, log = TRUE))

print(effects_prior_lpdf)
#> [1] Inf

so the objective function can be made arbitrarily large by setting all the hierarchical effects and the scale to 0. In this case the solution effectively ignores any contribution to the objective function from the likelihood, which doesn’t sound particularly useful.
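To see the unboundedness numerically, here is a minimal sketch that reuses the two zero effects from the example above and evaluates the prior log density as the scale shrinks toward zero. The grid of scales is an arbitrary choice for illustration.

```r
# Two hierarchical effects fixed at zero, as in the example above.
effects <- c(0.0, 0.0)

# Illustrative grid of shrinking scales (an arbitrary choice).
effects_sd <- 10^(0:-5)

# Prior log density of the effects at each scale.
effects_prior_lpdf <- sapply(
  effects_sd,
  function(s) sum(dnorm(effects, 0.0, s, log = TRUE))
)

# In closed form this is -2 * log(s) - log(2 * pi), which grows without
# bound as s -> 0.
print(data.frame(effects_sd, effects_prior_lpdf))
```

Each tenfold reduction in the scale adds 2 * log(10) to the prior log density, so an optimizer can keep improving the objective without ever looking at the likelihood.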

What I am struggling to get my head around is why sampling works fine. Why doesn’t the sampler eventually propose some value sufficiently close to this unbounded mode and get stuck?

Any help would be awesome - thanks!

betanalpha · February 12, 2020, 2:51pm

The posterior mode is a consequence of an arbitrary parameterization and is not “typical” of the posterior distribution itself. Algorithms like Markov chain Monte Carlo try to quantify an entire probability distribution and hence rarely spend any time around the uncharacteristic mode.

For a much more detailed explanation see https://betanalpha.github.io/assets/case_studies/probabilistic_computation.html.
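One way to see the parameterization dependence is a toy one-dimensional example. This is a sketch only: the log-normal variable is an assumption chosen for illustration and is not part of the model in the question. The mode found on the original scale and the mode found on the log scale correspond to different points of the same distribution.

```r
# Toy example (an assumption for illustration): x ~ lognormal(0, 1).

# Mode of the density of x, found numerically on the original scale.
mode_x <- optimize(
  function(x) dlnorm(x, meanlog = 0, sdlog = 1, log = TRUE),
  interval = c(1e-6, 10), maximum = TRUE
)$maximum

# Mode of the density of y = log(x), which is normal(0, 1).
mode_y <- optimize(
  function(y) dnorm(y, mean = 0, sd = 1, log = TRUE),
  interval = c(-10, 10), maximum = TRUE
)$maximum

# Map the log-scale mode back to the original scale to compare.
print(c(mode_x = mode_x, exp_mode_y = exp(mode_y)))
```

In closed form the first mode is exp(-1), about 0.37, while the second maps back to 1. Expectations and quantiles of the distribution do not move around like this under a change of variables, which is why they are the more meaningful targets.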

jeffpollock9 · February 13, 2020, 8:28am

Thanks for this reply, @betanalpha. The case study is super helpful!

Bob_Carpenter · February 13, 2020, 9:26pm

A slightly more technical answer is that the sampler will stay in the typical set, which is the set of elements with log density within epsilon of the expected log density (the negative entropy). The typical set usually doesn’t contain the mode in even moderately high dimensions.

I find it useful to work through simple examples by simulation, which I do in this case study: Typical Sets and the Curse of Dimensionality
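A small simulation along those lines might look like the following. It is a sketch, not the code from the case study: it assumes a standard normal target, and the dimensions, seed, and number of draws are arbitrary. It compares the log density of random draws with the log density at the mode.

```r
set.seed(1234)  # arbitrary seed, for reproducibility only

# Log density of a standard normal with ncol(x) dimensions, per row of x.
std_normal_lpdf <- function(x) {
  -0.5 * rowSums(x^2) - 0.5 * ncol(x) * log(2 * pi)
}

n_draws <- 1000
dims <- c(1, 10, 100, 1000)

summary_by_dim <- t(sapply(dims, function(d) {
  draws <- matrix(rnorm(n_draws * d), nrow = n_draws, ncol = d)
  lpdf_draws <- std_normal_lpdf(draws)
  lpdf_mode <- std_normal_lpdf(matrix(0, nrow = 1, ncol = d))
  c(
    dim = d,
    # Gap between the mode's log density and the largest log density drawn.
    min_gap_to_mode = lpdf_mode - max(lpdf_draws),
    # Average distance of the draws from the mode; roughly sqrt(d).
    mean_distance = mean(sqrt(rowSums(draws^2)))
  )
}))

print(summary_by_dim)
```

As the dimension grows, even the highest-density draw falls further below the mode's log density, and the draws sit at a distance of roughly sqrt(d) from the mode rather than near it.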
