MLE for multimodal likelihood: frequentist framework

I believe there are some fundamental misconceptions about Bayesian and frequentist modeling at play here.

In frequentist modeling one specifies an observational model, \pi(y; \theta), and introduces estimators, functions from the observational space to the parameter space, \hat{\theta}: Y \rightarrow \Theta, along with a loss function L(\hat{\theta}, \theta) that quantifies how useful an estimator is when \theta identifies the true data generating process. A frequentist analysis then calibrates the estimator by computing the worst-case expected loss. At least, a frequentist analysis tries to perform such a calibration; in practice it is often too computationally demanding for nontrivial observational models, estimators, or loss functions.
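To make this concrete, here is a minimal sketch of that calibration step. The normal observational model, the sample-mean estimator, the squared-error loss, and the parameter grid are all illustrative choices of mine, not anything prescribed above:

```python
# Sketch of frequentist calibration: Monte Carlo estimate of the
# worst-case expected loss of an estimator over a grid of candidate
# true parameter values. All modeling choices here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_sims, sigma = 20, 5_000, 1.0

def estimator(y):
    # \hat{\theta}: Y -> \Theta, here simply the sample mean
    return y.mean(axis=-1)

def expected_loss(theta):
    # Monte Carlo estimate of E[L(\hat{\theta}, \theta)] under \pi(y; \theta),
    # with L the squared-error loss
    y = rng.normal(theta, sigma, size=(n_sims, n_obs))
    return np.mean((estimator(y) - theta) ** 2)

theta_grid = np.linspace(-5, 5, 41)
worst_case = max(expected_loss(t) for t in theta_grid)
print(f"worst-case expected squared-error loss: {worst_case:.4f}")
```

Even in this toy setting the cost is n_sims times the grid size; with realistic models, estimators, or loss functions that expense is exactly what makes the full calibration impractical.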

Evaluating the observational model at an observed measurement, \tilde{y}, yields the likelihood function, \pi(\tilde{y}; \theta). The parameter values that maximize the likelihood function define the maximum likelihood estimator. Under very specific conditions the maximum likelihood estimator can be approximately calibrated: it is unbiased, intervals around the maximum likelihood estimate have nice coverage properties, and so on.
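Numerically this just means maximizing \pi(\tilde{y}; \theta) in \theta; a quick sketch, again with a stand-in normal model of my choosing:

```python
# Sketch of computing a maximum likelihood estimate: evaluate the
# observational model at the observed data and maximize the resulting
# log likelihood over theta. The normal model is purely illustrative.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(1)
y_tilde = rng.normal(2.0, 1.0, size=30)  # observed measurements

def neg_log_likelihood(theta):
    # -log pi(y_tilde; theta) for a normal model with known scale
    return -norm.logpdf(y_tilde, loc=theta, scale=1.0).sum()

result = minimize_scalar(neg_log_likelihood)
print(f"maximum likelihood estimate: {result.x:.3f}")
```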

One necessary condition for the maximum likelihood estimator to be (approximately) calibrated is that the likelihood function concentrates in a single neighborhood. In other words, seeing multiple modes indicates that any such calibration is invalid. You can still compute a maximum likelihood estimate, or at least try to; it just won't have any of the expected behavior.
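A toy demonstration of the problem, using a symmetric normal mixture I picked because its likelihood in a single location parameter has two modes; a local optimizer simply converges to whichever mode is closer to its initialization:

```python
# Sketch of a multimodal likelihood: a mixture
# 0.5 * N(theta, 1) + 0.5 * N(-theta, 1) is symmetric in theta,
# so the likelihood has modes at both signs of the true separation.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
# data generated with the component locations at +3 and -3
y = np.concatenate([rng.normal(3, 1, 50), rng.normal(-3, 1, 50)])

def neg_log_likelihood(theta):
    dens = 0.5 * norm.pdf(y, theta, 1) + 0.5 * norm.pdf(y, -theta, 1)
    return -np.log(dens).sum()

for start in (-5.0, 5.0):
    fit = minimize(neg_log_likelihood, x0=[start])
    print(f"start {start:+.1f} -> mode at {fit.x[0]:+.3f}")
```

Each run reports a "maximum likelihood" value, but which one you get depends on the initialization, and none of the usual coverage or bias guarantees apply.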

In a Bayesian analysis the observational model is complemented with a prior model to give a joint distribution over the data and parameter space. When that joint distribution is conditioned on the observed data we get a posterior distribution. We then quantify inference as expectation values with respect to that posterior distribution.
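In a conjugate example the whole workflow is explicit; a minimal sketch, with a beta-binomial model and numbers chosen only for illustration:

```python
# Sketch of the Bayesian workflow on a conjugate example: a binomial
# observational model with a beta prior yields a beta posterior in
# closed form, and inferences are posterior expectation values.
from scipy.stats import beta

a_prior, b_prior = 2, 2     # prior model over theta
successes, trials = 7, 20   # observed data

# conditioning the joint distribution on the observed data gives
# Beta(a + y, b + n - y)
posterior = beta(a_prior + successes, b_prior + trials - successes)

print(f"posterior mean of theta:   {posterior.mean():.3f}")
print(f"posterior P(theta < 0.5):  {posterior.cdf(0.5):.3f}")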

In general a posterior distribution comes with no calibration guarantees: we have no idea how the posterior distribution, or posterior expectation values, will behave a priori unless we perform the calibration ourselves.

Multimodality doesn’t prevent us from trying to calibrate our Bayesian model in theory, but in practice it can prevent us from implementing the calibration because we can’t estimate expectation values accurately.
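One way to do that calibration ourselves is by simulation: draw ground truths from the prior, simulate data, and check how often central posterior intervals cover the truth. A sketch on the same conjugate beta-binomial model, where the posterior is exact; with a multimodal posterior the very same check would require accurate expectation values, which is precisely what breaks down in practice:

```python
# Sketch of calibrating a Bayesian model by simulation: for a
# well-specified model, central posterior intervals should cover the
# prior-drawn truth at their nominal rate. Model and numbers are
# illustrative.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(3)
a, b, trials, n_reps = 2, 2, 20, 2_000

covered = 0
for _ in range(n_reps):
    theta_true = rng.beta(a, b)                  # draw truth from the prior
    y = rng.binomial(trials, theta_true)         # simulate data
    post = beta(a + y, b + trials - y)           # exact conjugate posterior
    lo, hi = post.ppf(0.05), post.ppf(0.95)      # central 90% interval
    covered += lo <= theta_true <= hi

print(f"90% interval coverage: {covered / n_reps:.3f}")  # should be near 0.90
```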

For much more see https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html.
