Is Stan useful for MML? What is the state of GMO?

jan-glx · January 19, 2018, 4:05pm

Is Stan useful for empirical Bayes / maximum marginal likelihood (MML)?
It seems that the gmo package should provide this functionality. Is it still being developed, and will the corresponding write-up be published anytime soon?

If it is, wouldn’t it be more efficient to perform the maximization and the sampling in parallel within Stan, instead of sequentially as in the current implementation? Is something like this on the roadmap?

avehtari · January 22, 2018, 2:02pm

GMO specifically is on hold due to limited resources and prioritization, but fast approximate marginalization in general is on the roadmap. Do you have specific modeling problems in mind, or are you interested in this kind of approach in general?

jan-glx · January 22, 2018, 3:06pm

Thanks for the info! I was asking because I would like to perform a hypothesis test (shudder) for one coefficient in a fairly complex hierarchical model.

arya · January 22, 2018, 4:03pm

What exactly is max marginal likelihood? For funnel-like problems, is it just taking the maximum of the marginal density of the variance term instead of the maximum of the joint density? I know the joint maximum of the funnel is all the way at the bottom of the neck, which is why maximum likelihood doesn’t give sensible estimates for hierarchical models.
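
To make the funnel point concrete, here is a small sketch assuming a normal-normal hierarchical model (an illustrative assumption, not something stated in the thread): y_j ~ N(x_j, sigma_j^2) with known sigma_j, and x_j ~ N(mu, tau^2). The data are hypothetical. Putting every x_j at mu and shrinking tau sends the joint log density log p(y, x | mu, tau) off to infinity, while the marginal log likelihood log p(y | mu, tau), with x integrated out analytically, stays bounded.

```python
import math

def normal_logpdf(v, mean, sd):
    """Log density of N(mean, sd^2) evaluated at v."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((v - mean) / sd) ** 2

def joint_logdensity(y, sigma, x, mu, tau):
    """log p(y, x | mu, tau): data level y_j ~ N(x_j, sigma_j^2) plus random effects x_j ~ N(mu, tau^2)."""
    return sum(normal_logpdf(yj, xj, sj) + normal_logpdf(xj, mu, tau)
               for yj, sj, xj in zip(y, sigma, x))

def marginal_loglik(y, sigma, mu, tau):
    """log p(y | mu, tau) with x integrated out: y_j ~ N(mu, sigma_j^2 + tau^2)."""
    return sum(normal_logpdf(yj, mu, math.sqrt(sj ** 2 + tau ** 2))
               for yj, sj in zip(y, sigma))

# Hypothetical toy data (not from the thread).
y = [2.0, -1.0, 0.5, 3.0]
sigma = [1.0, 1.0, 1.0, 1.0]
mu = sum(y) / len(y)

for tau in [1.0, 1e-2, 1e-4, 1e-8]:
    x = [mu] * len(y)  # every random effect at the group mean: heading down the funnel's neck
    print(f"tau={tau:g}  joint={joint_logdensity(y, sigma, x, mu, tau):.1f}  "
          f"marginal={marginal_loglik(y, sigma, mu, tau):.1f}")
```

The joint column keeps growing as tau shrinks (each random-effect term contributes -log(tau)), while the marginal column levels off at a finite value.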

anon75146577 · January 22, 2018, 5:09pm

You maximize the marginal likelihood :p

Specifically, for a multilevel model with a big vector of random effects x and a vector of parameters theta, you marginalise out x and find an estimate for theta by maximizing the marginal posterior p(theta | y). With a flat prior on theta, that is the same as maximizing the marginal likelihood p(y | theta).

You then treat that estimate as fixed and compute the conditional posterior p(x | y, theta), and either use that for inference or compute its maximum.
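
A minimal sketch of that two-step recipe, assuming a normal-normal model because there x can be integrated out in closed form (the model, data, and search bound are illustrative assumptions, not from the thread): y_j ~ N(x_j, sigma_j^2) with known sigma_j, x_j ~ N(mu, tau^2), and theta = (mu, tau) with a flat prior. Step 1 maximizes log p(y | mu, tau), profiling mu out exactly and running a golden-section search over tau^2; step 2 holds theta at its estimate and returns the normal conditional posterior p(x_j | y, mu*, tau*).

```python
import math

def marginal_loglik(y, sigma, mu, tau2):
    """log p(y | mu, tau) with the random effects x integrated out: y_j ~ N(mu, sigma_j^2 + tau^2)."""
    total = 0.0
    for yj, sj in zip(y, sigma):
        v = sj ** 2 + tau2
        total += -0.5 * math.log(2 * math.pi * v) - 0.5 * (yj - mu) ** 2 / v
    return total

def profile_mu(y, sigma, tau2):
    """For fixed tau^2, the mu maximizing the marginal likelihood is a precision-weighted mean of y."""
    w = [1.0 / (sj ** 2 + tau2) for sj in sigma]
    return sum(wj * yj for wj, yj in zip(w, y)) / sum(w)

def golden_max(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of f on [lo, hi], assuming f is unimodal there."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def empirical_bayes(y, sigma):
    # Step 1: maximize the marginal likelihood over theta = (mu, tau). mu is profiled out exactly,
    # and the search runs over tau^2 so that an optimum on the boundary tau = 0 is located cleanly.
    def profile(tau2):
        return marginal_loglik(y, sigma, profile_mu(y, sigma, tau2), tau2)
    upper = (2 * (max(y) - min(y)) + max(sigma)) ** 2  # hypothetical bound, assumed to bracket tau*^2
    tau2 = golden_max(profile, 0.0, upper)
    mu = profile_mu(y, sigma, tau2)
    tau = math.sqrt(tau2)
    if tau2 < 1e-8:
        # tau* at (or numerically near) zero: complete pooling, every x_j collapses onto mu.
        return mu, tau, [(mu, 0.0) for _ in y]
    # Step 2: treat theta as fixed; p(x_j | y, mu, tau) is normal with the two precisions added.
    post = []
    for yj, sj in zip(y, sigma):
        prec = 1.0 / sj ** 2 + 1.0 / tau2
        mean = (yj / sj ** 2 + mu / tau2) / prec
        post.append((mean, math.sqrt(1.0 / prec)))
    return mu, tau, post

# Hypothetical data: five group estimates, each with known standard error 1.
y = [-3.0, 0.5, 2.0, 6.0, 9.0]
sigma = [1.0] * 5
mu_hat, tau_hat, post = empirical_bayes(y, sigma)
print(f"mu* = {mu_hat:.3f}, tau* = {tau_hat:.3f}")
for yj, (m, s) in zip(y, post):
    print(f"y = {yj:5.1f}  ->  E[x | y, theta*] = {m:6.3f}, sd = {s:.3f}")
```

With equal sigma_j there is a hand check: mu* is the sample mean (2.9 here) and tau*^2 is the mean squared deviation of y minus sigma^2 (16.64 here). Every conditional mean is pulled from y_j toward mu*.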

Stephen_Martin · January 23, 2018, 7:03am

From what I understand, it’s meant to give saner estimates in particular when the likelihood surface has several local maxima, which can happen when several parameters are unknown. It seems to me that MML approaches are used to avoid the problem of joint ML when multiple maxima exist. The gist is to maximize across some set of dimensions, fix those, then maximize across the other dimensions, treat those as fixed, and rinse and repeat. It winds up looking a lot like a block Gibbs sampler, where the goal is maximizing rather than sampling.

Anyway, imagine a really bumpy 2D density where x is one parameter, y is another, and z is the likelihood (or posterior). Fix x, maximize in y. Fix y, maximize in x. Fix x, maximize in y, and so on. The end result is the maximum of the MARGINAL distributions rather than the joint maximum: for each parameter, it finds the “overall” maximum across all the small local maxima and across all values of the other parameters.
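
The fix-one-coordinate, maximize-the-other loop described above is coordinate ascent. A minimal sketch, assuming a smooth hypothetical surface rather than a bumpy one, and a bounded golden-section search for each inner 1D maximization (both are illustrative choices, not from the thread):

```python
import math

def golden_max(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of f on [lo, hi], assuming f is unimodal there."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def alternate_maximize(f, x, y, lo, hi, sweeps=50):
    """Coordinate ascent: fix x and maximize over y, then fix y and maximize over x, and repeat."""
    for _ in range(sweeps):
        y = golden_max(lambda t: f(x, t), lo, hi)  # x held fixed
        x = golden_max(lambda t: f(t, y), lo, hi)  # y held fixed
    return x, y

def toy_surface(x, y):
    """Hypothetical concave surface whose maximum is at (x, y) = (0, 1)."""
    return -(x ** 2 + x * y + y ** 2) + x + 2 * y

print(alternate_maximize(toy_surface, 3.0, -3.0, -5.0, 5.0))
```

On this concave toy surface the loop converges to (0, 1), the point that maximizes f in both coordinates jointly, from any start inside the box.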

That’s how I understand it, anyway. It’s necessary for complicated models, including anything with latent variables, random effects, mixtures, etc. Joint maximum likelihood will just get stuck; MML won’t as easily.

Edit: This quick answer depicts it well: https://stats.stackexchange.com/a/133299

Bob_Carpenter · February 6, 2018, 2:30am

In something like lme4, it’s used because the maximum likelihood estimate doesn’t exist (the likelihood is unbounded).

It’s just one step. You start with p(alpha, phi), where usually phi are the hierarchical parameters and alpha the lower-level ones, and marginalize out alpha to get p(phi). Then you find the MML estimate phi* = argmax p(phi) and plug it back in to get p(alpha, phi*). This is the “empirical Bayes” bit. From here, you can either sample alpha or optimize it given the fixed phi*. Fixing phi loses its uncertainty, but you can now optimize alpha and then lay down a Laplace approximation that underestimates the uncertainty to some unknown degree.
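
The last two steps (optimize alpha given a fixed phi*, then lay down a Laplace approximation) can be sketched for a case where the conditional for alpha is not Gaussian. The model is an assumption chosen for illustration: Poisson counts y_j with log-rate alpha_j ~ N(mu, tau^2), and phi* = (mu*, tau*) treated as already estimated (the values below are hypothetical). Newton's method finds the conditional mode of each alpha_j, and the Laplace approximation uses the negative curvature at that mode as the precision. Because phi* is held fixed, none of its uncertainty shows up in the resulting standard deviations.

```python
import math

def laplace_given_phi(counts, mu, tau, tol=1e-12, max_iter=100):
    """For each group j, approximate p(alpha_j | y_j, mu, tau) by N(mode, sd^2).

    Assumed model (hypothetical, for illustration only):
        y_j ~ Poisson(exp(alpha_j)),  alpha_j ~ Normal(mu, tau^2),
    with phi* = (mu, tau) already fixed by an earlier marginal-likelihood step.
    """
    out = []
    for yj in counts:
        a = math.log(yj + 0.5)  # start near the data-only estimate of the log rate
        for _ in range(max_iter):
            grad = yj - math.exp(a) - (a - mu) / tau ** 2  # d/da of log p(y_j | a) + log N(a | mu, tau^2)
            hess = -math.exp(a) - 1.0 / tau ** 2           # always negative: the target is log-concave
            step = grad / hess
            a -= step                                      # Newton update toward the conditional mode
            if abs(step) < tol:
                break
        sd = 1.0 / math.sqrt(math.exp(a) + 1.0 / tau ** 2)  # Laplace: 1 / sqrt(-Hessian) at the mode
        out.append((a, sd))
    return out

# Hypothetical counts and a hypothetical phi* = (mu*, tau*).
counts = [0, 3, 12, 40]
mu_star, tau_star = 1.5, 0.8
for yj, (mode, sd) in zip(counts, laplace_given_phi(counts, mu_star, tau_star)):
    print(f"y = {yj:3d}: alpha approx N({mode:.3f}, {sd:.3f}^2)")
```

In the normal-normal case the conditional is exactly Gaussian, so the Laplace step is exact given phi*; here it is only an approximation, on top of ignoring the uncertainty in phi.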
