Should I marginalize to the extent possible?

I am working with models where y_i \sim \mathcal{N}(Ax_i, D) and x_i \sim \mathcal{N}(\mu, \Sigma), so that, conditional on A, I can model y_i \mid A \sim \mathcal{N}(A\mu, A\Sigma A^{T}+D). In this scenario, y_i often lives in a space of much higher dimension than x_i, and D and \Sigma can generally be assumed to be diagonal.
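
For concreteness, here is a minimal NumPy sketch of this generative setup (dimensions, matrices, and the seed are made up for illustration), which also checks the marginal covariance A\Sigma A^{T}+D empirically:

```python
import numpy as np

rng = np.random.default_rng(0)

p, q, n = 50, 5, 1000          # dim(y_i), dim(x_i), number of observations (illustrative)
A = rng.normal(size=(p, q))    # loading matrix
mu = rng.normal(size=q)        # prior mean of x_i
Sigma = np.diag(rng.uniform(0.5, 2.0, size=q))  # diagonal prior covariance of x_i
D = np.diag(rng.uniform(0.1, 0.5, size=p))      # diagonal observation noise

# Un-marginalized generative process: draw x_i, then y_i | x_i.
x = rng.multivariate_normal(mu, Sigma, size=n)
y = x @ A.T + rng.multivariate_normal(np.zeros(p), D, size=n)

# Marginally, y_i ~ N(A mu, A Sigma A^T + D); the empirical covariance should be close.
marg_cov = A @ Sigma @ A.T + D
print(np.abs(np.cov(y, rowvar=False) - marg_cov).max())  # small for large n
```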

My question is whether it is recommended to implement this type of model with x_i marginalized out, or with x_i as a latent variable. So we can implement either:

  1. The complete, un-marginalized model, with the latent x_i included explicitly. The advantage seems to be that we only ever draw from diagonal Gaussians, but the disadvantage is that we introduce a large number of latent variables into the model.
  2. The marginalized model, with x_i integrated out. There are a lot fewer random variables to sample, but they now all have to be sampled from a multivariate Gaussian.

Although the multivariate Gaussians technically have the dimensionality of y_i, the covariance matrix A\Sigma A^{T}+D has low-rank-plus-diagonal structure, so (via the Woodbury identity) we effectively only have to invert matrices of the (smaller) dimensionality of x_i. That shaves off some of the cost, but it still requires extra linear algebra.
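
To make the low-rank saving concrete, here is a hedged NumPy sketch of evaluating the marginal log-density using the Woodbury identity and the matrix determinant lemma, so that only q \times q matrices (the dimensionality of x_i) are factorized; the function name and argument layout are illustrative, not from any existing package:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def marginal_logpdf(y, A, mu, sigma_diag, d_diag):
    """Per-observation log density of y_i ~ N(A mu, A diag(sigma) A^T + diag(d)),
    using the Woodbury identity so only q x q solves are needed.
    y: (n, p) observations; A: (p, q); mu: (q,); sigma_diag: (q,); d_diag: (p,)."""
    p = y.shape[1]
    r = y - A @ mu                                                # residuals, shape (n, p)
    M = np.diag(1.0 / sigma_diag) + A.T @ (A / d_diag[:, None])   # Sigma^-1 + A^T D^-1 A
    cf = cho_factor(M)

    # Quadratic form r^T (D + A Sigma A^T)^{-1} r via Woodbury
    rDinv = r / d_diag                                            # r_i^T D^{-1}, row-wise
    t = rDinv @ A                                                 # A^T D^{-1} r_i, row-wise, (n, q)
    quad = np.sum(rDinv * r, axis=1) - np.sum(t * cho_solve(cf, t.T).T, axis=1)

    # log det(D + A Sigma A^T) via the matrix determinant lemma
    logdet = (2.0 * np.sum(np.log(np.diag(cf[0])))
              + np.sum(np.log(sigma_diag)) + np.sum(np.log(d_diag)))

    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + quad)
```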

Even though HMC is efficient compared to its competitors, its cost still scales with the dimensionality of the sample space, so I would venture that marginalization helps with sampling. But it could be argued that the marginalized posterior has a more complicated geometry, which might counteract that benefit.

Given the other parameters, x_i can be sampled from its exact conditional posterior for each draw, either post hoc or in the generated quantities block, so we can recover x_i if needed even when using model 2.
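
For the post-hoc recovery, the conditional posterior p(x_i \mid y_i, A, \mu, \Sigma, D) is available in closed form by standard Gaussian conjugacy: it is Gaussian with precision \Sigma^{-1}+A^{T}D^{-1}A and mean V(\Sigma^{-1}\mu + A^{T}D^{-1}y_i), where V is that precision's inverse. A minimal sketch (the helper name is mine), to be applied once per posterior draw of the other parameters:

```python
import numpy as np

def sample_x_given_y(y, A, mu, sigma_diag, d_diag, rng):
    """One draw of x_i from p(x_i | y_i, A, mu, Sigma, D) for each row of y,
    with V = (Sigma^-1 + A^T D^-1 A)^-1 and m_i = V (Sigma^-1 mu + A^T D^-1 y_i)."""
    M = np.diag(1.0 / sigma_diag) + A.T @ (A / d_diag[:, None])  # (q, q) precision
    V = np.linalg.inv(M)                                         # conditional covariance
    m = (V @ (mu / sigma_diag + (y / d_diag) @ A).T).T           # (n, q) conditional means
    L = np.linalg.cholesky(V)
    return m + rng.standard_normal(m.shape) @ L.T                # z L^T has covariance V
```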

A further disadvantage of model 1 seems to be the application of PSIS-LOO. If I want to evaluate the predictive density p(y_i|\mathcal{D}_{obs}) with x_i marginalized out, which seems the sensible thing to evaluate, I suppose I have to use model 2, or at the very least do the computational work of model 2 on top of model 1. Or is it actually better to apply LOO conditional on samples of x_i, since the marginalized density might be needlessly diffuse? I got some really poor \hat{k} values when I tried it on the marginalized model.
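
For reference, the pointwise log-likelihood matrix that PSIS-LOO consumes under the marginalized formulation is just the per-observation marginal density evaluated at each posterior draw; a thin sketch reusing the hypothetical marginal_logpdf helper from above:

```python
import numpy as np

def pointwise_loglik(y, draws):
    """Assemble the (n_draws, n_obs) log-likelihood matrix for PSIS-LOO,
    where `draws` iterates over posterior draws (A, mu, sigma_diag, d_diag)."""
    return np.stack([marginal_logpdf(y, A, mu, s, d) for A, mu, s, d in draws])
```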


I think this is a fair description of the trade-offs. Marginalization is almost always a win, but if you use HMC to sample something multivariate normal with a million dimensions, that works fine too. I would be surprised if the marginalized model posterior for the remaining parameters were simpler than the non-marginalized version, so that's my only concern. Having a lot of spare MVN parameters around might just hide the problem in the non-marginalized version.

It will work, but it’ll take a lot longer to mix.

In almost all cases, if the posterior is complicated under marginalization, then it will also be hard to fit with those variables included explicitly rather than marginalized out. It would be nice to get more examples of these trade-offs, so please feel more than free to share results back here.