Should I marginalize to the extent possible?

Bonnevie · August 13, 2018, 9:43am

I am working with models where y_i | x_i \sim \mathcal{N}(Ax_i,D) and x_i \sim \mathcal{N}(\mu,\Sigma), which means that conditional on A I can model y_i | A \sim \mathcal{N}(A\mu,A\Sigma A^{T}+D). In this scenario, y_i often lives in a much higher-dimensional space than x_i, and D and \Sigma can generally be assumed to be diagonal.

My question is whether it is recommended to implement this type of model with x_i marginalized out, or with x_i as a latent variable. So we can implement either:

1. The complete un-marginalized model, including the latent x_i. The advantage seems to be that we are only ever drawing from diagonal Gaussians, but the disadvantage is that we are including a large number of latent variables in the model.
2. The marginalized model, with x_i integrated out. There are a lot fewer random variables to sample, but they now all have to be sampled from a multivariate Gaussian.

Although the multivariate Gaussians technically have the dimensionality of y_i, the covariance matrix is diagonal plus low-rank, so we effectively only have to invert matrices with the (smaller) dimensionality of x_i. That shaves off some of the cost, but we still need to do some extra linear algebra.
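To make the low-rank point concrete, here is a minimal sketch (plain Python, no libraries) of how the marginal log density could be evaluated while only factorizing a k x k matrix, where k is the dimension of x_i. It assumes diagonal \Sigma and D, stored as the vectors `sigma2` and `d`, and uses the Woodbury identity together with the matrix determinant lemma. All function and variable names are made up for illustration; in Stan the same algebra would have to be written out by hand in the model block.

```python
import math

def cholesky(M):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][p] * z[p] for p in range(i))) / L[i][i])
    return z

def logpdf_direct(y, A, mu, sigma2, d):
    """Dense reference: factorizes the full n x n covariance A diag(sigma2) A^T + diag(d)."""
    n, k = len(A), len(mu)
    C = [[sum(A[i][p] * sigma2[p] * A[j][p] for p in range(k))
          + (d[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    r = [y[i] - sum(A[i][p] * mu[p] for p in range(k)) for i in range(n)]
    L = cholesky(C)
    z = solve_lower(L, r)  # z^T z = r^T C^{-1} r
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + sum(v * v for v in z))

def logpdf_woodbury(y, A, mu, sigma2, d):
    """Same density, but only the k x k matrix M = Sigma^{-1} + A^T D^{-1} A is factorized."""
    n, k = len(A), len(mu)
    r = [y[i] - sum(A[i][p] * mu[p] for p in range(k)) for i in range(n)]
    M = [[sum(A[i][p] * A[i][q] / d[i] for i in range(n))
          + (1.0 / sigma2[p] if p == q else 0.0) for q in range(k)] for p in range(k)]
    L = cholesky(M)
    u = [sum(A[i][p] * r[i] / d[i] for i in range(n)) for p in range(k)]  # A^T D^{-1} r
    z = solve_lower(L, u)  # z^T z = u^T M^{-1} u
    # Woodbury: r^T C^{-1} r = r^T D^{-1} r - u^T M^{-1} u
    quad = sum(ri * ri / di for ri, di in zip(r, d)) - sum(v * v for v in z)
    # Determinant lemma: log|C| = log|M| + log|Sigma| + log|D|
    logdet = (2.0 * sum(math.log(L[p][p]) for p in range(k))
              + sum(math.log(s) for s in sigma2) + sum(math.log(v) for v in d))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)
```

The dense version costs O(n^3) per evaluation in the dimension n of y_i, while the Woodbury version costs roughly O(n k^2 + k^3), which is the saving described above. The two functions should agree up to floating-point error.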

Even though HMC is efficient compared to its competitors, its sampling efficiency still degrades as the dimensionality of the sample space grows, so I would venture that marginalization helps with sampling. But it could be argued that the marginalized posterior is more complicated, which might counter the effect.

Knowing the other parameters, x_i could be sampled from the true posterior conditional on each sample either post hoc or in the generated quantities block, so we can retrieve x_i if needed, even if we use model 2.
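For reference, the conditional posterior in question follows from standard Gaussian conditioning. A sketch of the result, where m_i and V are names introduced here for the conditional mean and covariance:

```latex
\begin{aligned}
x_i \mid y_i, A, \mu, \Sigma, D &\sim \mathcal{N}(m_i, V), \\
V &= \left(\Sigma^{-1} + A^{T} D^{-1} A\right)^{-1}, \\
m_i &= \mu + V A^{T} D^{-1} (y_i - A\mu).
\end{aligned}
```

V has the dimensionality of x_i and does not depend on i, so for a given posterior draw of (A, \mu, \Sigma, D) it only needs to be factorized once and can be reused for every x_i.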

A further disadvantage of model 1 seems to be the application of PSIS-LOO. If I want to evaluate the predictive probability p(y_i|\mathcal{D}_{obs}) with x_i marginalized, which seems the sensible thing to evaluate, I guess I have to use model 2 or at the very least do the computational work of model 2 on top of model 1. Or is it actually better to apply LOO conditional on samples of x_i since the marginalized density might be needlessly diffuse? I got some really poor \hat{k} when I tried it on the marginalized model.
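Spelled out, the two candidate pointwise log-likelihoods for posterior draw s would be the following (the superscript (s) is notation introduced here for the s-th draw):

```latex
\begin{aligned}
\text{conditional on } x_i:\quad
  & \log \mathcal{N}\!\left(y_i \mid A^{(s)} x_i^{(s)},\; D^{(s)}\right), \\
\text{with } x_i \text{ marginalized}:\quad
  & \log \mathcal{N}\!\left(y_i \mid A^{(s)} \mu^{(s)},\; A^{(s)} \Sigma^{(s)} A^{(s)T} + D^{(s)}\right).
\end{aligned}
```

The second expression can be evaluated from draws of either model, since it only needs A, \mu, \Sigma and D, which is the extra "computational work of model 2" mentioned above.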

sakrejda · August 13, 2018, 1:18pm

I think this is a fair description of the trade-offs. Marginalization is almost always a win, but if you use HMC to sample something MVN with a million dimensions it works fine too. I would be surprised if the marginalized model posterior for the remaining parameters were simpler than the non-marginalized version, so that’s my only concern. Having a lot of spare MVN parameters around might just hide the problem in the non-marginalized version.

Bob_Carpenter · August 22, 2018, 10:25am

It will work, but it’ll take a lot longer to mix.

In almost all cases, if the posterior’s complicated under marginalization, then it’ll be hard to fit with the marginalized variables included explicitly. It would be nice to get more examples of these tradeoffs, so please feel more than free to share results back here.
