Hi, I’m benchmarking a few high-dimensional posteriors with potential multimodality and would like to gather recommendations on inference methods to try.
So far, I’ve tested:
- SMC with HMC kernels: good at capturing multimodality, but struggles as dimensionality increases.
- SVGD: performs reasonably, but scales poorly to higher dimensions.
- Multi-path Pathfinder: fast and works decently, especially when paired with Pareto-smoothed importance sampling to refine runs, though I’m uncertain about its reliability given its limited variational family (multivariate normal with low-rank-plus-diagonal covariance).
- Normalizing flows (Block Neural Autoregressive Flow): great for multimodality, but struggles to scale to high dimensions.
I know there is no single ‘right’ answer, and there are many trade-offs to consider, such as speed, accuracy, and computational cost. What inference methods would you recommend for handling high-dimensional, potentially multimodal posteriors?
First, there’s the combinatorial multimodality you get with clustering models like high-dimensional (multivariate) normal mixtures (the model underlying K-means clustering) and, even worse, mixed membership models like Latent Dirichlet Allocation. In the general form of these problems, even maximum likelihood inference is NP-hard, so nothing’s going to work reliably in the worst case. Neural network posteriors are similar. For these problems, there’s no way you’ll ever be able to calculate posterior integrals, so everything’s approximate, and you can use approaches like stacking (for example, see the papers by @yuling and @andrewgelman). You can also just use something like SGD in a form like Adam, control step sizes carefully, and hope it gets somewhere useful; that’s the standard operating procedure for neural networks.
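To illustrate that last point, here’s a minimal sketch of Adam-style gradient ascent on a toy bimodal log density. Everything here (the target `log_p`, the helper names, the step sizes) is made up for illustration; the point is just that different initializations land in different modes, which is the multimodality problem in miniature:

```python
import numpy as np

# Toy 2D bimodal log density: mixture of two unit-variance Gaussians.
def log_p(x):
    a = -0.5 * np.sum((x - 2.0) ** 2)
    b = -0.5 * np.sum((x + 2.0) ** 2)
    return np.logaddexp(a, b)

def grad_log_p(x, eps=1e-5):
    # Finite-difference gradient keeps the sketch self-contained.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (log_p(x + e) - log_p(x - e)) / (2 * eps)
    return g

def adam_map(x0, lr=0.05, steps=500, b1=0.9, b2=0.999, eps=1e-8):
    # Plain Adam, run as ascent on log_p (a MAP-style point estimate).
    x = x0.astype(float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_log_p(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        mhat = m / (1 - b1 ** t)
        vhat = v / (1 - b2 ** t)
        x = x + lr * mhat / (np.sqrt(vhat) + eps)  # ascent, not descent
    return x

# Different inits converge to different modes.
mode1 = adam_map(np.array([1.0, 1.0]))    # near (2, 2)
mode2 = adam_map(np.array([-1.0, -1.0]))  # near (-2, -2)
```

In a real neural-network or mixture posterior you would restart from many initializations and then combine the resulting fits, e.g. by stacking, rather than trusting any single optimization path.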
Second, there’s the kind of limited multimodality you get with things like molecular dynamics models, where a biomolecule might have a handful of stable energy configurations expressed in terms of “collective variables” (a dimensionality reduction to, for example, a few free angles in the molecule). In these cases, alternatives like SMC, parallel tempering, etc., stand a chance of working. I don’t usually try to fit these problems, so I don’t know what software’s available. The most promising approaches these days seem to be RealNVP-style normalizing flows and conditional diffusion models; both are going to require a decent GPU to get off the ground.
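As a toy illustration of the parallel tempering idea (the 1D bimodal target and the temperature ladder here are invented for the sketch, not a molecular dynamics model): hot chains flatten the energy barrier and cross between modes easily, and swap moves propagate those jumps down to the cold chain that samples the actual target:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bimodal target with well-separated modes at +/-4.
def log_p(x):
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

betas = [1.0, 0.5, 0.25, 0.1]   # inverse-temperature ladder; beta=1 is the target
chains = np.zeros(len(betas))
samples = []

for it in range(20000):
    # Random-walk Metropolis update within each tempered chain.
    for k, beta in enumerate(betas):
        prop = chains[k] + rng.normal(0.0, 1.5)
        if np.log(rng.uniform()) < beta * (log_p(prop) - log_p(chains[k])):
            chains[k] = prop
    # Propose swapping states of a random adjacent temperature pair.
    k = rng.integers(len(betas) - 1)
    log_accept = (betas[k] - betas[k + 1]) * (log_p(chains[k + 1]) - log_p(chains[k]))
    if np.log(rng.uniform()) < log_accept:
        chains[k], chains[k + 1] = chains[k + 1], chains[k]
    samples.append(chains[0])  # keep only the cold (beta=1) chain

samples = np.array(samples[2000:])  # drop warmup
```

The cold chain alone would essentially never cross the barrier between the modes at this separation; with the swaps it visits both.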
Pathfinder might work in the second situation with enough paths to find all the modes (or clever initialization to do the same); the importance weighting will then get the relative mode weights right. Another alternative is the approach of Marylou Gabrié and Eric Vanden-Eijnden, which uses Langevin dynamics to find modes, normalizing flows to capture all of them, and a Metropolis step to weight them correctly. This can work in the second setting if there aren’t too many modes and the probability masses of the modes of interest aren’t too extreme.
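Here’s a minimal sketch of the mode-finding-plus-importance-weighting idea. The Pathfinder/flow machinery is replaced by a hard-coded Gaussian-mixture proposal centered at modes that separate optimization paths are assumed to have found, and Pareto smoothing is omitted; the point is that importance resampling recovers the correct relative mode masses even though the proposal weights the modes equally:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy unequal-mass bimodal target: 80% of the mass at +3, 20% at -3.
def log_p(x):
    return np.logaddexp(np.log(0.8) - 0.5 * (x - 3.0) ** 2,
                        np.log(0.2) - 0.5 * (x + 3.0) ** 2)

# Pretend separate optimization paths located the two modes.
centers = np.array([3.0, -3.0])

# Proposal q: equal-weight mixture of unit-variance normals at the modes.
n = 20000
comp = rng.integers(2, size=n)
draws = centers[comp] + rng.normal(0.0, 1.0, size=n)
log_q = np.logaddexp(np.log(0.5) - 0.5 * (draws - 3.0) ** 2,
                     np.log(0.5) - 0.5 * (draws + 3.0) ** 2)

# Self-normalized importance weights (normalizing constants cancel).
log_w = log_p(draws) - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Resampling by weight recovers the 80/20 split the proposal didn't have.
resampled = rng.choice(draws, size=n, p=w)
frac_right = (resampled > 0).mean()
```

When the variational family is a poor fit to a mode’s shape, the raw weights become heavy-tailed; that is where Pareto-smoothed importance sampling and its diagnostics come in.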