Metamodeling

Hello, I’m an environmental engineer rather than a statistician. Does anyone have experience or advice with metamodeling for Bayesian inference? I think there is a broad use case in the engineering profession, though I have not found any examples. A search for “metamodeling” yields many examples for the purpose of computational efficiency, but none for the purpose of Bayesian inference. I’ll briefly explain a hypothetical example and outline some potential solutions.

As a hypothetical example, think of a highly non-linear USGS or Army Corps model that has been developed for a specific purpose, say groundwater modeling. It would be impractical to recode the model in Stan, although the model could be run repeatedly over a range of input parameters. I’d like to build a deterministic metamodel from the resulting input-output matrix and then use it to fit real-world data (i.e., infer the model’s input parameters from observed outputs).
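As a rough illustration of how such an input-output matrix might be assembled, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_simulator` is a hypothetical stand-in for the external model, the two inputs and their bounds are invented, and Latin hypercube sampling is just one common way to spread a limited number of runs over the input space.

```python
import math
import random

def toy_simulator(k, r):
    """Hypothetical stand-in for the external model (nonlinear in both inputs)."""
    return math.exp(-k) * math.sin(3.0 * r) + 0.5 * k * r

def latin_hypercube(n, bounds, rng):
    """Return n design points, with one point per stratum along every input."""
    columns = []
    for lo, hi in bounds:
        # One uniform draw inside each of the n equal-width strata of [0, 1).
        strata = [(i + rng.random()) / n for i in range(n)]
        # Shuffle so the strata of different inputs are paired at random.
        rng.shuffle(strata)
        columns.append([lo + u * (hi - lo) for u in strata])
    return [list(point) for point in zip(*columns)]

rng = random.Random(1)
bounds = [(0.0, 2.0), (0.0, 1.0)]   # assumed plausible ranges for k and r
design = latin_hypercube(20, bounds, rng)

# Run the (stand-in) model once per design point.
outputs = [toy_simulator(k, r) for k, r in design]

# Input-output matrix: one row per run, columns are k, r, output.
table = [x + [y] for x, y in zip(design, outputs)]
```

With a real model, the call to `toy_simulator` would be replaced by whatever mechanism runs the external code (for example, writing an input file and parsing the output), and the number of runs would depend on how expensive each run is.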

So the task is to identify a metamodel representation that is both faithful to the original model and explorable by MCMC. A linear regression model would be straightforward to implement, but would not sufficiently pick up non-linearities for my purpose. A latent Gaussian process model could work, although I could use some advice on appropriate kernels for the purpose of deterministic modeling with variable data density in multidimensional space. I’ve considered k-nearest neighbors, decision trees and linear interpolations, but I’m assuming that the edges (the jumps or kinks these methods leave in the fitted response surface) would be difficult for MCMC.
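To make the Gaussian process option concrete, here is a minimal one-dimensional sketch in plain Python. It is only an illustration under stated assumptions: `simulator` is a hypothetical stand-in for the expensive model, the squared-exponential kernel and its hand-picked length-scale are assumptions rather than recommendations, and the tiny `jitter` term is there purely for numerical stability since the model output is treated as noise-free.

```python
import math

def simulator(x):
    """Hypothetical stand-in for the expensive deterministic model."""
    return math.sin(3.0 * x) + 0.5 * x * x

def sq_exp(a, b, length=0.3, scale=1.0):
    """Squared-exponential kernel; the length-scale is fixed by hand here."""
    return scale ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def solve_chol(low, b):
    """Solve (low low^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

# Design points and the corresponding model runs.
xs = [0.2 * i for i in range(11)]          # 0.0, 0.2, ..., 2.0
ys = [simulator(x) for x in xs]
y_mean = sum(ys) / len(ys)                 # constant mean, removed before fitting

jitter = 1e-10                             # numerical stabilizer, not a noise model
K = [[sq_exp(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
     for i, a in enumerate(xs)]
alpha = solve_chol(cholesky(K), [y - y_mean for y in ys])

def emulate(x):
    """GP posterior mean at x: a smooth weighted sum of kernels on the design points."""
    return y_mean + sum(w * sq_exp(x, xi) for w, xi in zip(alpha, xs))
```

Because `emulate` is a finite sum of smooth kernel functions, it has no jumps or kinks, which is the property that nearest-neighbor, tree, and piecewise-linear metamodels lack. In a real application the length-scale would presumably be estimated rather than fixed, and a multidimensional input would likely need a separate length-scale per dimension.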

Hopefully this is a general use case that could benefit others. Thoughts and advice are welcome, thanks!

I think this is another name for (or is at least quite similar to) ‘emulation’.


Thanks for the lead. Indeed, the BACCO package implements an approach to this problem: it uses a GP as the metamodel representation and provides an example from environmental science.


I have also seen this idea called “surrogate modeling”. A quick Google search turned up this paper (https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2011WR011527), which covers applications to water resources. Another paper I’ve read recently referenced this one (https://epubs.siam.org/doi/10.1137/16M1082469).


Another good reference, thanks. Polynomial and neural network metamodels were omitted from my original list. So far, my takeaway is that there are lots of ways to build a metamodel, but a deterministic GP is likely a good choice for inference.
