When it's necessary: how to do model cutting right? (reference to a statmodeling post)

Premise

In the real world, starting to model from data plus its uncertainty is much better than using the data alone (uncertainty-free), for example when the inference has been done by third parties, or when joint modelling is not possible or really impractical.

I really enjoyed the post https://statmodeling.stat.columbia.edu/2020/01/08/how-to-cut-using-stan-if-you-must/

In particular, this comment by Ben:

We cut in all models, at some level. No way we have good, probabilistic models all the way down to the sensor level. At some point when we take measurements, we write down numbers — those are cuts.

And this one by Pierre E. Jacob:

Your suggested first solution to approximate the cut distribution (“basically, you’d first fit model 1 and get posterior simulations, then approx those simulations by a mixture of multivariate normal or t distributions, then use that as a prior for model 2”) would not approximate the cut distribution, if I understand it correctly. This would approximate the standard posterior distribution, with an error coming from a mixture being an imperfect representation of the first posterior. It would still be standard Bayesian inference since the parameter on which you put a prior gets updated in the second stage.
The point of the "cut" is that some marginal distributions would not get updated. A long-and-dirty way (but conceptually simple) of getting an approximation of the cut is as follows: perform the 'second stage' posterior simulation, conditional on many independent draws from the 'first stage' posterior, and finally put all the draws together.
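To make that concrete, here is a minimal rstan sketch of that long-and-dirty approach. Everything here is hypothetical: stage1.stan infers a parameter phi from data y1, stage2.stan declares phi as data and infers theta from data y2.

```r
library(rstan)

# Hypothetical two-stage setup: stage1.stan infers phi from y1;
# stage2.stan takes phi as fixed data and infers theta from y2.
m1 <- stan_model("stage1.stan")
m2 <- stan_model("stage2.stan")

fit1 <- sampling(m1, data = list(N1 = length(y1), y1 = y1))
phi_draws <- extract(fit1)$phi  # one stage-1 draw of phi per element

# Cut: run the second stage conditional on many independent
# stage-1 draws of phi, then pool all the resulting theta draws.
idx <- sample(seq_along(phi_draws), 100)
theta_cut <- unlist(lapply(idx, function(i) {
  fit2 <- sampling(m2,
                   data = list(N2 = length(y2), y2 = y2,
                               phi = phi_draws[i]),
                   chains = 1, iter = 600, warmup = 500, refresh = 0)
  extract(fit2)$theta
}))
# theta_cut, pooled across stage-1 draws, approximates the cut
# distribution: phi is never updated by the second-stage likelihood.
```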

Question

If the second model is really simple and works perfectly with optimizing from rstan, is it an OK solution to run the model once for each draw from the previous model (or for each data-uncertainty combination) and merge the point estimates to obtain a pseudo-posterior distribution?
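Concretely, I mean something like this sketch (reusing the hypothetical models and stage-1 draws from above):

```r
# Proposed shortcut: one point estimate of theta per stage-1 draw,
# pooled into a pseudo-posterior. optimizing() returns the (penalized)
# maximum likelihood estimate as a named vector in $par.
idx <- sample(seq_along(phi_draws), 500)
theta_hat <- sapply(idx, function(i) {
  opt <- optimizing(m2, data = list(N2 = length(y2), y2 = y2,
                                    phi = phi_draws[i]))
  opt$par["theta"]
})
hist(theta_hat)  # spread comes only from stage-1 uncertainty;
                 # stage-2 posterior uncertainty is collapsed to a point
```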

Thanks a lot


I am far from an expert on this, but my gut instinct is that the proposed method (optimizing the second model for each posterior sample of the first model) would usually be somewhat better than just fitting something like the second model directly, and sometimes possibly better than just fitting something like the first model directly. IMHO it would very likely be inferior to the approximation by Pierre E. Jacob (or the neural-net approximation mentioned elsewhere), and certainly inferior to the full model.

I am also quite sure that in unfavorable conditions the method would break terribly (e.g. when the second model is very poorly identified, so a lot of uncertainty would get ignored by running optimizing). I would not use it unless I absolutely had to.

I would expect @betanalpha to vehemently object to the proposed method and share some geometric insight into why it has problems :-)

