Decomposing models: fitting a simpler one and using the fitted simple model to fit the more complex one

Hi,

I was wondering if the following type of “modus operandi” is OK/common in the Stan/Bayesian community:

Let’s say I want to fit an HMM, but instead I first fit a GMM and extract its parameters.

Then, using those extracted parameters, I fit the corresponding HMM but keep the already determined GMM parameters “fixed” => this way I can reduce the computational complexity.
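To make the idea concrete, here is a minimal NumPy sketch of the two-stage procedure on a synthetic 1-D example (not Stan; the data, helper names, and toy parameter values are mine): stage 1 fits a GMM by EM while ignoring time dependence, and stage 2 runs Baum-Welch for the HMM transition matrix only, with the GMM emission parameters held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy ground truth: a 2-state HMM with Gaussian emissions ---
A_true = np.array([[0.9, 0.1], [0.2, 0.8]])
means_true, stds_true = np.array([-2.0, 2.0]), np.array([1.0, 1.0])
T = 2000
states = np.zeros(T, dtype=int)
for t in range(1, T):
    states[t] = rng.choice(2, p=A_true[states[t - 1]])
obs = rng.normal(means_true[states], stds_true[states])


def gmm_em(x, k=2, iters=50):
    """Stage 1: fit a k-component 1-D GMM by EM, ignoring time dependence."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    sd = np.full(k, x.std())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)  # responsibilities
        n = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
        w = n / len(x)
    return mu, sd, w


def hmm_transitions(x, mu, sd, iters=30):
    """Stage 2: Baum-Welch updating ONLY the transition matrix; the
    emission parameters stay fixed at the stage-1 GMM estimates."""
    k = len(mu)
    B = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    A = np.full((k, k), 1.0 / k)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # scaled forward pass
        alpha = np.zeros((len(x), k))
        c = np.zeros(len(x))
        alpha[0] = pi * B[0]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, len(x)):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        # scaled backward pass
        beta = np.ones((len(x), k))
        for t in range(len(x) - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        # expected transition counts -> new transition matrix
        xi = np.zeros((k, k))
        for t in range(len(x) - 1):
            xi += A * np.outer(alpha[t], B[t + 1] * beta[t + 1]) / c[t + 1]
        A = xi / xi.sum(axis=1, keepdims=True)
        gamma0 = alpha[0] * beta[0]
        pi = gamma0 / gamma0.sum()
    return A


mu, sd, w = gmm_em(obs)
order = np.argsort(mu)  # sort components so state labels are stable
mu, sd = mu[order], sd[order]
A_hat = hmm_transitions(obs, mu, sd)
print(np.round(A_hat, 2))
```

On this well-separated toy data the recovered transition matrix lands close to the true one; whether the shortcut is acceptable in practice depends on how well the marginal GMM pins down the emission parameters.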

But does this even make sense? If I remember correctly, the author of ScalaStan was perhaps hinting at something similar. Not sure, though.

Cheers,

Jozsef

People do this and generally refer to it as a multi-stage procedure or something similar. One problem is that you don’t correctly account for uncertainty in stage 1 when doing stage 2 (or for the correlation between stage 1 and stage 2 parameters), but when/whether that’s an issue depends on the problem (and sometimes stage 1 has much more data, so its uncertainties are tiny compared to stage 2’s). It’s a great way to start looking at a multi-component problem because you can test out the simpler models before putting them together. Hope that helps.
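The uncertainty point can be illustrated with a tiny simulation (a hypothetical toy of my own, not from the thread): stage 1 estimates a baseline mean from a small dataset, stage 2 estimates a shift relative to it from a large dataset. Plugging in the stage-1 point estimate makes the stage-2 standard error look far smaller than it is; redoing stage 2 over draws from the stage-1 sampling distribution recovers the missing spread.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1: estimate baseline mean mu from a SMALL dataset A.
# Stage 2: estimate the shift delta = mean(B) - mu from a LARGE dataset B.
A = rng.normal(0.0, 1.0, size=50)
B = rng.normal(1.0, 1.0, size=5000)

mu_hat = A.mean()
mu_se = A.std(ddof=1) / np.sqrt(len(A))       # stage-1 uncertainty
se_B = B.std(ddof=1) / np.sqrt(len(B))        # stage-2 sampling error

# Plug-in multi-stage: treat the stage-1 estimate as exactly known.
se_plugin = se_B

# Propagated: redo stage 2 for many draws of mu from its approximate
# sampling distribution and look at the spread of the resulting deltas.
n_draws = 4000
delta_draws = (rng.normal(B.mean(), se_B, n_draws)
               - rng.normal(mu_hat, mu_se, n_draws))
se_propagated = delta_draws.std(ddof=1)

print(round(se_plugin, 3), round(se_propagated, 3))
```

Here stage 1 has much *less* data than stage 2, so the plug-in standard error badly understates the real one; flip the sample sizes and the two nearly coincide, which is the regime where the shortcut is harmless.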

Thanks for the comment @sakrejda . It might also be the case that the composite model is computationally intractable :( If the complexity grows with the third power of the number of parameters, then doubling the number of parameters means you need 8 times more CPU time, and AFAIK Stan ATM is not really parallel.

Nevertheless, putting models together from components is a pretty nice idea! This is why I like the ScalaStan interface: it makes model building compositional, as in function composition.

Cheers,

Jozsef

Stan is parallel, and the parallelism is even useful!

Hmm, interesting. Now Greta has started to use TensorFlow. I wonder what is parallel, and where?

In Stan I can run chains on separate cores, which is decently parallel, I assume. Or not?
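Between-chain parallelism is embarrassingly parallel: each chain is an independent sampler, so you just hand one per core to a worker pool. A generic Python sketch of the pattern (a toy random-walk Metropolis chain of my own, not Stan’s actual machinery):

```python
import numpy as np
from multiprocessing import Pool


def run_chain(seed, n=1000):
    """Toy random-walk Metropolis chain targeting a standard normal."""
    rng = np.random.default_rng(seed)
    x, out = 0.0, []
    for _ in range(n):
        prop = x + rng.normal(0.0, 1.0)
        # accept with probability min(1, p(prop)/p(x)) for p = N(0, 1)
        if np.log(rng.uniform()) < 0.5 * (x ** 2 - prop ** 2):
            x = prop
        out.append(x)
    return np.array(out)


def main(n_chains=4):
    # one independent chain per worker process, distinct seeds
    with Pool(n_chains) as pool:
        return pool.map(run_chain, range(1, n_chains + 1))


if __name__ == "__main__":
    chains = main()
    print(len(chains), chains[0].shape)
```

Within-chain parallelism is the harder problem, because each sampler iteration depends on the previous one; there the parallelism has to go inside the log-density/gradient evaluation instead.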

So what can Greta get from TensorFlow, beyond running chains on different cores? I wonder :)

In Stan parallelism is within-chain; I assume it is in Greta too. Last I checked, Stan ran a larger selection of models because of the functions it can autodiff through. An up-to-date comparison would be great if you are interested in working out the various capabilities!

Well, I don’t really know, but I think with TF it can do more than run different chains on separate CPUs.

Also, this looks kind of interesting: https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/Mixture

However, the Edward community seems to be pretty dead :(