Decomposing models: fitting a simpler one and using the fitted simple model to fit the more complex one


#1

Hi,

I was wondering whether the following kind of “modus operandi” is OK/common in the Stan / Bayesian community:

Let’s say I want to fit an HMM, but first I simply fit a GMM and extract its parameters.

Then, using those extracted parameters, I fit the corresponding HMM but keep the already determined GMM parameters “fixed” => this way I can reduce the computational cost.
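
Roughly something like the sketch below (just a toy example I put together, not code from a real project): a Gaussian-emission HMM where the emission means and sds come from the separate GMM fit and are passed in as data, so only the transition matrix is estimated.

```stan
data {
  int<lower=1> N;                 // length of the series
  int<lower=2> K;                 // number of hidden states
  vector[N] y;                    // observations
  vector[K] mu;                   // emission means, fixed from the GMM fit
  vector<lower=0>[K] sigma;       // emission sds, fixed from the GMM fit
}
parameters {
  array[K] simplex[K] theta;      // transition probabilities (row j = from state j)
}
model {
  // forward algorithm on the log scale, uniform initial state distribution
  array[N, K] real gamma;
  for (k in 1:K)
    gamma[1, k] = normal_lpdf(y[1] | mu[k], sigma[k]);
  for (t in 2:N) {
    for (k in 1:K) {
      array[K] real acc;
      for (j in 1:K)
        acc[j] = gamma[t - 1, j] + log(theta[j, k])
                 + normal_lpdf(y[t] | mu[k], sigma[k]);
      gamma[t, k] = log_sum_exp(acc);
    }
  }
  target += log_sum_exp(gamma[N]);
}
```

Since mu and sigma are data here, the sampler only has to explore the transition simplexes, which is the whole point of the trick.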

But does this even make sense? If I remember correctly, the author of ScalaStan was perhaps hinting at something similar. Not sure though.

Cheers,

Jozsef


#2

People do this and generally refer to it as a multi-stage procedure or something similar. One problem is that you don’t correctly account for uncertainty in stage 1 when doing stage 2 (or for correlation between stage 1 and stage 2 parameters), but when/whether that’s an issue depends on the problem, and sometimes stage 1 has much more data so its uncertainties are tiny compared to stage 2. It’s a great way to start looking at a multi-component problem because you can test out the simpler models before putting them together. Hope that helps.
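
One partial workaround for the uncertainty issue, for what it’s worth: feed the stage-1 posterior summary into stage 2 as an informative prior instead of a fixed constant. A minimal sketch with made-up names (mu_hat, mu_se) and a toy normal likelihood rather than the full HMM:

```stan
data {
  int<lower=1> N;
  vector[N] y;
  real mu_hat;            // stage-1 posterior mean (hypothetical name)
  real<lower=0> mu_se;    // stage-1 posterior sd (hypothetical name)
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // carry some stage-1 uncertainty forward as an informative prior
  mu ~ normal(mu_hat, mu_se);
  sigma ~ normal(0, 5);
  y ~ normal(mu, sigma);
}
```

This still ignores correlations between stage-1 and stage-2 parameters, but it’s usually better than pretending the stage-1 estimate is exact.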


#3

Thanks for the comment @sakrejda . It might also be the case that the composite model is computationally intractable … :( as the complexity grows with the third power of the number of parameters: double the number of parameters and you need 8 times more CPU time. And AFAIK Stan ATM is not really parallel.

Nevertheless, putting together stuff from components is a pretty nice idea! This is why I like the ScalaStan interface: it makes model building compositional, as in function composition.

Cheers,

Jozsef


#4

Stan is parallel, and the parallelism is even useful!


#5

Hmm, interesting. Greta has now started to use TensorFlow. I wonder what is parallel, and where?

In Stan I can run chains on separate cores, which is decently parallel I assume, or not?

So what does Greta get from TensorFlow, in addition to running chains on different cores? I wonder :)


#6

In Stan parallelism is within chain; I assume it is in Greta too. Last I checked, Stan ran a larger selection of models because of the functions it can autodiff through. An up-to-date comparison would be great, if you are interested in working out the various capabilities!
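
For reference, in recent Stan releases within-chain parallelism is exposed through reduce_sum (map_rect was the earlier mechanism). Here is a minimal sketch over a toy normal likelihood; the data names and grainsize are just placeholders, and you need to build with threading (STAN_THREADS) for it to actually run in parallel:

```stan
functions {
  // log-likelihood of one slice of y; reduce_sum adds up the slices,
  // possibly in parallel across threads
  real partial_sum(array[] real y_slice, int start, int end,
                   real mu, real sigma) {
    return normal_lpdf(y_slice | mu, sigma);
  }
}
data {
  int<lower=1> N;
  array[N] real y;
  int<lower=1> grainsize;   // how finely to slice the data
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 5);
  target += reduce_sum(partial_sum, y, grainsize, mu, sigma);
}
```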


#7

Well, I don’t really know, but I think with TF it can do more than just running different chains on separate CPUs.

Also, this looks kind of interesting: https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/Mixture

However, the Edward community seems to be pretty dead :(