In an industrial setting, I have a process where tools T are dedicated to executing a repetitive task, and when a tool is worn out it is replaced by a new one.
The evolution of a tool can be described by a model that outputs the state of the tool y (almost never directly measured) given the available information x, that is,

y = f(x; \theta_T),

where \theta_T are the parameters of the tool. Unfortunately, much of the information needed to fit the model is missing, so I include it as additional parameters, which considerably increases the dimension of the sampling space; even so, Stan still samples reasonably fast.
To forecast the performance of a tool T with the model, I use the posterior P(\theta \mid D), where D denotes the data. The model is hierarchical: the parameters of the tool \theta_T are drawn from another distribution that depends on hyperparameters \bar{\theta}. Ideally, \theta_T would be independent of the tool, but some tool-to-tool variability has been observed, and it is experimentally unfeasible to characterize that variation, so my plan is to learn it from factory data.
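Schematically (my notation, just to fix ideas):

\bar{\theta} \sim P(\bar{\theta}), \qquad \theta_T \sim P(\theta_T \mid \bar{\theta}), \qquad y_T = f(x_T; \theta_T).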
To successfully deploy the model, I would like to update my posterior with the data collected after each tool is discarded, assuming the tools are drawn independently, so

P(\theta \mid D_1, \dots, D_{n+1}) \propto P(D_{n+1} \mid \theta) \, P(\theta \mid D_1, \dots, D_n),

where D_n is the data from the n-th discarded tool; i.e., the posterior after n tools becomes the prior for tool n+1.
The problem is that Stan takes smooth density functions as input and returns samples as output, so to reuse a posterior as the next prior I have to convert the samples back into a smooth probability density.
A solution I have heard of is to pick some parametric density, like a Gaussian, and fit it to the samples, but I do not think this hack would do the job. Another solution is to refit the model to the whole accumulated dataset every time, which is unfeasible. I found two other threads, this and this, but I feel they do not fully address my problem.
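To make the "Gaussian hack" concrete, this is roughly what I understand it to mean (a minimal sketch; the array `draws` and its dimensions are placeholders for the \bar{\theta} draws from a previous fit): moment-match a multivariate normal to the posterior samples and pass its mean and covariance to the next Stan fit as a multi_normal prior.

```python
import numpy as np

# Placeholder: posterior draws of the hyperparameters from the previous Stan fit,
# shape (n_draws, dim).
draws = np.random.default_rng(0).normal(size=(4000, 3))

# Moment-match a multivariate Gaussian to the samples.
mu = draws.mean(axis=0)               # posterior mean, shape (dim,)
Sigma = np.cov(draws, rowvar=False)   # posterior covariance, shape (dim, dim)

# mu and Sigma would then be passed as data to the next Stan model,
# which would declare something like: theta_bar ~ multi_normal(mu, Sigma).
print(mu, Sigma)
```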
I came across a paper on image processing where the prior density is learned from samples (great!). One of the methods is Normalizing Flows (NFs), which I realized belong to the same family of distribution-learning methods as GANs and VAEs, but NFs seem better suited to my case because they make it easy to evaluate the probability density, which is exactly what Stan needs.
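As I understand it, the density of a flow is available in closed form through the change-of-variables formula: if z \sim p_z is the base distribution and x = g(z) is the invertible flow, then

\log p_x(x) = \log p_z(g^{-1}(x)) + \log \left| \det J_{g^{-1}}(x) \right|,

so a trained flow would give me exactly the smooth log density I am missing.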
I still have many doubts (excuse me if I ask nonsense), and I feel somewhat lost:
- How much is known in the Bayesian community about using normalizing flows for this kind of online learning (in my case the samples (tools) are independent, so there is no sequential dependence in the data)?
- How much is known about using distribution learning in general (not only NFs) to address this problem of posterior reuse?
- Would Variational Inference be a better fit for me than MCMC? Is the output of VI a smooth function I can reuse as a prior? I have heard that VI is not robust, and I fear some degradation of information at each update.
- I have seen that Pyro, an alternative to Stan, has some modules for NFs. This hints that my problem, and a solution, may already be well known and studied in the community. What do people know? (The sketch below is roughly what I have in mind.)
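For concreteness, here is a minimal sketch of what I imagine doing with Pyro, based on its normalizing-flow tutorial (`posterior_draws` is a placeholder for the \bar{\theta} draws from the previous Stan fit, and the choice of a spline coupling flow is arbitrary):

```python
import torch
import pyro.distributions as dist
import pyro.distributions.transforms as T

# Placeholder: posterior draws of the hyperparameters from the previous fit,
# shape (n_draws, dim). In practice these would come from Stan's output.
posterior_draws = torch.randn(4000, 3)
dim = posterior_draws.shape[1]

# Base distribution plus an invertible transform define the flow.
base_dist = dist.Normal(torch.zeros(dim), torch.ones(dim))
transform = T.spline_coupling(dim, count_bins=16)
flow_dist = dist.TransformedDistribution(base_dist, [transform])

# Fit the flow to the samples by maximum likelihood.
optimizer = torch.optim.Adam(transform.parameters(), lr=5e-3)
for step in range(2000):
    optimizer.zero_grad()
    loss = -flow_dist.log_prob(posterior_draws).mean()
    loss.backward()
    optimizer.step()
    flow_dist.clear_cache()

# flow_dist.log_prob(theta) is now a smooth log density I could reuse as a prior.
```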