Non-parametric Bayesian updating

I have been working on ways to do non-parametric Bayesian updating, that is, using a previous posterior sample as a prior sample when new data arrive. There is a paper in press in a special issue of Intl J Computational Economics and Econometrics, but as it will be embargoed, I made a web page here: Robert Grant - stats. There are several R scripts and Stan models there that you are welcome to adapt; they are all CC-BY.

In essence, we can use density estimation trees as a scalable approach for high-dimensional samples. But we need smooth estimates for Stan (or MALA, or PDMPs, etc.), so we replace the edges (or convolve them, if you like) with a logistic (inverse-logit) function:

$$g(x) = \frac{1}{1 + e^{-x}}$$

There is a bandwidth tuning parameter to think about, and a translation of the midpoint towards the nearest mode to control variance inflation. There's plenty more work to be done on this (listed on the webpage), and y'all are welcome to take bits of it on. I'm a freelancer, so I don't have much time for methodology.
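To make the edge-smoothing concrete, here is a minimal 1-D sketch in Stan. This is my own illustration, not code from the paper's scripts; the function name, signature, and parameterisation are hypothetical. It gives the log-density contribution of a single tree leaf whose hard edges have been replaced by logistic ramps of bandwidth `bw` (the midpoint shift towards the nearest mode is omitted for brevity):

```stan
functions {
  // Log-density of one density-tree leaf on [lo, hi] with height
  // exp(log_h), after replacing the hard box edges with logistic
  // ramps of bandwidth bw (hypothetical helper for illustration).
  real kudzu_leaf_lpdf(real x, real lo, real hi, real log_h, real bw) {
    return log_h
           + log_inv_logit((x - lo) / bw)   // smooth left edge
           + log_inv_logit((hi - x) / bw);  // smooth right edge
  }
}
```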

@Funko_Unko this might address your question which arose some way down in this topic: Fitting ODE models: best/efficient practices? - #65 by Funko_Unko

I will add some more content on ensembles of kudzu soon, probably in January. They provide a big improvement.


Very cool! I don’t fully grok the workflow. Let’s say I have a model where the data are hard to fit into memory. Maybe it has a time component and I can fit 1 day, but 1 week is hard and 1 month is prohibitive. Could I run the analysis on 1 day, carry that posterior forward as the kudzu density, and then run on day 2, etc.? That way I could get a posterior for a month without having all the data in memory?

I guess it’s a bit noisier than running on the full data, but that is often acceptable for me.

That’s right. I think this only becomes a useful option when you have large and streaming data. If you just accumulate it, it will soon be so large that you don’t need to update at all! So the use case is large, streaming (I use the term loosely to mean steadily arriving, not only a Flink-type data pipeline) and evolving. That suggests that a likelihood using a window of time might be desirable, maybe exponentially weighted or just a sliding window. It might get very noisy, but it’s usually possible to do some kind of periodic all-data analysis to reset the posterior.
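For example, an exponentially weighted window likelihood might look like this. This is a sketch assuming a simple normal model; `lambda` is a decay factor supplied as data, with `lambda = 1` recovering the ordinary unweighted likelihood:

```stan
data {
  int<lower=1> N;
  vector[N] y;                     // observations, oldest to newest
  real<lower=0, upper=1> lambda;   // decay factor; 1 = no down-weighting
}
parameters {
  real theta;
  real<lower=0> sigma;
}
model {
  // Down-weight older observations geometrically.
  for (n in 1:N)
    target += pow(lambda, N - n) * normal_lpdf(y[n] | theta, sigma);
}
```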

The target += syntax really makes this easy to implement in Stan (both in terms of typing and thinking).
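For instance, here is a single-parameter sketch of one update step (my illustration, with hypothetical data names; the leaf edges `lo`, `hi` and log-heights `log_h` would come from the kudzu fit to the previous posterior sample):

```stan
data {
  int<lower=1> N_new;
  vector[N_new] y_new;    // only the newly arrived batch
  int<lower=1> L;         // number of tree leaves
  vector[L] lo;           // leaf lower edges for theta
  vector[L] hi;           // leaf upper edges for theta
  vector[L] log_h;        // log leaf heights from the previous fit
  real<lower=0> bw;       // edge-smoothing bandwidth
}
parameters {
  real theta;
}
model {
  // Prior: the smoothed density-tree estimate of the old posterior,
  // added to the target as a mixture over leaves.
  vector[L] lp;
  for (l in 1:L)
    lp[l] = log_h[l]
            + log_inv_logit((theta - lo[l]) / bw)
            + log_inv_logit((hi[l] - theta) / bw);
  target += log_sum_exp(lp);
  // Likelihood: only the new data enter here (unit variance for brevity).
  y_new ~ normal(theta, 1);
}
```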


@spinkney I added a flowchart at Robert Grant - stats which might help clarify (notwithstanding my handwriting).
