Deep learning and Bayesian modeling citation comparison, surprising to me, comments sought

In my grant-writing explorations I find it useful to see how Stan/Bayesian modeling compares with other scientific software. A big comparison point has been deep learning packages. I found what I think are surprising results. Fairly brief write-up at:

https://breckbaldwin.github.io/ScientificSoftwareImpactMetrics/DeepLearningAndBayesianSoftware.html

TL;DR We are doing way better than you might think once you throw out the computer science literature. Comments sought…


Not sure if this is the kind of comment you're seeking, but I have to say I'm a bit surprised by the low percentage of Bayesian in Physics and Astronomy, not only because it's a field where building models from first principles with prior information is quite common, but also considering how popular packages like emcee are. As of today, emcee: the MCMC Hammer has 6,039 citations according to Google Scholar versus 3,982 for Stan: a probabilistic programming language, and I expect most of those to be astronomy papers.

That’s great. I did a Scopus search on ‘emcee’ and got:

[Image: Scopus search results for ‘emcee’, citation counts by year]

4,229, with a dropoff at 2012 or so. Scopus is more conservative than Google Scholar.

I’d guess Stan can’t compete with emcee here; deep learning is at 4,803. Is emcee specialized for astro problems, and is it still active? Also, is it Bayesian? I assume there is MCMC that doesn’t care about Bayesian stuff. Had a look; it seems plenty Bayesian to me.
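
A minimal sketch of how a count like that can be pulled programmatically, assuming an Elsevier developer API key and the public Scopus Search API (the query string is illustrative, not the exact one behind these numbers):

```python
import requests

API_KEY = "YOUR_SCOPUS_API_KEY"  # placeholder: requires an Elsevier developer key

# Illustrative query; the exact Scopus query used here isn't recorded in the thread.
query = 'REF("emcee: The MCMC Hammer")'

resp = requests.get(
    "https://api.elsevier.com/content/search/scopus",
    params={"query": query, "count": 1},
    headers={"X-ELS-APIKey": API_KEY, "Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# The total hit count comes back in the OpenSearch envelope of the response.
total = resp.json()["search-results"]["opensearch:totalResults"]
print(f"Scopus hits: {total}")
```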

Verified that the table values were correct for Physics/Astro.

It is a big bump on raw counts. New graph:

New table row:

| | Bayesian | Deep Learning | totals | RStanArm | Keras | PyMC | RStan | PyTorch | brms | PyStan | emcee | Stan | TensorFlow | detail totals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physics and Astronomy | 4329/47% | 4803/53% | 9132 | 8/0% | 1928/19% | 324/3% | 23/0% | 906/9% | 31/0% | 31/0% | 3826/37% | 134/1% | 3062/30% | 10273 |
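
As a sanity check on the arithmetic, a small Python sketch that recomputes the percentages from that row. One assumption: the headline totals (9132) are smaller than the per-package sum (10273), presumably because papers citing several packages are counted once in the headline numbers; the thread doesn’t say so explicitly.

```python
# Per-package Scopus counts from the table row above (Physics and Astronomy).
detail = {
    "RStanArm": 8, "Keras": 1928, "PyMC": 324, "RStan": 23, "PyTorch": 906,
    "brms": 31, "PyStan": 31, "emcee": 3826, "Stan": 134, "TensorFlow": 3062,
}
bayesian_total, dl_total = 4329, 4803
headline_total = bayesian_total + dl_total   # 9132
detail_total = sum(detail.values())          # 10273

# Headline split: 47% / 53%, as in the table.
print(f"Bayesian: {bayesian_total}/{bayesian_total / headline_total:.0%}")
print(f"Deep learning: {dl_total}/{dl_total / headline_total:.0%}")

# Per-package percentages are taken against the detail total of 10273.
for pkg, n in detail.items():
    print(f"{pkg}: {n}/{n / detail_total:.0%}")
```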

This conversation has me thinking about growth rates now; deep learning looks like it is growing faster. The label covered the slope and I didn’t think about it. Log plot below:

The cumulative counts are not a good comparison, or should start in 2018. Bayesians have the advantage just from being around longer. This is why I didn’t include BUGS/JAGS; that would easily add another 4k.
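
One way to see the growth rates directly, rather than eyeballing slopes under a label, is a log-scale cumulative plot plus a fitted exponential rate per series. A sketch below; the yearly counts are placeholders, not the actual Scopus numbers:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder yearly citation counts -- purely illustrative.
years = np.arange(2013, 2021)
bayes = np.array([200, 320, 480, 700, 950, 1250, 1600, 2000])
deep = np.array([20, 60, 150, 400, 900, 1900, 3600, 6500])

fig, ax = plt.subplots()
ax.plot(years, np.cumsum(bayes), label="Bayesian (cumulative)")
ax.plot(years, np.cumsum(deep), label="Deep learning (cumulative)")
ax.set_yscale("log")  # a straight line here means a constant growth rate
ax.set_xlabel("Year")
ax.set_ylabel("Cumulative citations (log scale)")
ax.legend()
plt.show()

# Rough growth rates: slope of log(yearly counts) per year.
for name, y in [("Bayesian", bayes), ("Deep learning", deep)]:
    slope = np.polyfit(years, np.log(y), 1)[0]
    print(f"{name}: ~{np.exp(slope) - 1:.0%} per year")
```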


That’s a very interesting analysis, with nice graphs.

Hey @breckbaldwin, this paper is interesting ([1806.06850] Polynomial Regression As an Alternative to Neural Nets); it reinforces my intuition that neural nets are really good when you plug in convolutions or a transformer architecture (and, of course, when your data is so unstructured that you need that kind of thing).

@storopoli, is that paper on the right track? If so, I’ll have to start tracking polynomial regression; the package name is polyreg. Here you go for CRAN downloads on the RStudio mirror:
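
A minimal sketch of how to pull those counts programmatically, assuming the public cranlogs web API that serves the RStudio mirror logs (the date range is illustrative):

```python
import requests

# cranlogs.r-pkg.org exposes download counts from the RStudio CRAN mirror.
url = "https://cranlogs.r-pkg.org/downloads/total/2018-07-01:2019-06-30/polyreg"
resp = requests.get(url, timeout=30)
resp.raise_for_status()

# Response is a list like:
# [{"start": "...", "end": "...", "downloads": N, "package": "polyreg"}]
print(resp.json())
```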

No, my apologies. It was a joke. 😅

I was making fun of deep learning, and of how it’s nonsense for people to compare deep learning frameworks with Bayesian frameworks. They are different tools for different purposes. It’s OK to compare {dplyr} versus {data.table}, but {PyTorch} versus {Stan} is not a good comparison.


Ok, too slow on my end… ;)

B