In my grant-writing explorations I find it useful to see how Stan/Bayesian modelling compares with other scientific software. A big comparison point has been deep learning packages, and I found what I think are surprising results. Fairly brief write-up at:
https://breckbaldwin.github.io/ScientificSoftwareImpactMetrics/DeepLearningAndBayesianSoftware.html
TL;DR We are doing way better than you might think once you throw out the computer science literature. Comments sought…
Not sure if this is the kind of comments sought, but I have to say I am a bit surprised by the low percentage of Bayesian in Physics and Astronomy, not only because it's a field where models built from first principles with prior information are quite common, but also considering how popular packages like emcee are. As of today, "emcee: the MCMC hammer" has 6039 citations according to Google Scholar versus 3982 for "Stan: a probabilistic programming language", and I expect most of the former to be astronomy papers.
That's great. I did a Scopus search on 'emcee' and got 4,229, with the drop-off at 2012 or so. Scopus is more conservative than Google Scholar.
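For anyone wanting to reproduce a count like that, here is a minimal sketch against Elsevier's Scopus Search API; the query string and API key are placeholders, not necessarily the exact search I ran:

```python
# Sketch: count Scopus records matching a query term. Requires an Elsevier
# API key (https://dev.elsevier.com/); both the key and the query string
# below are placeholders to substitute.
import requests

API_KEY = "YOUR_ELSEVIER_API_KEY"  # placeholder
query = 'ALL("emcee")'             # hypothetical query string

resp = requests.get(
    "https://api.elsevier.com/content/search/scopus",
    headers={"X-ELS-APIKey": API_KEY, "Accept": "application/json"},
    params={"query": query, "count": 1},
)
resp.raise_for_status()

# totalResults is the number of matching records, the analogue of the 4,229 above.
total = resp.json()["search-results"]["opensearch:totalResults"]
print(f"Scopus hits for {query}: {total}")
```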
I'd guess Stan can't compete with emcee; deep learning is at 4,803. Is emcee specialized for astro problems, and is it still active? Also, is it Bayesian? I assume there is MCMC that doesn't care about Bayesian stuff. Had a look; seems plenty Bayesian to me.
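To make the "is it Bayesian" point concrete: emcee samples whatever log-probability function you hand it, so it is Bayesian exactly when that function is a log-posterior. A minimal sketch with a toy Gaussian model (my own illustration, not from any of the counted papers):

```python
# Sketch: emcee samples any log-probability function; it becomes Bayesian
# when that function is log-prior + log-likelihood. Toy example: infer the
# mean of Gaussian data with known sigma = 1.
import numpy as np
import emcee

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=50)  # synthetic observations

def log_prob(theta):
    mu = theta[0]
    log_prior = -0.5 * (mu / 10.0) ** 2         # weak Normal(0, 10) prior
    log_like = -0.5 * np.sum((data - mu) ** 2)  # Gaussian likelihood
    return log_prior + log_like                 # log-posterior up to a constant

nwalkers, ndim = 8, 1
p0 = rng.normal(size=(nwalkers, ndim))          # scatter the initial walkers
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2000)

samples = sampler.get_chain(discard=500, flat=True)  # drop burn-in, flatten walkers
print("posterior mean of mu:", samples.mean())
```

With a flat prior the same machinery samples the bare likelihood, which is why "MCMC" does not automatically mean "Bayesian".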
Verified that the table values were correct for Physics/Astro.
It is a big bump in raw counts. New graph:
New table row:
| | Bayesian | Deep Learning | totals | RStanArm | Keras | PyMC | RStan | PyTorch | brms | PyStan | emcee | Stan | TensorFlow | detail totals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physics and Astronomy | 4329/47% | 4803/53% | 9132 | 8/0% | 1928/19% | 324/3% | 23/0% | 906/9% | 31/0% | 31/0% | 3826/37% | 134/1% | 3062/30% | 10273 |
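Note the row uses two denominators: the Bayesian/Deep Learning percentages are over the totals column (9132), while the per-package percentages are over detail totals (10273). A quick arithmetic check, with the values copied from the row:

```python
# Quick check of the percentages in the table row (values copied verbatim).
bayes, dl = 4329, 4803
detail = {"RStanArm": 8, "Keras": 1928, "PyMC": 324, "RStan": 23,
          "PyTorch": 906, "brms": 31, "PyStan": 31, "emcee": 3826,
          "Stan": 134, "TensorFlow": 3062}

total = bayes + dl                   # 9132, the 'totals' column
detail_total = sum(detail.values())  # 10273, the 'detail totals' column

print(f"Bayesian share:   {bayes / total:.0%}")                        # -> 47%
print(f"emcee share:      {detail['emcee'] / detail_total:.0%}")       # -> 37%
print(f"TensorFlow share: {detail['TensorFlow'] / detail_total:.0%}")  # -> 30%
```

The detail sum exceeds the top-level total; my reading (unverified) is that papers citing more than one package appear once per package in the detail columns.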
This conversation has me thinking about growth rates now; deep learning looks like it is growing faster. The label covered the slope and I didn't think about it. Log plot below:
The cumulative counts are not a good comparison, or should start in 2018: Bayesians have the advantage just for being around longer. This is why I didn't include BUGS/JAGS, which would easily add another 4k.
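For anyone who wants to redo the growth-rate comparison, here is the kind of log plot I mean; the yearly counts below are made-up placeholders, not the real Scopus numbers:

```python
# Sketch: compare growth rates of cumulative citation counts on a log scale.
# The yearly counts are fake placeholders; substitute real per-year counts.
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(2012, 2022)
bayes_per_year = np.array([50, 90, 150, 230, 330, 450, 600, 760, 930, 1100])  # fake
dl_per_year = np.array([5, 10, 30, 80, 200, 480, 900, 1500, 2300, 3300])      # fake

fig, ax = plt.subplots()
ax.semilogy(years, np.cumsum(bayes_per_year), label="Bayesian (cumulative)")
ax.semilogy(years, np.cumsum(dl_per_year), label="Deep learning (cumulative)")
ax.set_xlabel("year")
ax.set_ylabel("cumulative citation count (log scale)")
ax.legend()
plt.show()
```

On a log scale a straight line is a constant growth rate, so the steeper slope is the faster-growing literature regardless of who had the head start.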
That's a very interesting analysis, and nice graphs.
Hey @breckbaldwin, this paper is interesting ([1806.06850] Polynomial Regression As an Alternative to Neural Nets); it reinforces my intuition that neural nets are really good when you plug in convolutions or a transformer architecture (and of course, when your data is so unstructured that you need that kind of thing).
@storopoli, is that paper on the right track? If so, I'll have to start tracking polynomial regression; the package name is polyreg. Here you go for CRAN downloads on the RStudio mirror:
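If you'd rather pull the download numbers yourself, the RStudio mirror logs are exposed through the public cranlogs API; a minimal sketch, with an arbitrary date range:

```python
# Sketch: total CRAN downloads for polyreg from the RStudio mirror logs,
# via the public cranlogs API. The date range is an arbitrary example.
import requests

url = "https://cranlogs.r-pkg.org/downloads/total/2018-01-01:2021-12-31/polyreg"
resp = requests.get(url)
resp.raise_for_status()

# The response is a JSON list with one entry per requested package.
info = resp.json()[0]
print(f"{info['package']}: {info['downloads']} downloads "
      f"({info['start']} to {info['end']})")
```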
No, my apologies. It was a joke. 😅
I was making fun of deep learning and of how it is nonsense to compare deep learning frameworks with Bayesian frameworks. They are different tools for different purposes. It's OK to compare {dplyr} versus {data.table}, but {PyTorch} versus {Stan} is not a good comparison.
Ok, too slow on my end… ;)
B