The key difference, if any, between Stan and Deep Learning--argument for paper

All,
I have a paper accepted at Applied AI Letters on the relative prevalence of Bayesian modelling (Stan, PyMC3 + interfaces) vs. deep learning (PyTorch, TensorFlow, Keras). I use citation counting to compare them and come to the surprising conclusion that Bayesian modelling accounts for about 1/3 of references once computer science is excluded.

Draft is here: Deep Learning does not Replace Bayesian Modeling

The goal of the paper is to provide a citation that fundraising efforts can use to establish that, contrary to common perception, deep learning has not eclipsed all of Bayesian modelling's domain.

I need help distinguishing the two approaches. The reviewers are hinting that the difference between Bayesian modelling and deep learning is not that great, which I feel I should address directly. Below is a self-contained section on the key difference between the approaches.

I welcome feedback, a better argument, etc.


One of these things is not like the other: Characterizing the difference between Bayesian modeling and deep learning

What, if any, are the hard differences between Bayesian modeling and deep learning? Deep learning can be implemented in Bayesian models, and many Bayesian concepts are used in the deep learning world, as one reviewer noted:

“There is a huge rise in the use of Bayesian Deep learning. Please do not fall into the trap of making DL look like some ‘other’ thing. It’s just a non/hyper-parametric model and as such we can do all the full Bayesian stuff - and many do! There are well-documented works on semi-structured HMC, Riemann MC and good old fashioned MCMC for use in big deep nets, let alone all the neat work on approximate Bayes, mainly using variational learning. Look at work on things like loss-calibrated Bayesian deep nets, Bayes DL, Bayesian autoencoding, ‘HamilTorch’ MCMC package for TensorFlow (hamiltorch: a PyTorch Python package for sampling | Adam Cobb) etc.”

Perhaps Stan and PyMC3 are really part of the same discipline as deep learning, and this article merely catalogs usage differences between siblings from the same parents, sort of like the difference between R and Python?

But no. Another reviewer comment contains the key insight: they said it was no surprise that Bayesian software had high usage outside of computer science, since other subjects are “dominated by Stats & Applied Math”. I took this to mean that deep learning is not a natural candidate for use in those fields, while software like Stan and PyMC3 is. Unpacking that a bit, I’ll observe:

  • Unless the phenomenon under study is a physical neural net, deep learning only offers prediction services. While outstanding progress has been made and further progress may well involve Bayesian concepts, it is prediction in the end.
  • The basis of prediction in deep learning is opaque to human comprehension even if Bayesian techniques are used. This applies to generative neural nets as well.
  • Opacity blocks use in fields “dominated by Stats & Applied Math”, where the goal, for the most part, is developing and fitting mechanistic models. The science is in the model description; the quality of fit validates the model. A high-quality fit in the absence of an understandable model does not help (see the sketch after this list).
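
To make that contrast concrete, here is a minimal sketch in PyMC3 (with made-up data and parameter names, purely illustrative) of what “the science is in the model description” means: the fitted parameter *is* the scientific claim.

```python
# A minimal sketch: an exponential-decay model where the fitted
# parameter k is itself the scientific statement. Data are simulated.
import numpy as np
import pymc3 as pm

t = np.linspace(0, 10, 50)                                     # measurement times
y = 2.0 * np.exp(-0.5 * t) + np.random.normal(0, 0.1, t.size)  # fake observations

with pm.Model():
    A = pm.HalfNormal("A", sigma=5.0)          # initial amount
    k = pm.HalfNormal("k", sigma=1.0)          # decay rate, units 1/time
    sigma = pm.HalfNormal("sigma", sigma=1.0)  # measurement noise
    pm.Normal("obs", mu=A * pm.math.exp(-k * t), sigma=sigma, observed=y)
    trace = pm.sample(1000, tune=1000)
```

The posterior for `k` is directly a statement about the decay rate. A neural net trained on the same `(t, y)` pairs could predict just as well, but would expose no quantity with that meaning.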

Deep learning clearly is used in mechanistic models, but typically as a sensor or classifier, e.g., classifying x-ray images for evidence of COVID pneumonia in an epidemiological study. The study itself will likely be a state-based model where transition rates are explicitly estimated statistically and are human-interpretable.
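
A hedged sketch of that “sensor” pattern (all names and numbers hypothetical): a pretrained classifier supplies case counts, and the epidemiologically meaningful quantity, a transition probability, is still estimated in an interpretable statistical model downstream.

```python
import numpy as np
import pymc3 as pm

# Stand-in for a pretrained deep net scoring x-ray images; in practice
# this would be a PyTorch/TensorFlow model. Here it is pure fiction.
def classify_xray(image):
    return np.random.rand() < 0.3              # hypothetical "pneumonia" call

images = [None] * 500                          # placeholder image batch
cases = sum(classify_xray(im) for im in images)  # the classifier acts as a sensor

with pm.Model():
    # The human-interpretable quantity: probability of the S -> I transition.
    rate = pm.Beta("rate", alpha=1.0, beta=1.0)
    pm.Binomial("cases", n=len(images), p=rate, observed=cases)
    trace = pm.sample(1000, tune=1000)
```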

Another case is hybrid models, where a deep learning component replaces some or all of the likelihood of a Bayesian model. While the overall model may be interpretable, the individual deep learning components remain black boxes. Interpretability for the overall model comes from having well-understood properties of the deep learning components, e.g., how they were trained. For example, Maggie Lieu’s presentation at StanCon used a deep learning component to substitute for a numerically unstable galaxy mass estimator.
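
Sketching the hybrid pattern (my own minimal reconstruction, not Lieu’s actual model): a small PyTorch net is trained offline to emulate an unstable or expensive function f(theta), then dropped into an otherwise ordinary Bayesian posterior. The parameter theta stays interpretable even though the component inside the likelihood is a black box.

```python
import numpy as np
import torch

# --- offline: train an emulator for f(theta) = exp(-theta**2) (a stand-in
# for some unstable/expensive scientific computation) ---
emulator = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(emulator.parameters(), lr=1e-2)
theta_train = torch.linspace(-3, 3, 200).unsqueeze(1)
f_train = torch.exp(-theta_train ** 2)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.mean((emulator(theta_train) - f_train) ** 2)
    loss.backward()
    opt.step()

# --- inference: the emulator is a black-box component of the likelihood ---
y_obs = 0.6                                    # hypothetical observation

def log_post(theta):
    mu = emulator(torch.tensor([[theta]], dtype=torch.float32)).item()
    return -0.5 * theta ** 2 - 0.5 * ((y_obs - mu) / 0.1) ** 2  # prior + likelihood

# A few Metropolis steps: theta is still an interpretable parameter even
# though the mapping inside the likelihood is opaque.
theta, samples = 0.0, []
for _ in range(5000):
    prop = theta + 0.3 * np.random.randn()
    if np.log(np.random.rand()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)
```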

Bayesian models can also be utterly opaque if not authored with an eye to understandability, but the option for understandability exists and is generally the expectation. Deep learning systems cannot match this because a big portion of ‘authoring’ (hyperparameter setting) in those frameworks is beyond human understanding.

In the end, I believe model interpretability cleanly differentiates the deep learners and the Bayesian modelers[^3], at least in practice.

[^3]: However, in a year this article may be much more difficult to write, since deep learning packages have adopted HMC implementations. But I’d suggest that is more a grafting of another approach onto their code base than natural growth of deep learning techniques into the PyMC3/Stan space.
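
For what that “grafting” looks like concretely, here is a bare-bones HMC step written directly against PyTorch’s autograd (my own sketch, not the hamiltorch API): the DL framework contributes gradients and tensors, while the sampler itself is standard MCMC machinery bolted on top.

```python
import torch

def log_prob(q):
    return -0.5 * torch.sum(q ** 2)        # standard normal target (stand-in)

def grad_log_prob(q):
    # The only thing the DL framework contributes: gradients via autograd.
    q = q.detach().requires_grad_(True)
    return torch.autograd.grad(log_prob(q), q)[0]

def hmc_step(q, step_size=0.1, n_leapfrog=20):
    p = torch.randn_like(q)                # resample momentum
    q_new, p_new = q, p + 0.5 * step_size * grad_log_prob(q)  # half step
    for i in range(n_leapfrog):
        q_new = q_new + step_size * p_new
        scale = 1.0 if i < n_leapfrog - 1 else 0.5            # final half step
        p_new = p_new + scale * step_size * grad_log_prob(q_new)
    # Metropolis correction on the joint (position, momentum) energy
    h_old = -log_prob(q) + 0.5 * torch.sum(p ** 2)
    h_new = -log_prob(q_new) + 0.5 * torch.sum(p_new ** 2)
    return q_new if torch.rand(()) < torch.exp(h_old - h_new) else q

q, samples = torch.zeros(2), []
for _ in range(1000):
    q = hmc_step(q)
    samples.append(q)
```

Nothing here is specific to deep learning beyond the autograd call, which is the footnote’s point.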


Any improvements on the above are much appreciated. This could be important as a citation for fundraising, so it is worth getting right.

thanks
Breck


I think this is a very interesting topic of discussion. I’ve read a few good (and a few not so good) takes on this but haven’t really come across a satisfactory answer. My (admittedly nonexpert) working impression is that really the only difference is accepting the need for some form of probability theory to deal with uncertainty or irreducible variability. Once one does this, I assume coherent DL/ML learning methods can eventually be understood from a (possibly nonparametric or likelihood-free) statistical perspective, even if we haven’t been able to formalize exactly how yet.

Of course, it is also entirely possible not to think about probability at all and instead approach it from a function-approximation/test-set-error-minimization way of thinking, which can sometimes be understood from a statistical point of view and sometimes not.