The thing that sounds most interesting to me is Bean Machine Graph inference, under the umbrella of models which are probabilistic graph representable.

Edit: Also see the discussion of the initial draft of the beanmachine language at Bean Machine.

the umbrella of models which are probabilistic graph representable

What would be the exact scope of this encapsulation? I’ve always thought the sufficient condition for “inference-able” probabilistic programs is that they can be represented as a DAG.

I would definitely welcome more info on this, especially from @Bob_Carpenter and @betanalpha, two from that linked thread. You may be exactly right that DAG representability is sufficient, so Stan can infer anything representable as a DAG; but it sounds like Stan can also infer things that are not DAGs, or at least are not very faithfully represented by one.

I discuss this in my introduction to Stan, An Introduction to Stan. The short answer is that formal directed graphical model representations (with each node corresponding to a single, one-dimensional variable) are not necessary for faithful computation.

Stan programs don’t define a graphical model but rather a target density function over the model configuration and observational spaces. The inference algorithms in Stan run off evaluations of this density and its gradient, not any internal structure. This means that the scope of the Stan Modeling Language is pretty massive – in fact it subsumes all directed graphical models with real-valued parameter nodes.
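To make the density-only interface concrete, here is a toy sketch of a sampler that touches the model only through log-density and gradient evaluations. This is a simple Metropolis-adjusted Langevin step on a hypothetical standard-normal target, not Stan’s actual algorithm (dynamic HMC/NUTS); the function names and step size are made up for illustration.

```python
import math
import random

# Toy target: log p(theta) for a standard normal, up to a constant.
def log_density(theta):
    return -0.5 * theta * theta

def grad_log_density(theta):
    return -theta

def mala_step(theta, eps, rng):
    """One Metropolis-adjusted Langevin step.

    Note it only ever calls log_density / grad_log_density: no knowledge
    of any graphical structure is used or needed.
    """
    prop = theta + 0.5 * eps * eps * grad_log_density(theta) + eps * rng.gauss(0.0, 1.0)

    def log_q(to, frm):  # log proposal density (up to a constant)
        mu = frm + 0.5 * eps * eps * grad_log_density(frm)
        return -((to - mu) ** 2) / (2.0 * eps * eps)

    log_alpha = (log_density(prop) + log_q(theta, prop)
                 - log_density(theta) - log_q(prop, theta))
    return prop if rng.random() < math.exp(min(0.0, log_alpha)) else theta

rng = random.Random(0)
theta, draws = 0.0, []
for _ in range(5000):
    theta = mala_step(theta, 0.9, rng)
    draws.append(theta)

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The draws should roughly recover the target’s mean (0) and variance (1), despite the sampler being completely agnostic to how the density was specified.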

The inability to inspect the internal structure of a model does limit Stan’s functionality to methods that use only density and density gradient evaluations, but then again these are the only methods that can be applied to all Stan programs within the full scope. For example, when a Stan program is equivalent to a directed graphical model, we can’t exploit that structure to generate joint samples over all variables with ancestral sampling.
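For contrast, ancestral sampling is exactly the kind of structure-exploiting method a density-only view rules out: sample each node given its parents, in topological order. A toy sketch on a made-up two-node DAG (mu → y):

```python
import random

# Hypothetical two-node DAG:
#   mu ~ Normal(0, 1)          (root)
#   y | mu ~ Normal(mu, 1)     (child)
# Ancestral sampling draws each node given its parents, in topological
# order, yielding exact joint samples without any accept/reject step.

def ancestral_sample(rng):
    mu = rng.gauss(0.0, 1.0)   # sample root first
    y = rng.gauss(mu, 1.0)     # then sample child given its parent
    return mu, y

rng = random.Random(1)
samples = [ancestral_sample(rng) for _ in range(20000)]
mean_y = sum(y for _, y in samples) / len(samples)
var_y = sum((y - mean_y) ** 2 for _, y in samples) / len(samples)
# Marginally y ~ Normal(0, 2), so var_y should land near 2.
```

This only works because the directed structure is visible to the sampler; hand it just a black-box joint density and the topological ordering is gone.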

One last comment – many of the claimed algorithmic tuning opportunities aren’t quite as promising as one might think. The problem is that the structure of a directed graphical model is most useful when analyzing the forward conditional dependence. This can be useful when analyzing the full model (for example when trying to generate joint samples), but once one conditions on any non-root nodes the dependence structure becomes much harder to deduce.

By the way – if one generalizes directed graphical models to allow for multi-variable nodes then they can be used to specify any model and hence match the scope of the Stan Modeling Language. This isn’t all that useful for automatically exploiting model structure, but it is useful for communicating model assumptions. I use this more general/less formal notion of graphical models in all of my case studies, Product Placement.

Is this effectively just the observation that I can write down any model as a DAG with an appropriate multivariate prior, a single multivariate parameter node, and a single multivariate data node?

Yeah, any density function can always be written as a single node containing all of the component variables. In the Bayesian context we’ll always at least have “data” and “model configuration” nodes, which might then decompose into more refined graphs. See for example Section 1.2 of (What's the Probabilistic Story) Modeling Glory?.