Prior initialization closer to declaration

We’re definitely all on the same page as to how the programs get interpreted statistically and computationally.

This post from Michael and talking to Sean today reminded me of my perspective when first laying this out, which were largely derived from a combination of looking at BUGS and MacKay’s deifnition of HMC in his book, which is super clear in terms of what it needs to operate (all the algorithms in that book are very easy to read, which is one of the main reasons I like it so much).

From the HMC perspective, what you need is a log density function to define the potential energy.

BUGS gives you a language that can be translated to log densities. I just interpreted the sampling statements in BUGS as giving you log density increments. I hadn’t really thought at all beyond graphical models at that point.

What I never liked about BUGS models is that it was a guessing game to figure out what the types of variables were (oh, this one shows up on the left of a Poisson, so it must be an integer; oh, there’s an index, so it must be an array). I’ve always believed in strong static typing for languages. I thought we’d get a ton of pushback on this, but the main complaint is that there are three 1D array structures (real[], vector, and row_vector).

But I only wanted the user to have to think about shape. Variables should behave the same way everywhere, just be declared in different places. As Michael just pointed out, that’s very different in some of the other languages where the type of an object depends on its usage (data or parameter) and in BUGS where the language never specifies what the variable is going to be used for—that’s only figured out dynamically when the data is read in.

So what I needed to do was somehow define that log density function. Throughout MCMC, the data remains constant and the parameters vary. The parameters vary, but they’re fed in by the sampler (or optimizer) from the outside. The services the model needed to provide were those of an object storing data as member variables and providing a method to compute the log density.

From that object-oriented perspective (which is exactly how it’s coded in C++, by the way), the data block variable declarations give you the member variable declarations, which are the same as the arguments to the construct. The entire model class is immutable—once you construct it with data, nothing in it changes and it’s stateless.

The variable declarations in the parameters block give you the arguments to the log density function and the model block gives you the body of the log density function. The return value, the log density that gets accumulated, is implicit.

That’s all I really did in the first version and it just ran a multivariate normal with fixed covariance and mean. Then I did some transforms manually and had to learn about Jacobians. At that point, it became pretty clear how to do the transforms to the unconstrained scale to simplify our sampling and optimization algorithms.

Once we had data and parameters, it was easy to define derived quantities that depended only on data (transformed data) or on parameters (transformed parameters). My thinking about those came straight out of Andrew and Jennifer’s variable classification in their regression book. In fact, you can still see some of their terminology floating around the AST and grammars under the hood.

Generated quantities then seemed the natural way to do posterior predictive checks in a way that could be computed once per iteration rather than once per leapfrog step and with double types rather than autodiff types.

That’s really all there is. Everything else is just basic imperative programming language constructs, which sort of fell out when we realized we weren’t using the graphical structure anywhere. We can write whatever we want in the model block as long as it’s equal to the log posterior up to an additive constant that doesn’t depend on parameters. So you can unfold some conjugate priors and code up posteriors directly for more efficiency in some cases. That only occurred to me much much later. I started thinking it was just like BUGS, only it translated the sampling statements to log density increments rather than edges in a graphical model.

5 Likes