I mentioned in the meeting yesterday that I’ve been thinking about how to redesign the Stan language from the ground up to allow some increased flexibility. Here are my initial thoughts.
New Stan Language
This is a very tentative proposal for a new Stan language. I have a few goals in doing this:
1. making the model follow the generative story
2. keeping uses of variables close to their definitions
3. supporting a non-centered prior or a zero-inflated Poisson as a function
The syntax is somewhat motivated by PyMC3 and Edward, yet it retains Stan’s underlying commitment to strong static typing, natural probability and matrix syntax, freedom from restriction to graphical models, and modularity of model code from algorithmic concerns. I think I can make it generate a compatible C++ file, so it really is just a new syntax.
Currently, a Stan program is organized by data, parameters and model:
```
data {
  int<lower = 0> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  vector[N] x_std = (x - mean(x)) / sd(x);
}
parameters {
  real alpha;
  real beta;
  real<lower = 0> sigma_sq;
}
transformed parameters {
  real<lower = 0> sigma = sqrt(sigma_sq);
}
model {
  beta ~ normal(0, 10);
  alpha ~ normal(0, 10);
  sigma_sq ~ inv_gamma(1, 1);
  y ~ normal(alpha + beta * x_std, sigma);
}
generated quantities {
  real<lower = 0> tau = inv(sigma_sq);
}
```
Here’s what I’m thinking it could look like:
```
param real alpha ~ normal(0, 10);
param real beta ~ normal(0, 10);
param real(0, ) sigma_sq ~ inv_gamma(1, 1);
real(0, ) sigma = sqrt(sigma_sq);         // transformed param
real(0, ) tau = inv(sigma_sq);            // generated quantity
data int(0, ) N;
data vector[N] x;
vector[N] x_std = (x - mean(x)) / sd(x);  // transformed data
data vector[N] y ~ normal(alpha + beta * x_std, sigma);
```
It’s 9 lines rather than 25, and it satisfies desiderata (1) and (2) above: it follows the generative story and keeps uses of variables close to where they’re declared.
What it loses is the clear delineation of the types of variables you get in the original code. If you’d seen this one first, I’m not sure if it’d be easier or harder to read. I’m so used to our old notation!
The old notation was motivated by thinking about the variable declarations in the blocks as more or less like variable declarations to a function. For `data`, it’s the arguments to the constructor for the C++, or the `y` in `p(theta | y)`, using Andrew’s notation of `theta` for parameters and `y` for data. For `parameters`, it’s the argument `theta` to `log p(theta | y) + const` after `y` has been fixed: the log posterior, in other words. The `generated quantities` block is used to declare the posterior predictive quantities; that’s really a return type for the generated quantities block. Then we added `transformed data` and `transformed parameters` and things got a little muddier, though those were again output types for data and parameter transforms.
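The mapping from blocks to generated code can be sketched in Python rather than the C++ that Stan actually emits (the class and method names here are hypothetical, a toy illustration only): the data block becomes constructor arguments, and the parameters block becomes the arguments to a log density evaluated with the data held fixed.

```python
import math

# Hypothetical sketch, not actual generated Stan code: the data block maps
# onto constructor arguments, and the parameters block onto the arguments
# of log_prob, i.e., log p(theta | y) + const with y fixed.
class LinearRegressionModel:
    def __init__(self, N, x, y):          # data block = constructor args
        self.N, self.y = N, y
        mean = sum(x) / N
        sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (N - 1))
        self.x_std = [(xi - mean) / sd for xi in x]   # transformed data

    def log_prob(self, alpha, beta, sigma):  # parameters block = arguments
        # normal log density up to a constant, summed over observations
        lp = 0.0
        for xi, yi in zip(self.x_std, self.y):
            mu = alpha + beta * xi
            lp += -0.5 * ((yi - mu) / sigma) ** 2 - math.log(sigma)
        return lp
```

The point is only the shape of the interface: data enters once at construction; every log-density evaluation varies only the parameters.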
Pulling out the compound declaration/sampling statements and reordering, the result looks like current Stan, with clear data, transformed data, parameter, transformed parameter, model, and generated quantities blocks:
```
data int(0, ) N;
data vector[N] x;
data vector[N] y;
vector[N] x_std = (x - mean(x)) / sd(x);
param real alpha;
param real beta;
param real(0, ) sigma_sq;
real(0, ) sigma = sqrt(sigma_sq);
alpha ~ normal(0, 10);
beta ~ normal(0, 10);
sigma_sq ~ inv_gamma(1, 1);
y ~ normal(alpha + beta * x_std, sigma);
real(0, ) tau = inv(sigma_sq);
```
If you think of our block headers like `data { ... }` as introducing that marker on all the contained variables, then we get backward compatibility. Maintaining two different ways to write Stan programs is probably a bad idea in the long run, though.
Only parameter and data variables are marked as such. The blocks of other variables may be inferred:
- variable does not depend on any parameter: transformed data
- variable depends on some parameter, and the target depends on the variable: transformed parameter
- variable depends on some parameter, and the target does not depend on the variable: generated quantity
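The inference rule above is simple enough to state as code. Here is a minimal Python sketch (the function name and its boolean inputs are hypothetical; a real compiler would compute the dependencies from the program graph):

```python
# Classify a declared variable given (a) whether it depends on any
# parameter and (b) whether the target density depends on it.
def infer_block(depends_on_parameter, target_depends_on_it):
    if not depends_on_parameter:
        return "transformed data"
    if target_depends_on_it:
        return "transformed parameters"
    return "generated quantities"
```

In the running example, `x_std` depends on no parameter (transformed data), `sigma` feeds the target (transformed parameters), and `tau` depends on `sigma_sq` but never reaches the target (generated quantities).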
Locally scoped blocks are no problem; their location with respect to an original Stan program would be inferred similarly to how variable location is inferred. For example, the following would have to go in the model block:
```
{
  real y = 10;
  y ~ normal(mu, sigma);
}
```
Functions can be defined so that they have access to previously defined variables (this requires finding the right block and using closures to define the function).
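The closure idea can be illustrated in Python (hypothetical names, just a sketch of the scoping): a function defined after some variables are in scope can capture them instead of taking them as arguments.

```python
# A function built after mu and sigma are defined closes over them,
# analogous to a Stan function seeing previously defined variables.
def make_standardize(mu, sigma):
    def standardize(v):
        return [(vi - mu) / sigma for vi in v]
    return standardize

standardize = make_standardize(2.0, 2.0)
z = standardize([0.0, 4.0])   # uses the captured mu and sigma
```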
A non-centered normal prior could be encoded like this, satisfying desideratum (3):
```
vector[N] beta = non_centered_normal(N, mu, sigma);

vector non_centered_normal(int N, real mu, real sigma) {
  param vector[N] beta_std ~ normal(0, 1);
  return mu + sigma * beta_std;
}
```
Technically, this is tricky because it introduces a parameter in the scope of a function. But if the parameters just get read off an iterator that gets passed in through a closure, it should be no problem, assuming there’s no branching that could change the number of parameters. If we do this, it also allows conditional parameters to be defined, and we could even roll that into the language in such a way as to avoid messy scope interpretation:
```
param x when (cond);
```
Again, we just have to make sure that `cond` doesn’t depend on parameters. I’m still not sure I like making parameters conditional, because there’s the tricky business of when they get used; they’re only available if the runtime `cond` holds.
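The read-parameters-off-an-iterator mechanism can be sketched in Python (all names hypothetical; the target-density increment for `beta_std ~ normal(0, 1)` is omitted for brevity):

```python
# The unconstrained parameter vector is consumed through a shared iterator,
# so a function that declares param variables just reads the next values.
def make_param_reader(theta):
    it = iter(theta)
    def next_params(n):
        return [next(it) for _ in range(n)]
    return next_params

def non_centered_normal(N, mu, sigma, next_params):
    beta_std = next_params(N)   # param vector[N] beta_std ~ normal(0, 1)
    return [mu + sigma * b for b in beta_std]

# As long as no branching changes how many values are consumed, the reads
# line up with the same positions in theta on every log-density evaluation.
reader = make_param_reader([0.5, -1.0, 2.0])
beta = non_centered_normal(3, 10.0, 2.0, reader)
```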