Dropping ~ notation?

There’s been a lot of discussion about dropping the ~ notation from Stan because it confuses our users, who often think it literally does a random draw (presumably they’re equally confused about BUGS, etc., which just uses the ~ to define a directed graphical model).

The main obstacles in my mind to getting rid of it (that is, deprecating it and eventually removing it) are that

  1. there are a lot of programs using it

  2. until we get a replacement that drops constants, it’ll slow things down

I’d be very opposed to getting rid of it before solving (2). Even then, everything will have to look like:

target += normal_lpdf_drop_constants(y | mu, sigma);
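(For context: the sampling statement today is just the explicit target += increment with the parameter-free terms dropped, so the two snippets below are equivalent up to an additive constant. The normal_lpdf_drop_constants above is a hypothetical name for the missing replacement.)

// current sampling statement: increments target, silently dropping
// additive terms that don't depend on parameters
y ~ normal(mu, sigma);

// explicit form: the same increment, but with all constants kept
target += normal_lpdf(y | mu, sigma);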

[There’s no other relevant category I can find other than “General” that will let users respond to posts that are essentially about dev topics.]

I would add a 2.5: there are some model-comparison tasks people are wont to do that require the constants to be included. Thus, if they write things like target += normal_lkernel(y | mu, sigma); in their Stan programs and then call (usually indirectly) log_prob afterward, they will get the wrong answer. Although this would be their own fault, it is an easy mistake to make. That said, I have long been in favor of getting rid of the ~ notation.
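(To make that concrete, assuming the hypothetical normal_lkernel above: the two increments below differ only by terms that are constant in the parameters, which cancel within an MCMC run but not in marginal-likelihood computations such as bridge sampling.)

// hypothetical kernel version: log_prob is offset by the dropped
// constants (for the normal, the 0.5 * log(2 * pi) pieces)
target += normal_lkernel(y | mu, sigma);

// normalized version, needed whenever the absolute value of
// log_prob matters downstream (e.g. for bridge sampling)
target += normal_lpdf(y | mu, sigma);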

The one significant appeal of the ~ notation is that it mirrors the mathematical notation for building a generative model. In other words, the Stan program ends up looking like the paper. Admittedly, that helps only people who already understand the papers well enough for the resemblance to be useful, but among us and our close colleagues there are strong selection effects toward exactly that group.

On the other hand, I’d be fine with the ~ notation as downloadable content (DLC) that users have to unlock by doing sufficient training. ;-)


Your model will sample starting in 1 hour. If you’d like to get VIP sampling privileges, you can purchase 1000 Amazon Coins and never have to wait for sampling to start again. Or unlock VIP privileges by solving this modeling problem and level up!


The obvious problem is that the thing it mirrors is not what it does. That’s the confusion. And because it’s apparently not possible to have a strict mode parser that only lets you use ~ once per variable (I say apparently because I will never understand this stuff, but this is what I understood from Bob), the confusion will always be there.

  • I don’t love target += ... as it’s much faster to type x ~ .... Is there a different symbol that could be used?
  • Being of the opinion that marginal likelihoods are next to useless, I’m less concerned than Ben about the normalising constant issues. For some models (I’m thinking about ICARs in particular) those constants will be very expensive to compute.
  • For most users, ~ actually is assignment. It’s only not that when you further update the log target in some way, by using ~ twice or doing a sneaky target += (see the sketch after this list). So is this confusion really that big a deal, or are we just seeing it occasionally?
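(A minimal sketch of that failure: each statement below increments target, so together they square the density. The implied distribution for x is normal(0, 1 / sqrt(2)), not the normal(0, 1) either line suggests on its own.)

x ~ normal(0, 1);                 // adds normal_lpdf(x | 0, 1), up to a constant
target += normal_lpdf(x | 0, 1);  // the "sneaky" second increment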

I agree that the focus here should be interpretational – we can always add new language features to flip normalization on and off.

The appeal of ~ is that it is a super clean notation for “is marginally distributed as”, or at least it should be. As Bob notes, we can’t guarantee that we can identify ~ or even target += being called twice on the same parameter, because the language allows loops and conditionals (which might take infinite time to resolve at runtime). We could add heuristics to check for obvious cases like

x ~ normal(0, 1);
x ~ normal(0, 1);

and maybe even common mistakes like

real x[N];
...
for (n in 1:N)
  x ~ normal(0, 1);  // increments the target N times; presumably x[n] was intended

but they wouldn’t catch every possible corruption that would render “x is marginally distributed according to some distribution” incorrect.

I dunno. I think that a lint-like parser with a ton of heuristic checks would catch 99.9% of the common mistakes that new users make, which might be enough of a teaching opportunity. I also fear that if we remove ~ entirely, possible converts from BUGS and JAGS will find the language so foreign that they never make the leap, although that’s just more speculation.

Not so easily. The reason things are the way they are now is that mixture models need the normalizations, whereas ordinary sampling statements don’t. If we don’t allow finer-grained control than all-on or all-off, this won’t work.
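(To make the mixture point concrete, here’s a sketch, where lambda, mu1, sigma1, and friends are placeholder names. Inside log_mix, adding the same constant to both component log densities is harmless, but if the components drop different terms — different families, or data-scale terms like log(sigma) when sigma is data — the relative weighting is corrupted.)

// the component lpdfs inside log_mix must be normalized consistently;
// dropping different constants from the two components would silently
// change the effective mixing weights
for (n in 1:N)
  target += log_mix(lambda,
                    normal_lpdf(y[n] | mu1, sigma1),
                    normal_lpdf(y[n] | mu2, sigma2));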

Correct. That’s what I was trying to say—it’s impossible because it’s an undecidable problem with a Turing-equivalent language. But we can use heuristics.

We really need to get started on the Stan-lint (or Stan pedantic mode or whatever we want to call it).
