‘Pedantic mode’ will be a compiler flag which disallows (or warns against) questionable patterns in the source code, in the hopes of preventing programs with undesired behavior from being compiled. This will be akin to the
-Wall (and possibly
-Werror) flags from compilers like GCC.
One example of a such a pattern which can already be caught in Stan 3 is reading from an uninitialized variable. The Stan language allows reading from uninitialized variables, but it’s not advisable, so pedantic mode should catch it. Another example might be using parameters in control flow predicates.
What are some other patterns you would like pedantic mode to catch? Let’s try to collect a wishlist here. I can work on implementing the consensus into Stanc3.
Tagging some interested potentially interested parties -
@andrewgelman @Bob_Carpenter @betanalpha @bgoodri @syclik @seantalts @Matthijs @enetsee
There are a lot of things that are inefficient, that we could warn about if we can’t automatically transform to the efficient version. But is pedantic mode intended to mention things that are inefficient, or just things that are likely wrong? What about likely numerical instability?
My preference would be for pedantic mode to focus on preventing incorrect posteriors, which would cover numerical instability but not inefficiency. I do think we should point out inefficient patterns with other warnings.
That sounds great! To start with, take a look at the Stan Parser Pedantic Mode wiki that we put together a few months ago:
We have a lot there already. As we wrote, these are not syntax errors; rather, they’re errors of programming or statistical practice that we would like to flag.
Just to throw out some ideas, in no particular order:
using the same var as lhs in two twiddle statements
having a parameter that is not used anywhere should probably be an error
see if we can partially evaluate and maybe infer sizes, and warn when someone loops over the wrong size for some var? Also some functions like rep_vector could propagate size information and be checked at compile time
extend uninitialized variables (and other analyses) to array elements
warn against using cauchy as prior for Variance parameters (and other prior specific stuff according to the wiki page for prior recommendations)
x ~ some_dist() or
target += some_dist(x...)) where x is some non-linearly transformed parameter from either the transformed parameters block or model block - could check or warn that it needs a jacobian adjustment. Often people write stuff like
log(y) ~ normal... and they need to adjust for the change of variables there.
I think this is somewhere in Andrew’s linked wiki but one thing that would be super helpful in pedantic model is a message if any parameter with a name containing “sigma” (upper or lower case) is declared without
lower=0. This will result in some false positives (I’ve seen people use sigma for all sorts of things), but I think the number of mistakes it will catch will be much higher.
As a logsigma fan, that sounds annoying :)
It sounds like we want multiple pedantic flags like
-Winefficient would be for inefficient things,
-Wincorrect for flagging things that can hurt numerical stability, flags for other stuff, and then a
-Wall for everything
@Charles_Driver good point! Maybe that can be solved by something like @stevebronder’s idea
Hi all. Please read the pedantic Stan wiki, as I did put some thought into this!
In particular, regarding the “sigma” thing, here was my proposal:
- Any parameter whose name begins with “sigma” should have “lower=0” in its declaration; otherwise flag.
Warning message: “Warning: On line ***, your Stan program has a unconstrained parameter with a name beginning with “sigma”. Parameters with this name are typically scale parameters and constrained to be positive. If this parameter is indeed a scale (or standard deviation or variance) parameter, add lower=0 to its declaration.”
This going to be the easiest way forwards. There are just too many, inconsistent definitions of “pedantic” to have a self-consistent set of checks.
For example checks based on naming conventions are going to be dangerous when jumping between fields where those name conventions no longer hold.
In some sense the parser/compiler is the pedantic mode, pedantically allowing any program that defines a valid probabilistic program (for the Stan definition of probabilistic program defining a log joint density). Beyond that is heuristics that attempt to identify causes of not a user specifying an invalid program but rather a different program than what they wanted. Categorizing these heuristics will help to avoid getting bogged down in differing opinions while also communicating to the user what is being assumed in the generation of the warning.
One mode could be based on graphical model heuristics, checking that every parameter is assigned a single prior density (i.e. the variable is on the left hand side of one and only one density increment). Within this mode we could also check those densities against the constraints of the parameters and flag inconsistencies.
One mode could be based on programming heuristics such as not using an undefined variable or, perhaps more important, not explicitly assigning values to local variables. This is a super common mistake when people type “variable[N] = 1” instead of “variable[n] = 1” in a loop. Another check would be when vectorization is used within a loop to try and catch accidentally redundant priors.
One mode could subsume the Jacobian warning that we already have.
Perhaps one mode based on looking for inefficient patterns? Such as vectorization opportunities, unneeded matrix inversions, etc.
Beyond that we get into murkier territory, including modeling recommendations (prior choices) and naming conventions.
Was talking to @paul.buerkner a while back about his bayesian methodology package. We joked but actually ended up liking the idea of having an option users could set to get tips from certain people. So I like the idea of having
Could also have an option
-Beer3 where the suggestions become more and more opinionated
Hi all. Let me just ask people with ideas in this area (including everyone who’s posted on the thread) to add their ideas to the wiki: https://github.com/stan-dev/stan/wiki/Stan-Parser-Pedantic-Mode as this may help us moving forward.
Regarding different ideas of what should be recommended: I think it could make sense to catch all these things in pedantic mode.
There should also be a setting to turn such warnings off: For just about everything that’s not actually a syntax error, we can probably find a use case where the anti-recommended behavior actually makes sense (if for no other reason than demonstration purposes in a robustness study).
So “pedantic mode” should not actually forbid anything that compiles; it should just give a warning (e.g., the one-paragraph warnings suggested for some of the examples on the pedantic mode wiki).
I’m guessing that, in the vast majority of cases, the warnings will represent typos, copy-paste errors, or other mental lapses (for example, forgetting to put <lower=0> in the declaration of a variance parameter is not a typo, but it’s probably a mistake); conceptual errors that the user wasn’t aware were conceptual errors (for example, giving a parameter a uniform(-10, 10) prior distribution); and inefficient coding. I’m guessing that most users will be best served by a pedantic parser that catches all these things and more, as long as users can then flag the items in the program that they wants to keep so they won’t keep getting warnings they have to ignore.
To put it another way, different power-users have noticed different common errors in Stan code, and I think we’d be doing people (including ourselves) a service by catching as many of these as we can.
Pedantic modes are always annoying.
Right. Which is why the
sigma thing is very weak. That’s Andrew’s conventions for naming variables, e.g.,
y ~ normal(mu_y, sigma_y);
but I don’t know that it’s widely used. I find it gets too cluttered doing this in most serious models.
These are already on the list.
Agreed. I’m not so keen to undertake these.
If there are too many irrelevant warnings (what someone other than Andrew might call a “false positive”), tools like this become unusable.
I agree. But I don’t think any of the items on that list are irrelevant! One approach would be to put them all in, and then get feedback from users. Just for example, I think it will be pretty rare that people will have parameters with names beginning with the strings “sigma” or “tau” that should not be constrained to be positive.
What I wrote in the wiki was: “Any parameter whose name begins with ‘sigma’ should have ‘lower=0’ in its declaration; otherwise flag.”
Read carefully: I said “name begins with ‘sigma’”. Not “name includes ‘sigma’”. So a parameter called logsigma would not be flagged.
Could you flesh out some of these ideas and add them to the wiki? That would help, as it’s easier to keep track of a wiki than a discussion thread. The wiki can be used as a starting point for the pedantic parser, and also it can be a useful resource for users who might want to get more background on the motivations for the parser choices.
Let’s take the Jacobian warning for example. If we can’t figure out that a transform is linear, we throw the warning. But sometimes it’s a false positive.
Similarly, I’m guessing people will use sigma and rho for things other than scales or variances, in which case, the warning will be a false positive.
Similarly, if we flag double uses of a variable on the left-hand side of a twiddle, that may be intentional, so the warning would be a false positive.
When too many false positives stack up, people lose faith in a diagnostic tool and stop using it.
Regarding sigma and rho: Maybe you’re right. I dunno. I think it would be good to put it in, and then we can see if any users are tripped up by it. I’m guessing that most of the time people will want constraints on these parameters. But, sure, that’s just my guess.
Regarding double uses of a variable on the left-hand side of a twiddle: I agree that sometimes people will do this on purpose, but (a) I’m guessing that this is rare, and (b) I’m also guessing that when people do this, they know that they’re being somewhat nonstandard and so maybe they won’t mind the flag.
Overall I understand that there’s a tradeoff, and I’m thinking that the items on the list are items where the vast majority of the time when they are flagged, they will represent real problems or oversights. Also, we will have the option to turn the flag off.