The var keyword: should Stan support local variable type inference?

This was discussed in today’s Stan meeting, and we decided it would be good to discuss on the forums.

The proposal is to include something similar to the Java var keyword to allow local variables to have their type inferred rather than declared. For variables declared in the transformed data, transformed parameters, model, or generated quantities block, their type is almost always blindingly obvious to the compiler. For example,

data {
   int N;
   vector[N] y;
   vector[N] predictor;
}

parameters {
    real alpha;
    real beta;
}

transformed parameters {
    // could be var mu = alpha + beta * predictor; ?
    vector[N] mu = alpha + beta * predictor; 
}

model {...}

In this snippet, mu has a completely unambiguous type, determined by the expression on the right-hand side of its definition. Right now, type checking verifies that the type of the assigned expression matches the declared type. With var, the only change is that the typing step would take whatever type it found for the right-hand side and treat mu as that type going forward. This would be a language/compiler-only change: the generated C++ would stay the same, and the typing semantics would be the same. mu still has type vector[N]; you just didn’t have to write it down.
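To make the rule concrete, here is a minimal Python sketch of the typing step described above. It is a toy model, not stanc's actual implementation: the names `infer` and `define`, the tuple encoding of expressions, and the handful of type rules (only `+` and `*`, sizes ignored, no promotion on assignment) are all assumptions made for illustration.

```python
SCALARS = {"int", "real"}

def infer(expr, env):
    """Return the type name of an expression.

    expr is either a variable name (str) or a tuple (op, lhs, rhs).
    Only "+" and "*" are modelled; sizes such as [N] are ignored.
    """
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    if op not in ("+", "*"):
        raise TypeError(f"operator {op} is not modelled in this sketch")
    left, right = infer(lhs, env), infer(rhs, env)
    if left in SCALARS and right in SCALARS:
        return "int" if left == right == "int" else "real"
    if "vector" in (left, right) and (left in SCALARS or right in SCALARS):
        return "vector"  # scalar combined with a vector gives a vector
    if op == "+" and left == right == "vector":
        return "vector"
    raise TypeError(f"no rule for {left} {op} {right}")

def define(name, rhs, env, declared="var"):
    """Type-check a declare-define statement and record the variable's type."""
    inferred = infer(rhs, env)
    # Explicit declaration: compare against the written type.
    # "var": skip the comparison and adopt the inferred type.
    if declared != "var" and declared != inferred:
        raise TypeError(f"{name} declared as {declared} but assigned {inferred}")
    env[name] = inferred
    return inferred

env = {"alpha": "real", "beta": "real", "predictor": "vector"}
mu_rhs = ("+", "alpha", ("*", "beta", "predictor"))  # alpha + beta * predictor

explicit = define("mu", mu_rhs, dict(env), declared="vector")
inferred = define("mu", mu_rhs, dict(env))  # the var case
print(explicit, inferred)
```

In both cases mu ends up with the same recorded type; the only difference is whether a comparison against a written-down type takes place.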

Restrictions

  • Items declared with this syntax would need to be declare-define, i.e., var mu; is not allowed; there must be an initializing expression on the right-hand side. This restriction is what reduces the problem from something like OCaml’s type inference algorithm to something much simpler.
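A small Python sketch of why the declare-define restriction keeps things simple: the type is resolved from the initializer alone, so the checker never has to look at later uses of the variable. The function name `resolve_type` and the string encoding of types are hypothetical, chosen only for this illustration.

```python
def resolve_type(declared, initializer_type=None):
    """Return the type a new local variable gets.

    declared is a written type such as "vector", or the string "var".
    initializer_type is the already-computed type of the right-hand side,
    or None when the declaration has no initializer.
    """
    if declared == "var":
        if initializer_type is None:
            # Corresponds to rejecting `var mu;`
            raise TypeError("var requires an initializing expression")
        return initializer_type
    return declared

with_initializer = resolve_type("var", "vector")  # var mu = <vector expression>;
plain_declaration = resolve_type("vector")        # vector[N] mu;
print(with_initializer, plain_declaration)
```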

Advantages

  • Especially with later language extensions (tuples or function types), writing down the type can be increasingly verbose even as the type is obvious. (1, 2.3, 1, 3.3) has a type (int, real, int, real), and one can imagine even longer types.
  • This could dramatically reduce the number of types some users need to keep in their heads. Combined with the automatic vectorization Stan already does, a large number of models would probably only need types written out in the data and parameters blocks, with all other values being var-ible and the user not needing to worry about the specifics.
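The tuple case can be sketched in a few lines of Python. This only illustrates how mechanical the inference is for literals; the function `literal_type` and the `(int, real, ...)` spelling of tuple types follow the notation used in this post and are assumptions, not settled Stan syntax.

```python
def literal_type(value):
    """Return the type of a literal, written in the notation used above."""
    if isinstance(value, bool):
        raise TypeError("booleans are not modelled in this sketch")
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "real"
    if isinstance(value, tuple):
        # A tuple's type is the tuple of its element types, recursively.
        return "(" + ", ".join(literal_type(v) for v in value) + ")"
    raise TypeError(f"unsupported literal: {value!r}")

flat = literal_type((1, 2.3, 1, 3.3))
nested = literal_type(((1, 2.3), (4.0, (5, 6))))
print(flat)
print(nested)
```

The nested case shows how quickly the written-out type grows even though each step of the inference is trivial.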

Concerns

  • Type inference is another very programming-language-y feature that some users would likely not understand. In particular, great care would need to be taken for error messages. As @bgoodri pointed out during the meeting, this kind of thing moves errors down in a program - if you have an expression you think is a vector but is actually a real array, using var moves that error from the declaration to the use of the variable.
  • @jonah pointed out that a lot of times, you need to know the intermediate value’s type anyway to use it in a later expression, so var is of questionable benefit. This is true, though language features (like tuples) with easy-to-understand but complicated-to-write-down types do still create a place for this, and automatic vectorization does help blur the lines and allow you to not think so much about the exact type.
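The concern about errors moving later in the program can be illustrated with a toy checker in Python. Everything here is a hypothetical simplification: the statement encoding, the name `first_error_line`, and the assumption that each right-hand side's type is already known.

```python
def first_error_line(program):
    """Return the 1-based line of the first type error, or None.

    Each statement is one of:
      ("decl", name, declared_type_or_"var", rhs_type)
      ("use", name, required_type)
    """
    env = {}
    for line, stmt in enumerate(program, start=1):
        if stmt[0] == "decl":
            _, name, declared, rhs_type = stmt
            if declared != "var" and declared != rhs_type:
                return line  # mismatch caught at the declaration
            env[name] = rhs_type
        else:
            _, name, required = stmt
            if env[name] != required:
                return line  # mismatch caught only where the value is used
    return None

# The user believes x is a vector, but the expression is really a real array.
explicit_program = [
    ("decl", "x", "vector", "real[]"),
    ("decl", "n", "int", "int"),
    ("use", "x", "vector"),
]
var_program = [
    ("decl", "x", "var", "real[]"),
    ("decl", "n", "int", "int"),
    ("use", "x", "vector"),
]

explicit_line = first_error_line(explicit_program)
var_line = first_error_line(var_program)
print(explicit_line, var_line)
```

With the explicit declaration the error is reported on the first line; with var the same mistake surfaces on the third line, where the variable is used, which is why the wording of error messages would matter so much.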

I’d love to hear if anyone else has ideas, suggestions, or thoughts on this. If it seems like a worthwhile feature for Stan, I’ll take the feedback here and draft a design document. If it seems like it wouldn’t get that much use, we can spend dev time elsewhere.

I’m in favor of anything that makes the syntax more intuitive or moves it closer to SlicStan. var sounds great.