Syntax and scope for Stan language includes

Ben used the following in RStan:

functions {
  #include "foo/bar.stan"
}

I propose we allow the following syntax to be used anywhere in a Stan program file but require the @ to be line initial:

system include

@include <gp>

user include

@include foo/bar.stan
  • RStan allowed the # to be indented, but C++ doesn’t. I was thinking of not allowing indentation as it’s much easier to parse out that way and also matches C++/Java conventions. But Stan doesn’t have standalone include units—they have to go in blocks, so this looks more than a bit awkward indentation-wise:
functions {
@include <gp>
@include my/path/gp_fun.stan

  int foo(int x) { return x + 1; }
}  
  • Stan allows # as a comment character, so it seems confusing to also allow it as the include marker.

  • RStan used quotes around the path, but we don’t need to. Neither Java nor Python require quotes; in R they’re optional in library().

  • Python and Java use import whereas C/C++ use include; the latter seems easier to understand to me

  • The system include gets a different syntax because it searches a built-in path
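To make the proposal concrete, here is a minimal sketch in Python of how a preprocessor might recognize the two forms. The regular expressions and the function name parse_include are assumptions made for illustration; nothing here is existing stanc code. It follows the proposal in requiring the @ to be line initial, so an indented directive is treated as an ordinary line.

```python
import re

# Assumed grammar for illustration: '@' must be the first character on the line.
SYSTEM_RE = re.compile(r"^@include\s+<([^<>\s]+)>\s*$")  # @include <gp>
USER_RE = re.compile(r"^@include\s+([^\s<>]+)\s*$")      # @include foo/bar.stan


def parse_include(line):
    """Return ("system", name), ("user", path), or None for any other line."""
    line = line.rstrip("\n")
    m = SYSTEM_RE.match(line)
    if m:
        return ("system", m.group(1))
    m = USER_RE.match(line)
    if m:
        return ("user", m.group(1))
    return None
```

Allowing indentation would only mean adding optional leading whitespace to both patterns, so the parsing cost of either choice is small in a sketch like this.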


RStan is a bit less picky than you describe. The #include statements can be anywhere with any indentation and I don’t think the quotes are required (I was just following C++ style). I think using @ would be fine, but what would the built-in path be for the interfaces?

Thanks for clarifying on RStan. Do you think it’s useful to have includes in places other than functions?

I could skip worrying about system files in the first version and just require the user to include the paths in the call to the compiler (that could be passed in through the interfaces). Then we could distribute system-like libraries on the side rather than through Stan itself.

We fix relative paths in CmdStan (mainly because nobody ever figured out how to run make from another directory), so paths to system libraries should be easy. Would something similar be hard in RStan?

Yes. The Stan files in rstanarm are mostly includes these days, such as
https://github.com/stan-dev/rstanarm/blob/master/exec/continuous.stan

We could look for a magic directory somewhere under where the rstan package is installed on disk, but at the moment, the stanc_builder function has an isystem = c(dirname(file), getwd()) argument whose default behavior is to look first under the directory where the Stan file is and then in the working directory.
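The lookup order described here (the directory containing the Stan file first, then the working directory) can be sketched as follows. This is Python rather than R, and resolve_include and its search_dirs parameter are hypothetical names chosen for illustration; it is not how stanc_builder is implemented.

```python
import os


def resolve_include(path, stan_file, search_dirs=None):
    """Return the first existing file for an include path, or None.

    By default this mirrors the order described above: the directory of
    the including Stan file, then the current working directory.
    """
    if search_dirs is None:
        search_dirs = [os.path.dirname(os.path.abspath(stan_file)), os.getcwd()]
    for directory in search_dirs:
        candidate = os.path.join(directory, path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

An interface could pass its own list, for example with a package-installed directory appended, which is one way the "magic directory" idea could be layered on without changing the default.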

The @include looks funky to me. Maybe it’s just my C++ affinity. Maybe I can imagine we would want to include some other annotation in the future where @ may come in handy.

Are you avoiding #include because # is a comment in the Stan language now?

I think this will be awesome once we have it. Think we’ll allow redundant nested blocks? I’m thinking we might want to include a function from a file that has the function block in there. I can imagine that happening a lot. So when it’s all included, I’m thinking:

functions {
  functions {
    real foo() {
      return 0;
    }
  }
}

Yes, there’s ambiguity with #. I’m going to write this as a preprocessor, though, so I can give the # as include precedence. It just seems awkward because then the doc has to say every time # is mentioned as a comment that it is except when it’s an include.

So if we go with # for includes, I’d be happier deprecating # as a comment character.

But the # will look weird to R people, too. So that is a C++ bias. But then we’ve sort of gone with C++ conventions elsewhere.

I understand the tension between R and C conventions, but that said I would also prefer using # and deprecating its use as a comment.

I’m not quite sure what the big deal is. In RStan currently we search for lines starting with #include and then preprocess. If there is a # in any other context, we leave it as a comment. Why can’t stanc do the same?
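As a rough illustration of that rule, here is a sketch in Python of a line-based preprocessor that expands lines starting with #include and passes every other line through, # comments included. The names expand_includes and sources are hypothetical, the dictionary stands in for the file system, and this is not the actual RStan code. It has no cycle detection, so a file that includes itself would recurse until Python gives up.

```python
import re

# Optional indentation, optional quotes, and nothing else on the line.
INCLUDE_RE = re.compile(r'^\s*#include\s+"?([^"\s]+)"?\s*$')


def expand_includes(text, sources):
    """Replace each #include line with the contents of the named source.

    `sources` maps include paths to file contents. Any other line,
    including one that starts with '#', is passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            included = expand_includes(sources[m.group(1)], sources)
            out.extend(included.splitlines())
        else:
            out.append(line)
    return "\n".join(out)
```

Under this pattern a comment such as "# implies theta ~ normal(mu, sigma)" is left alone, but a comment that happens to read "#include foo" would be treated as an include and fail on lookup.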

Because when you start piling on special rules like “It’s a comment unless it’s not” in a long-lived project you end up with R. We should pick one syntax for these non-code annotations and stick with it, or we should make like Python and have imports look like code rather than pre-processor tags.

I’m fine using @include but I don’t think it is worth making tons of existing Stan programs unparseable just because they have a # comment when it is trivial (for both stanc and people) to distinguish #include "utils.stan" from # implies theta ~ normal(mu, sigma)

It is very much not trivial for a huge number of users to make that distinction without a significant amount of frustration.

It’s straightforward to do the whole deprecation song and dance. All users have to do is run an R script or sed over their programs. I don’t think we want to support ‘#’ for comments long-term if we use it for includes. If we leave it as zombie syntax, somebody will have a #include ... that’s actually a comment, and then the program will fail with obscure messages.
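For a sense of what such a migration script involves, here is a minimal sketch in Python rather than R or sed. The function name migrate_hash_comments is hypothetical. It assumes that # inside a double-quoted string or after an existing // should be left alone, and that lines starting with #include are directives to keep. It does not handle /* ... */ block comments, so treat it as a starting point only.

```python
def migrate_hash_comments(text):
    """Rewrite '#' line comments as '//' comments, one line at a time."""
    out = []
    for line in text.split("\n"):
        # Assumption: a line that starts with #include is a directive.
        if line.lstrip().startswith("#include"):
            out.append(line)
            continue
        in_string = False
        for i, ch in enumerate(line):
            if ch == '"':
                in_string = not in_string
            elif not in_string and line.startswith("//", i):
                break  # the rest of the line is already a // comment
            elif not in_string and ch == "#":
                line = line[:i] + "//" + line[i + 1:]
                break
        out.append(line)
    return "\n".join(out)
```

The first branch is where the zombie-syntax problem shows up: a comment that begins with the word include directly after the # is indistinguishable from a directive, so the script cannot fix it automatically.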

I recall the (lack of) discussion around adding #include to rstan. The concern that this is an example of the interfaces driving the core language in an un-maintainable way is as old as the #include syntax: https://groups.google.com/d/msg/stan-dev/q_hGJTq6xug/lRc56qFkCAAJ

Then what is wrong with @include statements and leaving the comments the way they are now?

As an R user, I don’t mind if Stan deprecates # for comments. I notice that the Stan manual always uses // etc., never #, so we are consistent with ourselves.

R does not allow blocks of comments. The R commenting syntax is terrible, and I don’t see why we in Stan should be following it in any way.
