Stan3 - enable underscores in numbers

Hi, would it be possible to “enable” underscores in numbers, which the compiler (Stan->C++) would simply ignore?

Python already supports this, and it is a handy feature.

10000 :: 100000 :: 1000000

vs

10_000 :: 100_000 :: 1_000_000

see https://www.python.org/dev/peps/pep-0515/

We won’t be doing that for Stan 2, but @seantalts is in charge of the Stan 3 language and @Matthijs is building it.

We’d need a fuller spec than this. Can they show up anywhere and in any quantity within floating point and int? Do we allow any of:

10_00
_0_.1_0_________
_0__
_1._e__+32_
3_._1_4_

I think OCaml does the same thing, but I’m not sure of the extent, because the textbook I’m reading is understandably a bit vague about the spec.

In Python 3:

Leading and trailing underscores are not allowed. That includes the positions directly before and after the decimal point and the e in scientific notation.

So these are not valid in Python:

_0_.1_0_________
_0__
_1._e__+32_
3_._1_4_

Also, only a single underscore is allowed between digits; consecutive underscores are not.

Both int and floating point can have underscores in them.
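A quick way to check the candidates above against Python's rules is to feed them to the built-in int() and float() constructors, which PEP 515 says follow the same underscore placement rules as literals. This is only a rough sketch: the helper name python_accepts is made up for illustration, and the constructors are a bit more lenient than the literal grammar in other respects (surrounding whitespace, leading zeros).

```python
def python_accepts(text):
    """Return True if Python's int() or float() can parse `text`."""
    for convert in (int, float):
        try:
            convert(text)
            return True
        except ValueError:
            # Not parseable by this constructor; try the next one.
            pass
    return False


candidates = [
    "10_00",
    "_0_.1_0_________",
    "_0__",
    "_1._e__+32_",
    "3_._1_4_",
    "1_000_000",
]

for text in candidates:
    verdict = "accepted" if python_accepts(text) else "rejected"
    print(f"{text!r}: {verdict}")
```

Of these, only 10_00 and 1_000_000 are accepted. Note that 10_00 passes: Python does not require groups of three digits.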

For Stan we could also go the C++14 way and use apostrophes.

From PEP 515 (Underscores in Numeric Literals):

C++14 introduces apostrophes for grouping (because underscores introduce ambiguity with user-defined literals), which is not considered because of the use in Python’s string literals. [1]

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

I think common use cases would be

1_000_000
0.123_456
123_456.789_012
0.123_456e+5

Thanks. The issue should have a clean spec of what is and isn’t allowed in terms of a grammar.

Let’s see what @seantalts has to say before including this. For Stan, the apostrophe is also the unary postfix transposition operator.

I use this feature all the time in Python code. It improves readability
a lot. It’s nice to be able to tell at a glance that a number is 100_000
rather than 10_000.

I love the underscore separator, I think it would be great to add to Stan 3. Agreed we can’t use the apostrophe. Should be a very simple, backwards-compatible change.

It’s very easy for me to implement. Would be a straightforward change in the lexer. It doesn’t even need to touch the parser. Just let me know a precise spec.

We’d need to change the parser to emit an AST that captures the underscore so we can pretty-print underscores back out. I think the spec is just that single underscores can appear in the middle of numbers (but not beginning or end, and for decimals not next to the dot) and they do nothing.
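To make that spec concrete, here is a rough sketch of it as regular expressions in Python. None of this is the actual Stan lexer: the pattern names, the classify and numeric_value helpers, and the assumed shape of a real literal (digits, dot, optional digits, optional exponent, and so on) are all assumptions for illustration. Underscores in the exponent are allowed here too, which is one of the points a final spec would have to settle.

```python
import re

# One or more digit groups joined by single underscores: 1, 10_000, 1_0_0.
DIGITS = r"[0-9]+(?:_[0-9]+)*"
# Exponent part; underscores allowed between its digits (an assumption).
EXP = rf"[eE][+-]?{DIGITS}"

INT_RE = re.compile(DIGITS)
# Assumed real-literal shapes: "1.", "1.5", "1.5e3", ".5", ".5e3", "1e3".
REAL_RE = re.compile(
    rf"(?:{DIGITS}\.(?:{DIGITS})?(?:{EXP})?|\.{DIGITS}(?:{EXP})?|{DIGITS}{EXP})"
)


def classify(text):
    """Return 'int', 'real', or None if `text` is not a legal numeral."""
    if INT_RE.fullmatch(text):
        return "int"
    if REAL_RE.fullmatch(text):
        return "real"
    return None


def numeric_value(text):
    """Drop the underscores and convert; they carry no meaning."""
    kind = classify(text)
    if kind is None:
        raise ValueError(f"not a legal numeral: {text!r}")
    plain = text.replace("_", "")
    return int(plain) if kind == "int" else float(plain)


for sample in ["1_000_000", "0.123_456", "1_0_0", "10__000", "3_._1_4_", "_0__"]:
    print(sample, classify(sample))
```

Because each underscore must sit between two digits, the leading, trailing, doubled, and next-to-the-dot cases all fall out of the single DIGITS pattern without extra rules.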

The parser would automatically capture them, as the lexer just hands the string that a numeral matches on to the parser.
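As a toy illustration of that idea (in Python rather than the real compiler's code, with a hypothetical Numeral type and a simplified numeral pattern), the lexer can keep the matched text verbatim and compute the value only on demand. Pretty-printing then just emits the stored text, underscores included.

```python
import re
from dataclasses import dataclass

# Simplified, assumed numeral pattern: digits with single interior
# underscores, an optional fraction, and an optional exponent.
NUMERAL = re.compile(
    r"[0-9]+(?:_[0-9]+)*"
    r"(?:\.[0-9]+(?:_[0-9]+)*)?"
    r"(?:[eE][+-]?[0-9]+(?:_[0-9]+)*)?"
)


@dataclass(frozen=True)
class Numeral:
    lexeme: str  # exact source text, underscores and all

    @property
    def value(self):
        # Underscores are ignored when computing the numeric value.
        plain = self.lexeme.replace("_", "")
        is_real = any(c in plain for c in ".eE")
        return float(plain) if is_real else int(plain)


def lex_numerals(source):
    """Return every numeral in `source` as a Numeral holding its raw text."""
    return [Numeral(m.group()) for m in NUMERAL.finditer(source)]


def pretty_print(numeral):
    """Emit the numeral exactly as the user wrote it."""
    return numeral.lexeme


tokens = lex_numerals("y = 1_000_000 + 0.123_456;")
for tok in tokens:
    print(pretty_print(tok), tok.value)
```

The sketch only handles numerals and would mis-tokenize identifiers that contain digits; a real lexer matches identifiers first.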

So: 10__000 is illegal? What about 1_0_0?

first: Illegal
second: legal

Thanks! And so they’re allowed in both integer and real numerals? What about in exponents if we use this floating point E notation? Like 1_0.7_1341E+33_3?

Yes, in both real and int

Valid

That works too, we’ll have a translation step for literals then later on. Legal

Do you think it’s worthwhile to capture exactly the form of the user input? So we distinguish 1.0 from 1.00 from 1e+0?

I think of these separators as like an internal whitespace.

Here’s the precise spec for OCaml—is there a reason not to just follow that?

https://caml.inria.fr/pub/docs/manual-ocaml/lex.html

I think it might be worthwhile - users may have reasons to prefer certain notations.