Stan3 - enable underscores in numbers

ahartikainen · November 14, 2018, 8:37pm

Hi, would it be possible to “enable” underscores in numbers, which the compiler (Stan->C++) would simply ignore?

Python already has this, and it is a nice and handy feature.

10000 :: 100000 :: 1000000

vs

10_000 :: 100_000 :: 1_000_000

see https://www.python.org/dev/peps/pep-0515/

Bob_Carpenter · November 19, 2018, 7:13am

We won’t be doing that for Stan 2, but @seantalts is in charge of the Stan 3 language and @Matthijs is building it.

We’d need a fuller spec than this. Can they show up anywhere and in any quantity within floating point and int? Do we allow any of:

10_00
_0_.1_0_________
_0__
_1._e__+32_
3_._1_4_

I think OCaml does the same thing, but I’m not sure of the extent of it, because the textbook I’m reading is understandably a bit vague w.r.t. the spec.

ahartikainen · November 19, 2018, 8:27am

In Python 3:

Leading or trailing underscores are not allowed. This also applies directly before and after the decimal point (.) and the e in scientific notation.

So these won’t be valid in Python:

_0_.1_0_________
_0__
_1._e__+32_
3_._1_4_

Also, only a single underscore is allowed between digits (no consecutive underscores).

Both int and floating point can have underscores in them.
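
These rules can be checked directly in Python 3.6 or later, where `int()` and `float()` accept the same underscore grouping as numeric literals in source code. A quick sketch using the examples from this thread (the helper name `parses` is made up for illustration):

```python
def parses(text, convert):
    """Return True if convert(text) succeeds, False if it raises ValueError."""
    try:
        convert(text)
        return True
    except ValueError:
        return False

# Single underscores between digits are accepted.
accepted = ["10_00", "1_000_000"]
# Leading, trailing, doubled, or dot-adjacent underscores are rejected.
rejected = ["_0__", "10__000", "3_._1_4_", "_0_.1_0_________", "_1._e__+32_"]

for text in accepted + rejected:
    print(f"{text!r}: int={parses(text, int)} float={parses(text, float)}")
```

This only shows what Python does; it says nothing yet about what Stan should accept.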

For Stan we could also go the C++14 way and use apostrophes.

From PEP 515 (Underscores in Numeric Literals):

C++14 introduces apostrophes for grouping (because underscores introduce ambiguity with user-defined literals), which is not considered because of the use in Python’s string literals. [1]

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

I think common use cases would be

1_000_000
0.123_456
123_456.789_012
0.123_456e+5

Bob_Carpenter · November 19, 2018, 4:37pm

Thanks. The issue should have a clean spec of what is and isn’t allowed in terms of a grammar.

Let’s see what @seantalts has to say before including this. For Stan, the apostrophe is also the unary postfix transposition operator.

ariddell · November 19, 2018, 5:37pm

I use this feature all the time in Python code. It improves readability
a lot. It’s nice to be able to tell at a glance that a number is 100_000
rather than 10_000.

seantalts · November 19, 2018, 6:51pm

I love the underscore separator, I think it would be great to add to Stan 3. Agreed we can’t use the apostrophe. Should be a very simple, backwards-compatible change.

Matthijs · November 20, 2018, 11:46am

It’s very easy for me to implement. Would be a straightforward change in the lexer. It doesn’t even need to touch the parser. Just let me know a precise spec.

seantalts · November 20, 2018, 12:09pm

We’d need to change the parser to emit an AST that captures the underscore so we can pretty-print underscores back out. I think the spec is just that single underscores can appear in the middle of numbers (but not beginning or end, and for decimals not next to the dot) and they do nothing.

Matthijs · November 20, 2018, 12:19pm

The parser would automatically capture them, as the lexer just hands the string that a numeral matches on to the parser.

So: 10__000 is illegal? What about 1_0_0?

ahartikainen · November 20, 2018, 12:23pm

First (10__000): illegal
Second (1_0_0): legal

Matthijs · November 20, 2018, 12:25pm

Thanks! And so they’re allowed in both integer and real numerals? What about in exponents if we use this floating point E notation? Like 1_0.7_1341E+33_3?

ahartikainen · November 20, 2018, 12:29pm

Yes, in both real and int.

The exponent example (1_0.7_1341E+33_3) is valid.

seantalts · November 20, 2018, 1:26pm

That works too; we’ll then have a translation step for literals later on. And the exponent example is legal.
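
The rules agreed so far (single underscores, only between digits, in integers, reals, and exponents) could be written down as regular expressions along these lines. This is only a sketch in Python, not the Stan 3 lexer: the names `DIGITS`, `is_legal`, and `strip_underscores` are made up here, and the shape of a real numeral (digits on both sides of the dot, or an exponent) is an assumption rather than Stan's actual grammar.

```python
import re

# Assumption: one or more digits, single underscores allowed only between digits.
DIGITS = r"[0-9](?:_?[0-9])*"
EXPONENT = rf"[eE][+-]?{DIGITS}"

INT_LITERAL = re.compile(DIGITS)
# Assumption: a real has digits on both sides of the dot, or an exponent, or both.
REAL_LITERAL = re.compile(rf"{DIGITS}(?:\.{DIGITS}(?:{EXPONENT})?|{EXPONENT})")

def is_legal(text):
    """True if text is a legal integer or real numeral under this sketch."""
    return bool(INT_LITERAL.fullmatch(text) or REAL_LITERAL.fullmatch(text))

def strip_underscores(text):
    """Translation step: the separators do nothing, so drop them."""
    return text.replace("_", "")

for numeral in ["1_0_0", "10__000", "1_0.7_1341E+33_3", "3_._1_4_", "_0__"]:
    print(numeral, "legal" if is_legal(numeral) else "illegal")
```

Writing the digit run as `[0-9](?:_?[0-9])*` rules out leading, trailing, and doubled underscores in one place, so the same fragment can be reused for the integer part, the fraction, and the exponent.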

Bob_Carpenter · November 22, 2018, 3:01am

Do you think it’s worthwhile to capture exactly the form of the user input? So we distinguish 1.0 from 1.00 from 1e+0?

I think of these separators as like an internal whitespace.

Here’s the precise spec for OCaml—is there a reason not to just follow that?

https://caml.inria.fr/pub/docs/manual-ocaml/lex.html

seantalts · November 29, 2018, 3:54pm

I think it might be worthwhile - users may have reasons to prefer certain notations.
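
One way to picture this: the literal node keeps the numeral exactly as typed, and the numeric value is derived from it only when needed. A minimal sketch in Python, where `RealLiteral` and its fields are hypothetical names and not part of any Stan compiler:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RealLiteral:
    """Hypothetical AST node that keeps the numeral exactly as typed."""
    raw: str

    @property
    def value(self) -> float:
        # Underscores carry no meaning, so drop them before conversion.
        return float(self.raw.replace("_", ""))

    def pretty(self) -> str:
        # Print back the user's own notation rather than a normalized form.
        return self.raw

literals = [RealLiteral(s) for s in ("1.0", "1.00", "1e+0", "1_000.0")]
for lit in literals:
    print(lit.pretty(), "->", lit.value)
```

Under this design 1.0, 1.00, and 1e+0 stay distinct in the tree and when pretty-printed, while still evaluating to the same number.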
