I thought I’d move this discussion to the dev list.
The standard way to look at this is that you have a parser
function and a generator function:
p : string -> ast
g : ast -> string
I don’t think requiring the AST to be lossless in the
sense that g is an inverse of p is a useful property.
Instead, p is many-to-one and the AST is a kind of canonical
form.
We do want the composition
g.p = lambda x. g(p(x))
to be idempotent, which we get for free if the AST really is a canonical form, in the sense that once we’ve run a string through the parser, generating a string from the AST and parsing it back won’t change the AST:
p(s) = p(g(p(s)))
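As a minimal sketch of the idea in Python, assume a made-up toy grammar of integer sums with optional parentheses. This has nothing to do with Stan's actual grammar or AST; the names tokenize, p, and g are just for illustration. The parser throws away whitespace and redundant parentheses, so it is many-to-one, and the generator prints one canonical string per AST:

```python
import re

# Hypothetical toy grammar (not Stan's):
#   expr := term ('+' term)*
#   term := INT | '(' expr ')'

def tokenize(s):
    # Whitespace is dropped here, which is one reason p is many-to-one.
    return re.findall(r"\d+|[()+]", s)

def p(s):
    """Parse a string into an AST: an int, or a tuple ('+', left, right)."""
    tokens = tokenize(s)
    pos = 0

    def term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr()
            pos += 1  # skip the closing ')'
            return node
        return int(tok)

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())  # left-associative
        return node

    return expr()

def g(ast):
    """Generate the canonical string for an AST."""
    if isinstance(ast, int):
        return str(ast)
    _, left, right = ast
    rhs = g(right)
    if not isinstance(right, int):
        rhs = "(" + rhs + ")"  # keep only the parentheses that matter
    return g(left) + " + " + rhs

s = "1+(2)  +3"
assert g(p(s)) != s                  # g is not an inverse of p
assert p(s) == p(g(p(s)))            # the AST survives a round trip
assert g(p(g(p(s)))) == g(p(s))      # so g.p is idempotent
```

The first assertion shows the AST is lossy with respect to the original text; the other two are the canonical-form property and the idempotence that follows from it.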
We don’t actually have the g function in Stan. Instead, we generate C++. We could write such a g that pretty-printed. It’d be a great project for someone wanting to learn and improve the AST!