Structured error output format for stanc?

In short, I’m wondering whether it is possible to get machine-readable structured error output (e.g., JSON or XML) from the current stanc or a future version (stanc3).

Long version:

I’m currently trying to update the Emacs stan-mode to 2019 standards. One of the new features is on-the-fly syntax checking using flycheck (flycheck-stan). The example picture shows code with a single syntax error in cauchy_lpdf, which the on-the-fly syntax checker underlined.

But I’m having difficulty parsing the stanc error output for this. When there is only one error or info message, it is fine. But when there are multiple messages, such as deprecated-syntax infos plus an error, parsing them separately becomes a little trickier.

Example:

Issues

  • Infos do not have associated line numbers, making it difficult to indicate them live
  • An error typically starts with " error" with an associated line and column (sometimes just a line). But in this case, the explanation “Probability function…” comes before “error…”, making it difficult to capture its beginning unless all possible messages are encoded in the parser.

Ideal output

  • Machine-readable format (e.g., --json option gives the errors in json)
  • Data elements include: line, column, severity (error, warning, info, etc), one-liner error/info message, long error/info body for full description
  • If a machine-readable format is too far-fetched, a consistent beginning (Error: in addition to Info:), consistent line numbers, and a one-line summary before the multi-line message body would make parsing much easier.
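
As a rough illustration of the data elements listed above, one record per message could look like the following Python sketch. The `Diagnostic` class, its field names, and the sample message are all made up for illustration. The `--json` option is only the wish stated above, not an existing stanc flag as far as this thread establishes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Diagnostic:
    """One message from the compiler (hypothetical schema, not a stanc format)."""
    line: Optional[int]    # None when the message has no line number
    column: Optional[int]  # None when only the line is known
    severity: str          # "error", "warning", or "info"
    summary: str           # one-line message
    body: str              # full multi-line description


def to_json(diagnostics):
    """Serialize a list of Diagnostic records to a JSON array."""
    return json.dumps([asdict(d) for d in diagnostics], indent=2)


# Invented sample records, only to show the shape of the output.
sample = [
    Diagnostic(None, None, "info", "deprecated syntax used", "Longer explanation here."),
    Diagnostic(12, 4, "error", "unknown function", "Longer explanation here."),
]
print(to_json(sample))
```

With a shape like this, an editor plugin would only need a JSON parser instead of a set of message-specific regular expressions.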

I had the same problem when I had a go at adding flycheck support and an Atom lintr.


@jrnold, thanks for your comment. Is the logic for the Atom lintr available somewhere?

stan/src/stan/lang/grammars/semantic_actions_def.cpp seems to be where these messages are generated. If all messages with pass = false had the same prefix, such as “Error:”, parsing would be much easier.

Oh man, I didn’t see this thread! @enetsee and @Matthijs, @kaz-yos here is looking for structured output for the purposes of presentation compilation :)

We were thinking about eventually building something like language server protocol into stanc3: https://github.com/stan-dev/stanc3/issues/65

We have a super vague sketch of an idea for how that could happen using Menhir but haven’t had the time to work on it yet, and didn’t realize there were folks in the community who could make use of it already.


I am documenting here what I wrote elsewhere. For now, I have solved the issue as follows:

flycheck-stan picks up all “Info:…” patterns and error patterns listed here.

These were extracted from semantic_actions_def.cpp by some basic pattern matching.

Fortunately, flycheck is smart enough to warn you even if the patterns are missed, because it picks up the non-zero exit code from stanc.
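
The fallback idea can be sketched in Python as below. This is not how flycheck or flycheck-stan is implemented (both are Emacs Lisp); the `diagnose` function, its arguments, and the sample strings are hypothetical and only illustrate "match known patterns, and still report something if the exit code is non-zero".

```python
def diagnose(output, exit_code, error_patterns):
    """Return (severity, text) pairs found in checker output.

    error_patterns is a list of known error substrings (hypothetical here;
    flycheck-stan extracted its list from semantic_actions_def.cpp).
    """
    found = []
    for line in output.splitlines():
        if line.startswith("Info:"):
            found.append(("info", line))
        elif any(pattern in line for pattern in error_patterns):
            found.append(("error", line))

    # Fallback: the checker failed, but none of the known error patterns
    # matched, so report a generic error rather than staying silent.
    has_error = any(sev == "error" for sev, _ in found)
    if exit_code != 0 and not has_error:
        found.append(
            ("error", "checker exited with status %d; no known pattern matched" % exit_code)
        )
    return found


# Invented output: one info line plus an error message the pattern list misses.
result = diagnose("Info: something\nunrecognized failure text", 1, ["known error text"])
print(result)
```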

I haven’t implemented support for stanc3, but the corresponding OCaml code may be easier for message extraction.

Thanks for the information on the potential future directions!

Of course! Would you be interested at all in working on a Stan Language Server that can integrate with something like https://github.com/emacs-lsp/lsp-mode ? We hear it is the new hotness w.r.t. IDE language support - should allow for error checking, completions, and definitions at least. One time we analyzed what we thought a minimal supporting API would look like and wrote about it here: https://github.com/stan-dev/stanc3/issues/94

We’d be happy to help you get started with OCaml - it’s a lot like a LISP, actually!


I am revisiting this now after a couple of weeks of interruption due to grant deadlines. In the latest develop branch of cmdstan (commit 991360f), stanc gives more structured output. Is this already stanc3? Here I can just capture “Warning:” and “Syntax error” to split up the output into individual messages, which is easier.


Yes, develop cmdstan has stanc3 as of end of October.


Thanks for your information!

Hi @seantalts and @rok_cesnovar,

I am currently examining the error patterns in stanc3.

e.g., https://github.com/kaz-yos/stan-mode/blob/develop/flycheck-stan/examples/example_info_composite_with_error.stanc3out.txt

They are very beautiful in that each individual message has a file name, line, column, and extract associated with it!

As far as I can see in my example bad Stan files and the expected files in stanc3, there are three types of openings for individual messages.

  • Warning:
  • Syntax error in
  • Semantic error in

Are these exhaustive? Is it correct that the first (“Warning”) is non-fatal and the latter two (the “errors”) are fatal (not executable)?

Also, are these the correct locations in the stanc3 code base where these message templates are defined?
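
For reference, splitting on the three openings listed above can be sketched as follows. This is Python rather than the Emacs Lisp used in flycheck-stan, the function names are hypothetical, and the sample text is a placeholder, not real stanc3 output. It also assumes the three openings are exhaustive, which is exactly the open question.

```python
import re

# Openings taken from the list above; assumed (not confirmed) to be exhaustive.
OPENERS = ("Warning:", "Syntax error in", "Semantic error in")
OPENER_RE = re.compile("^(?:%s)" % "|".join(re.escape(o) for o in OPENERS))


def split_messages(output):
    """Split compiler output into messages, each starting at an opener line."""
    messages = []
    current = None
    for line in output.splitlines():
        if OPENER_RE.match(line):
            if current is not None:
                messages.append("\n".join(current).rstrip())
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first opener are ignored in this sketch.
    if current is not None:
        messages.append("\n".join(current).rstrip())
    return messages


def severity(message):
    """Treat "Warning:" as non-fatal and both error openings as fatal."""
    return "warning" if message.startswith("Warning:") else "error"


# Placeholder text only; it mimics the openings, not the real message bodies.
SAMPLE = """\
Warning: placeholder text for a deprecation notice
   extra detail line

Semantic error in placeholder location:
   placeholder explanation
"""

for msg in split_messages(SAMPLE):
    print(severity(msg), "|", msg.splitlines()[0])
```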

Full disclaimer: I don’t have exhaustive knowledge of the compiler; hopefully one of the people running the compiler development (@seantalts, @Matthijs, @enetsee, et al.) will be able to confirm or correct me and add to my words.

I think you are more or less correct. There are three types of syntax errors (parsing error, lexing error and include error). I think the link you posted for syntax errors is pointing to a different location. Then, as you note, we have warnings and semantic errors. There is also a fatal error, which is something that should not happen and is probably a compiler bug if it does.


That sounds right to me.


Thanks! I’ll work on flycheck-stan based on these.


The warnings can now be correctly highlighted (underlined) in the development version of flycheck-stan!


Fantastic stuff @kaz-yos!
