Is there an R Stan code parser?

I’m currently working on a package that dynamically builds Stan models in R. Part of the package design allows users to provide Stan code snippets that will be mangled into a larger model. I was wondering whether a Stan code parser already exists. That is, I was hoping to take something like:

snippet.stan

functions {
    vector get_z_score(vector x, vector pars){
        return (x - pars[1]) / pars[2];
    }
}
data { int<lower=1> n; vector[n] x; }
parameters { real mu_x; real<lower=0.00000001> sigma_x; }
model { target += normal_lpdf(x | mu_x, sigma_x); }

And get:

parse_stan_file("snippet.stan")

> list(
    "functions"  = "    vector get_z_score(vector x, vector pars){\n    return (x - pars[1]) / pars[2];\n  }",
    "data" = "int<lower=1> n; vector[n] x;",
    "parameters" = "real mu_x; real<lower=0.00000001> sigma_x;",
    "model" = "target += normal_lpdf(x | mu_x, sigma_x);"
)

Or some form of similar functionality. I appreciate I can just write my own parser but was hoping to avoid re-inventing the wheel if such functionality has already been implemented elsewhere.

Stan’s parser is written in OCaml and available through the stanc3 repository of stan-dev. The full syntax for the language is available in the Reference Manual chapter on syntax.

If you want something like autocomplete, there are various implementations of that for different syntax highlighters, but I don’t know much about those because they’re broken out by system. For example, here’s one @WardBrian put together for highlighting Stan code in JupyterLab.

Thank you, I wasn’t aware of stanc. This helps a lot!

In my particular case I don’t need to extract anything more than the individual “blocks” as complete strings. I think the following does exactly what I need, as it formats the code consistently enough for me to accurately extract the individual blocks as strings.

/Users/gowerc/.cmdstan/cmdstan-2.33.1/bin/stanc --print-canonical ./local/model.stan
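
Once the code is formatted consistently, pulling out the blocks mostly comes down to counting braces. Below is a minimal sketch of that idea in Python; the logic is a plain brace counter and should port directly to R. `parse_stan_blocks` is a hypothetical helper, not part of stanc or any Stan package, and it assumes no `{` or `}` appears inside comments or string literals.

```python
import re

# Top-level block headers; the leftmost match wins, so "transformed data"
# is found before the bare "data" inside it.
HEADER = re.compile(
    r"\b(functions|transformed\s+data|transformed\s+parameters|"
    r"generated\s+quantities|data|parameters|model)\s*\{"
)


def parse_stan_blocks(code):
    """Return {block name: body text} for each top-level block in `code`.

    Assumes braces never occur inside comments or string literals.
    """
    blocks = {}
    pos = 0
    while True:
        match = HEADER.search(code, pos)
        if match is None:
            break
        name = " ".join(match.group(1).split())  # normalise inner whitespace
        depth = 1
        i = match.end()
        # Walk forward until the brace opened by the header is closed.
        while i < len(code) and depth > 0:
            if code[i] == "{":
                depth += 1
            elif code[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            raise ValueError(f"unbalanced braces in '{name}' block")
        # i - 1 is the index of the closing brace; keep only the body.
        blocks[name] = code[match.end():i - 1].strip()
        pos = i  # resume the search after this block
    return blocks


snippet = """
functions {
    vector get_z_score(vector x, vector pars){
        return (x - pars[1]) / pars[2];
    }
}
data { int<lower=1> n; vector[n] x; }
parameters { real mu_x; real<lower=0.00000001> sigma_x; }
model { target += normal_lpdf(x | mu_x, sigma_x); }
"""

blocks = parse_stan_blocks(snippet)
print(blocks["data"])  # int<lower=1> n; vector[n] x;
```

Because the search always resumes after a closed block, a block name that appears inside another block's body is never treated as a header.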

@Bob_Carpenter - One issue we are running into with this approach is that some of the file fragments contain variables that are defined in other file fragments, which causes the parser to throw an error:

Semantic error in '/var/folders/hs/gyg0q5g94pz917klnkg7tnt80000gq/T//Rtmp2qrgtf/filef6e724b5c024.stan', line 11, column 11 to column 17:
   -------------------------------------------------
     9:  
    10:      real lm_rs_intercept;
    11:      array [n_arms] real lm_rs_slope_mu;
                    ^
    12:      real<lower=1.4901e-08> lm_rs_slope_sigma;
    13:      real<lower=1.4901e-08> lm_rs_sigma;
   -------------------------------------------------

Identifier 'n_arms' not in scope.

Is there any option to suppress this check? The only thing I could see was --allow-undefined, but this appears to apply only to functions. I guess I’m asking whether there is any equivalent of extern int x for Stan, or any way we can promise that this variable will exist in the final program.
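
Whatever the answer on suppressing the check, the names a fragment expects from elsewhere can at least be read out of the error text. Below is a minimal sketch in Python, assuming the wording quoted above ("Identifier '...' not in scope") is stable across stanc versions, which has not been verified here. `missing_identifiers` is a hypothetical helper, not part of stanc or any Stan package.

```python
import re

# Pattern copied from the error text quoted above; treating this wording
# as stable is an assumption about stanc's output.
NOT_IN_SCOPE = re.compile(r"Identifier '([^']+)' not in scope")


def missing_identifiers(stanc_output):
    """Return the out-of-scope identifiers named in `stanc_output`, in order, without duplicates."""
    seen = []
    for name in NOT_IN_SCOPE.findall(stanc_output):
        if name not in seen:
            seen.append(name)
    return seen


message = "Identifier 'n_arms' not in scope."
print(missing_identifiers(message))  # ['n_arms']
```

The returned names could then be checked against the declarations found in the other fragments before the final program is assembled.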