Illegal statement beginning with non-void expression parsed as not a legal assignment, sampling, or function statement

Dear Stan Community,

I am trying to use the horseshoe model from Piironen and Vehtari (2016). The model in the paper was written in Stan, and the code is provided at the end of the paper. I am attaching a copy of the code below (my only modification was replacing every <- with =, since <- is deprecated).

When I paste this code into RStudio, I get the following error next to the prior on z:

"Illegal statement beginning with non-void expression parsed as
Not a legal assignment, sampling, or function statement. Note that
PARSER EXPECTED: “}” "

It seems the problem is in the functions block and not in the prior on z (which is straightforward). When I delete the function, that error disappears, but then, as expected, the compiler complains that the function is missing.

If I try to fit the model to some simulated data (for example, data_example.csv), I get:

“Error in stanc(file = file, model_code = model_code, model_name = model_name, : failed to parse Stan model ‘stan_code’ due to the above error.”

After reading about functions, I can't find what is wrong with that function or with the rest of the code, especially since it apparently worked when the authors wrote the paper. I could not find guidance in other posts with similar problems.

Any ideas of what could be happening?

Thank you for your help.
Best wishes,
Nico.


Piironen and Vehtari code

functions {
  vector sqrt_vec(vector x) {
    /* element-wise square root */
    vector[dims(x)[1]] res;
    for (m in 1:dims(x)[1]) res[m] = sqrt(x[m]);
    return res;
  }
}
data {
  int<lower=0> n;              // number of observations
  int<lower=0> d;              // number of predictors
  vector[n] y;                 // outputs
  matrix[n, d] x;              // inputs
  real<lower=0> scale_icept;   // prior std for the intercept
  real<lower=0> scale_global;  // scale for the half-t prior for tau
                               // (tau0 = scale_global * sigma)
  real<lower=1> nu_global;     // degrees of freedom for the half-t prior for tau
  real<lower=1> nu_local;      // degrees of freedom for the half-t priors for lambdas
                               // (nu_local = 1 corresponds to the horseshoe)
}
parameters {
  real w0;        // intercept
  real logsigma;  // log of noise std
  // auxiliary variables that define the global and local parameters
  vector[d] z;
  real<lower=0> r1_global;
  real<lower=0> r2_global;
  vector<lower=0>[d] r1_local;
  vector<lower=0>[d] r2_local;
}
transformed parameters {
  real<lower=0> tau;          // global shrinkage parameter
  vector<lower=0>[d] lambda;  // local shrinkage parameters
  vector[d] w;                // regression coefficients
  vector[n] f;                // latent values
  real sigma;                 // noise std
  sigma = exp(logsigma);
  lambda = r1_local .* sqrt_vec(r2_local);
  tau = r1_global * sqrt(r2_global);
  w = z .* lambda * tau;
  f = w0 + x * w;
}
model {
  // half-t priors for lambdas
  z ∼ normal(0, 1);
  r1_local ∼ normal(0.0, 1.0);
  r2_local ∼ inv_gamma(0.5 * nu_local, 0.5 * nu_local);
  // half-t prior for tau
  r1_global ∼ normal(0.0, scale_global * sigma);
  r2_global ∼ inv_gamma(0.5 * nu_global, 0.5 * nu_global);
  // Gaussian prior for the intercept
  w0 ∼ normal(0, scale_icept);
  // observation model
  y ∼ normal(f, sigma);
}

I think copy-pasting was the problem. I retyped the sampling statements: the tildes (∼) in your code are not the ordinary ASCII ~ that Stan expects, probably an encoding thing. If you retype every tilde in the code as an ordinary ~, I think it will work.
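Retyping by hand works, but the same fix can be scripted. The sketch below is a hypothetical helper (it is not part of rstan, stanc or RStudio) that replaces a few common look-alike Unicode characters, including the TILDE OPERATOR ∼ (U+223C) that came out of the PDF, with the ASCII characters Stan expects. The replacement table is an assumption; extend it with whatever else turns up in your pasted code.

```python
# Hypothetical helper (not part of rstan, stanc or RStudio): swap common
# look-alike Unicode characters for the ASCII characters Stan expects.
LOOKALIKES = {
    "\u223c": "~",  # TILDE OPERATOR -> ASCII tilde used in sampling statements
    "\u2212": "-",  # MINUS SIGN -> ASCII hyphen-minus
    "\u00a0": " ",  # NO-BREAK SPACE -> ordinary space
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}
_TABLE = str.maketrans(LOOKALIKES)


def normalize_stan_source(code):
    """Return (fixed_code, number_of_characters_replaced)."""
    replaced = sum(code.count(ch) for ch in LOOKALIKES)
    return code.translate(_TABLE), replaced


if __name__ == "__main__":
    pasted = "z \u223c normal(0, 1);\ny \u223c normal(f, sigma);\n"
    fixed, n = normalize_stan_source(pasted)
    print(n)
    print(fixed, end="")
```

To fix a whole file, read it with `open(path, encoding="utf-8")`, pass the text through `normalize_stan_source`, and write the result back.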


Thank you so much, @stijn. I never imagined there could be a difference between ∼ and ~. I did not understand why the error appeared next to that particular prior… Lesson learned.

I guess it would have been easier to figure out if the error had been reported at every “∼” instead of at just one place (or if the warning had been repeated for each occurrence instead of only once).


Copy-and-paste errors and wrong character encodings are a real bugbear for the parser.

@Bob_Carpenter and @seantalts - do we have a definition of allowed characters in variable names?

Would it be useful to have the compiler do a pre-scan pass that validates characters and line endings, in order to flag corruption introduced by copy/paste or RStudio? Alternatively, could this be plumbed into RStudio’s Stan syntax checker?
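A pre-scan of this kind can be sketched outside the compiler. The Python below is an illustration only (none of these names exist in stanc or RStudio); it assumes that any non-ASCII character in a Stan program is suspicious, which is deliberately conservative and will also flag characters inside comments. Unlike a parser that stops at the first failure, it reports every occurrence with its line, column, code point and Unicode name, plus a total count. The file name `model.stan` is a placeholder.

```python
import unicodedata


def find_non_ascii(code):
    """Yield (line, column, char, code_point, name) for every non-ASCII character.

    Lines and columns are 1-based, as in compiler error messages.
    """
    for lineno, line in enumerate(code.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield lineno, col, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")


def report(code, filename="model.stan"):
    """Print one message per suspicious character plus a total; return the count."""
    hits = list(find_non_ascii(code))
    for lineno, col, ch, cp, name in hits:
        print(f"{filename}:{lineno}:{col}: non-ASCII character {ch!r} ({cp} {name}); "
              "possibly introduced by copy/paste or a wrong encoding")
    print(f"{len(hits)} non-ASCII character(s) found")
    return len(hits)


if __name__ == "__main__":
    model = "model {\n  z \u223c normal(0, 1);\n  w0 \u223c normal(0, scale_icept);\n}\n"
    report(model)
```

Run on the model block from the original post, a scan like this would list all seven tildes at once instead of stopping at the first one.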


@mitzimorris We need as much help testing the new compiler as we can get, and this is a great opportunity to try it out and see what error message it generates. Here is the new error message. @Nico-Rojas, would this have been clearer to you?


--- Translating Stan model to C++ code ---
bin/stanc  --o=vehtaricode.hpp vehtaricode.stan

Syntax error in 'vehtaricode.stan', line 45, column 1, lexing error:
   -------------------------------------------------
    43:  model {
    44:  // half - t priors for lambdas
    45:  z ∼ normal (0 , 1);
           ^
    46:  r1_local ∼ normal (0.0 , 1.0);
    47:  r2_local ∼ inv_gamma (0.5* nu_local , 0.5* nu_local );
   -------------------------------------------------

Invalid character found.

@seantalts, to me, “Invalid character found” is a great improvement compared to:

"Illegal statement beginning with non-void expression parsed as Not a legal assignment, sampling, or function statement. Note that PARSER EXPECTED: “}” "

However, something that also misled me a lot was that the invalid character appeared 7 times in the code, whereas the error reported the problem only once, at a specific line (z ∼ normal (0 , 1);) where the culprit was hard to spot. If I had known that there were “7 invalid characters” in the code, or had seen all the lines with problems, it would have been much easier for me to debug.

Also, in this case, ∼ is very hard to distinguish from ~. Maybe a warning about copy-paste, or even giving this as an example, could help (although it may be that not many people do this kind of thing…).
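The two characters look almost identical on screen but are entirely different to the computer. A short Python check (an illustration only, not something Stan or RStudio provides) makes the difference visible by printing each character's code point, Unicode name and UTF-8 bytes; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(ch):
    """Return the code point, Unicode name and UTF-8 bytes of a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8")


for ch in ("~", "\u223c"):
    print(repr(ch), *describe(ch))
# '~' U+007E TILDE b'~'
# '∼' U+223C TILDE OPERATOR b'\xe2\x88\xbc'
```

The ASCII tilde is a single byte, while the TILDE OPERATOR takes three bytes in UTF-8, so a lexer that only accepts the ASCII tilde cannot treat the line as a sampling statement.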


Perhaps mention in the error message that the character may be in an unexpected encoding?

The character itself, and its position in the line?

@mitzimorris

In case this helps with debugging and error messages in the future, here is how I found the problem, in abbreviated form:

  • I commented out the functions block and everything depending on it.
  • The error message has some more information about potential problems. One of the options indicated that the line was not recognised as a sampling statement.
  • I commented out the offending line, and the error just popped up again on the next line.
  • I opened another Stan program and saw that ‘normal’ was highlighted in that file but not in the file with the error. When I retyped the offending line, ‘normal’ was highlighted.