Array function coding error for multilevel logit model with multivariate priors

Dear all Stanians,

Hope this message finds you all well. I am writing to ask a question about using arrays in Stan code blocks. I am trying to estimate a two-level logit model with intercepts and slopes as outcomes, so I need a multivariate prior for the random components at level 2. But I am running into problems with lines beginning with array. I pretty much followed the code at the following Stan project site for manuals:

My Stan code is below, with a slight revision (logit instead of OLS) to the code in the Stan manual. It appears that Stan doesn’t like this part (line 5 in the data block): array[N] int<lower=1, upper=J> jj;

Somehow it doesn’t recognize array at the start of this line. I also tried turning jj into a vector. That gets past this line, but every later line that begins with array still produces an error. I tried turning some of the lines beginning with array (e.g., “array[J] row_vector[L] u;”) into matrices, but it still didn’t work. Any pointers would be greatly appreciated.

Jun Xu, PhD
Professor
Department of Sociology
Data Analytics Program
Ball State University
Muncie, IN 47306

data {
  int<lower=0> N;                     // num individuals
  int<lower=1> K;                     // num ind predictors
  int<lower=1> J;                     // num groups
  int<lower=1> L;                     // num group predictors
  array[N] int<lower=1, upper=J> jj;  // group for individual
  matrix[N, K] x;                     // individual predictors
  array[J] row_vector[L] u;           // group predictors
  vector[N] y;                        // outcomes
}

parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  array[J] vector[K] beta;     // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
}

model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    array[J] row_vector[K] u_gamma;
    for (j in 1:J) {
      u_gamma[j] = u[j] * gamma;
    }
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N) {
    y[n] ~ bernoulli(inv_logit(x[n] * beta[jj[n]]));
  }
}

And the error message is:


SYNTAX ERROR, MESSAGE(S) FROM PARSER:
error in ‘model2a4410413ab4_60f1e9e0a72498a2488a08151e849344’ at line 7, column 2

 5:   int<lower=1> J;                     // num groups
 6:   int<lower=1> L;                     // num group predictors
 7:   array[N] int<lower=1, upper=J> jj;  // group for individual
     ^
 8:   matrix[N, K] x;                     // individual predictors

PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type,
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
corr_matrix, cov_matrix,
cholesky_corr, cholesky_cov
or ‘}’ to close variable declarations>
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘60f1e9e0a72498a2488a08151e849344’ due to the above error.
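Separately from the parse error, it may help to see what the level-2 prior in the model computes. Each group's coefficient vector beta[j] is multivariate normal with mean u[j] * gamma (a length-L row vector times an L×K matrix, giving a length-K row vector) and covariance quad_form_diag(Omega, tau), which equals diag(tau) * Omega * diag(tau). The Python sketch below reproduces those two quantities so the shapes can be checked by hand. The function names mirror Stan's, but these are illustrative re-implementations, and the numbers are made up.

```python
# Plain-Python sketch of the two level-2 quantities in the model block:
#   u_gamma[j] = u[j] * gamma                    (row_vector[L] times matrix[L, K])
#   Sigma      = quad_form_diag(Omega, tau)      (= diag(tau) * Omega * diag(tau))
# These are illustrative re-implementations, not Stan's own code.


def row_times_matrix(u_row, gamma):
    """Return the length-K row vector u_row * gamma, where gamma is L x K (list of rows)."""
    L = len(gamma)
    K = len(gamma[0])
    assert len(u_row) == L, "u[j] needs one entry per row of gamma"
    return [sum(u_row[l] * gamma[l][k] for l in range(L)) for k in range(K)]


def quad_form_diag(omega, tau):
    """Return diag(tau) * omega * diag(tau); entry (i, k) is tau[i] * omega[i][k] * tau[k]."""
    K = len(tau)
    return [[tau[i] * omega[i][k] * tau[k] for k in range(K)] for i in range(K)]


if __name__ == "__main__":
    # Made-up example with L = 2 group predictors and K = 2 coefficients.
    gamma = [[1.0, 0.5],
             [2.0, -1.0]]
    u_j = [1.0, 3.0]                       # e.g. a constant plus one county-level predictor
    print(row_times_matrix(u_j, gamma))    # [7.0, -2.5]

    omega = [[1.0, 0.3],
             [0.3, 1.0]]                   # correlation matrix
    tau = [2.0, 0.5]                       # per-coefficient scales
    print(quad_form_diag(omega, tau))      # diagonal is tau**2 = [4.0, 0.25]
```

The diagonal of the resulting covariance is tau squared, and the off-diagonal entries are the correlations rescaled by the two corresponding scales, which is why the model can put separate priors on Omega (lkj_corr) and tau (cauchy).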


You need to update your RStan install; array is a syntax that is newer but should be supported by the latest RStan.

Thanks a lot for your suggestion. I just updated rstan, and it gives me the same error.

Jun

Does rstan even support array? The CRAN version is still at 2.21, which predates the array syntax.

If none works out, you can fall back to the old syntax.

Oops, sorry, yes: RStan on CRAN isn’t up to date with the new array syntax. I had things confused with cmdstanr.

There are ways to use newer versions of RStan; see Rstan Versioning - #2 by rok_cesnovar.


Thank you all for your pointers. I think I will back off from the new coding style. I have two questions. First, why do we need arrays when we have matrices? Are arrays designed for easy looping? Any pointers on the difference between the two, and on when to use arrays in Stan, would help. Second, a specific question:

So here is what I did: I changed all the new-style arrays to the old syntax, including:

change

array[N] int<lower=1, upper=J> jj;

to

int<lower=1, upper=J> jj[N];

change

array[J] vector[K] beta;

to

vector[K] beta[J];

change

array[J] row_vector[K] u_gamma;

to

row_vector[K] u_gamma[J];
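All three rewrites above follow one rule: the newer form array[dims] type name; becomes type name[dims];, with the dimensions moved from the front of the declaration to after the variable name. As a sketch of that rule, here is a hypothetical Python helper (our own, not part of Stan or RStan) that applies it to single-line declarations. It assumes one declaration per line and does not handle declarations with an assignment, declarations spanning several lines, or function argument types.

```python
import re

# Hypothetical helper (not part of Stan or RStan): rewrites a single-line
# declaration in the newer syntax
#     array[dims] type name;
# into the older syntax accepted by RStan 2.21
#     type name[dims];
# Lines without an array declaration come back unchanged, and anything after the
# semicolon (such as a // comment) is kept. Declarations with "= ..." or that
# span several lines are NOT handled.
_ARRAY_DECL = re.compile(r"^(\s*)array\[([^\]]+)\]\s+(.+?)\s+(\w+)\s*;")


def to_old_array_syntax(line: str) -> str:
    """Move the array dimensions from the front of the declaration to after the name."""
    return _ARRAY_DECL.sub(r"\1\3 \4[\2];", line)


if __name__ == "__main__":
    for decl in [
        "array[N] int<lower=1, upper=J> jj;  // group for individual",
        "array[J] vector[K] beta;",
        "    array[J] row_vector[K] u_gamma;",
        "matrix[N, K] x;",
    ]:
        print(to_old_array_syntax(decl))
```

Run on the three declarations from the model, this produces int<lower=1, upper=J> jj[N];, vector[K] beta[J]; and row_vector[K] u_gamma[J];, while the matrix declaration is left alone because it is not an array.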

Now I am getting the following error message, which I believe is related to the array syntax:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  real ~ bernoulli_logit(real)

Available argument signatures for bernoulli_logit:

  int ~ bernoulli_logit(real)
  int ~ bernoulli_logit(real[ ])
  int ~ bernoulli_logit(vector)
  int ~ bernoulli_logit(row_vector)
  int[ ] ~ bernoulli_logit(real)
  int[ ] ~ bernoulli_logit(real[ ])
  int[ ] ~ bernoulli_logit(vector)
  int[ ] ~ bernoulli_logit(row_vector)

Real return type required for probability function.
 error in 'model148022256544_6b2410f576b17d23cb28601b40359596' at line 33, column 46
  -------------------------------------------------
    31:   }
    32:   for (n in 1:N) {
    33:    y[n] ~ bernoulli_logit(x[n] * beta[jj[n]]);
                                                     ^
    34:   }
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '6b2410f576b17d23cb28601b40359596' due to the above error.

I’ve uploaded my data, R file, and error message below, and would very much appreciate any pointers. Note that I also updated rstanarm to its most current version. What I am trying to estimate is a two-level logit regression. The binary response variable is heart; the individual-level predictors are age10, male, rBlack, lthischl, and highschl; the group (county) level predictors are ctymdinK (county median household income) and ctygini (Gini coefficient); and the level-2 identifier is ctyfips.

And the error now becomes:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Semantic error in 'string', line 30, column 3 to column 46:

Ill-typed arguments to '~' statement. No distribution 'bernoulli_logit' was found with the correct signature.

mydta.RData (16.3 KB)
stanDiscoursePost20220304.R (2.9 KB)

I think bernoulli_logit is another feature that is too new for your RStan version. You’ll have to use inv_logit() manually instead.
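Computing the inverse logit by hand means evaluating inv_logit(eta) = 1 / (1 + exp(-eta)). The Python sketch below (an illustration, not Stan code; the function names are our own) compares that direct formula with an arrangement that only ever calls exp() on a non-positive number. In Python the direct formula raises OverflowError for large negative eta, while the rearranged version returns a value very close to 0.

```python
import math


def inv_logit_naive(eta: float) -> float:
    """Direct translation of 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_logit_stable(eta: float) -> float:
    """Same function, arranged so exp() only ever sees a non-positive argument."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)            # eta < 0, so 0 < z < 1 and nothing overflows
    return z / (1.0 + z)


if __name__ == "__main__":
    for eta in (-2.0, 0.0, 2.0):
        print(eta, inv_logit_naive(eta), inv_logit_stable(eta))
    try:
        inv_logit_naive(-1000.0)     # exp(1000) is too large for a Python float
    except OverflowError:
        print("naive form overflowed at eta = -1000")
    print(inv_logit_stable(-1000.0))   # exp(-1000) underflows to 0.0
```

For moderate values of eta the two versions agree to floating-point precision; the difference only shows up in the tails.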

Hi Mike,

Thanks a lot for your pointer. First I updated rstanarm, and then I also tried the old syntax below:

  for (n in 1:N) {
    real eta;
    real prob;
    eta <- x[n] * beta[jj[n]]; 
    prob <- 1/(1+exp(-1*eta));  
    y[n] ~ bernoulli(prob);
  }

It still doesn’t work.

Jun

What’s the error msg now?

Thanks a lot for your follow-up! The error message is still,

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Semantic error in 'string', line 34, column 4 to column 27:

Ill-typed arguments to '~' statement. No distribution 'bernoulli' was found with the correct signature

Oh! Sorry, I failed to notice this way back in your first post. You have:

data{
    …
    vector[N] y ;
}

This declares y as a vector, i.e., a collection of reals. If y is instead a collection of ones and zeroes (and thereby suitable for modelling as a Bernoulli outcome), you need to declare it as an integer array:

data{
    …
    int y[N] ; // old array syntax, new would be: array[N] int y;
}
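The type matters because the Bernoulli log probability is only defined for the outcomes 0 and 1, which is why every signature listed in the earlier error message has int or int[ ] on the left of ~. As a plain-Python sketch (not RStan code; all names are our own), here is what the loop y[n] ~ bernoulli_logit(x[n] * beta[jj[n]]) adds to the log density, including the 1-based group index jj. The function rejects non-integer outcomes at run time, loosely mirroring how stanc rejects a real-valued y at compile time.

```python
import math


def log1p_exp(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bernoulli_logit_lpmf(y: int, eta: float) -> float:
    """log Bernoulli(y | inv_logit(eta)); only the integers 0 and 1 are accepted."""
    if not isinstance(y, int) or y not in (0, 1):
        raise TypeError("bernoulli outcome must be the integer 0 or 1, got %r" % (y,))
    # log p = -log(1 + exp(-eta)) and log(1 - p) = -log(1 + exp(eta))
    return -log1p_exp(-eta) if y == 1 else -log1p_exp(eta)


def log_likelihood(y, x, beta, jj):
    """Sum over n of bernoulli_logit_lpmf(y[n] | x[n] . beta[jj[n]]).

    y: list of 0/1 ints; x: N rows of length K; beta: J vectors of length K;
    jj: 1-based group indices, as in the Stan data block.
    """
    total = 0.0
    for n in range(len(y)):
        b = beta[jj[n] - 1]                          # Stan indexes from 1, Python from 0
        eta = sum(xk * bk for xk, bk in zip(x[n], b))
        total += bernoulli_logit_lpmf(y[n], eta)
    return total


if __name__ == "__main__":
    # Made-up data: two individuals in two groups, K = 2.
    print(log_likelihood([1, 0], [[1.0, 0.0], [1.0, 2.0]],
                         [[0.0, 0.0], [0.5, -1.0]], [1, 2]))
```

Passing 1.0 instead of 1 as an outcome raises TypeError here, which is the run-time analogue of the "Ill-typed arguments to '~' statement" message.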

Again, thanks a lot for your follow-up. I’ve now changed the Stan code in the middle to:

data {
  int<lower=0> N;                     // num individuals
  int<lower=1> K;                     // num ind predictors
  int<lower=1> J;                     // num groups
  int<lower=1> L;                     // num group predictors
  array[N] int<lower=1, upper=J> jj;  // group for individual
  matrix[N, K] x;                     // individual predictors
  row_vector[L] u[J];           // group predictors
  // vector[N] y;                        // outcomes
  int<lower=0, upper=1> y[N];
}

parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  array[J] vector[K] beta;     // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
}

model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    array[J] row_vector[K] u_gamma;
    for (j in 1:J) {
      u_gamma[j] = u[j] * gamma;
    }
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N) {
     real eta;
     real prob;
     eta <- x[n] * beta[jj[n]]; 
     prob <- 1/(1+exp(-1*eta));  
     y[n] ~ bernoulli(prob);
    // y[n] ~ bernoulli_logit(x[n] * beta[jj[n]]);
  }
}

Now the error becomes

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Syntax error in 'string', line 10, column 7 to column 8, parsing error:

Expected "generated quantities {" or end of file after end of model block.

It appears the problem is related to (1) how y is defined and enters the model and (2) mismatched brackets. But I checked the brackets and didn’t find any problems. I am also wondering why y can’t be defined as

vector[N] y;

since that’s how the example is constructed here on the Stan manual webpage.

I am also attaching my R code and the data again, for help:
mydta.RData (16.3 KB)
stanDiscoursePost20220304.R (3.1 KB)

Jun

When I copied your code and ran it through RStudio’s built-in syntax checker (which uses RStan’s version of the Stan parser), this version passed:

data {
  int<lower=0> N;                     // num individuals
  int<lower=1> K;                     // num ind predictors
  int<lower=1> J;                     // num groups
  int<lower=1> L;                     // num group predictors
  int<lower=1, upper=J> jj[N];        // group for individual
  matrix[N, K] x;                     // individual predictors
  row_vector[L] u[J];                 // group predictors
  int<lower=0, upper=1> y[N];         // binary outcomes
}

parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  vector[K] beta[J];     // indiv coeffs by group
}

model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    row_vector[K] u_gamma[J];
    for (j in 1:J) {
      u_gamma[j] = u[j] * gamma;
    }
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N) {
    real eta;
    real prob;
    eta = x[n] * beta[jj[n]];
    prob = 1 / (1 + exp(-eta));     // same as inv_logit(eta)
    y[n] ~ bernoulli(prob);
    // y[n] ~ bernoulli_logit(x[n] * beta[jj[n]]);
  }
}

Hi Mike,

Thanks a lot for your pointers and for checking. I used this code (with everything else the same as in the R file I posted immediately above) to run the model, and still got the following message:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Syntax error in 'string', line 19, column 12 to column 13, parsing error:

Expected "generated quantities {" or end of file after end of model block.

Any suggestion would be very much appreciated!

Can you see if the new version posted here helps? This looks like a recently fixed bug that required restarting R.

Hi Brian,

Thanks for your pointer! I will upgrade my rstan package and try again, and will let you know what I find.

Jun

Hi Mike,

Thanks for your pointer; I feel I am very close to getting the code to work. Could you share how you use RStudio’s built-in syntax checker? I tried to find it and turn it on, but was clueless even after reading several posts. Could you give me some pointers on that? Thanks!

Jun