Array function coding error for multilevel logit model with multivariate priors

junxu2018 · February 25, 2022, 7:11pm

Dear all Stanians,

I hope this message finds you all well. I have a question about using array in Stan program blocks. I am trying to estimate a two-level logit model with intercepts and slopes as outcomes, so I need a multivariate prior for the random components at level 2. But I am running into problems with lines that begin with array. I largely followed the code at the following Stan project site for manuals:

My Stan code is as follows, with a slight revision (logit instead of OLS) to the code in the Stan manual. The parser appears not to like this part (line 5 in the data block): array[N] int<lower=1, upper=J> jj;

Somehow it doesn’t recognize array at the start of this line. I also tried declaring jj as a vector. The parser then gets past this line, but whenever it hits any other line that begins with array, it gives me an error. I tried turning some of those lines (e.g., “array[J] row_vector[L] u;”) into matrices, but that still didn’t work. Any pointer would be greatly appreciated.

Jun Xu, PhD
Professor
Department of Sociology
Data Analytics Program
Ball State University
Muncie, IN 47306

data {
  int<lower=0> N;                     // num individuals
  int<lower=1> K;                     // num ind predictors
  int<lower=1> J;                     // num groups
  int<lower=1> L;                     // num group predictors
  array[N] int<lower=1, upper=J> jj;  // group for individual
  matrix[N, K] x;                     // individual predictors
  array[J] row_vector[L] u;           // group predictors
  vector[N] y;                        // outcomes
}

parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  array[J] vector[K] beta;     // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
}

model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    array[J] row_vector[K] u_gamma;
    for (j in 1:J) {
      u_gamma[j] = u[j] * gamma;
    }
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N) {
    y[n] ~ bernoulli(inv_logit(x[n] * beta[jj[n]]));
  }
}

And the error message is:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
error in ‘model2a4410413ab4_60f1e9e0a72498a2488a08151e849344’ at line 7, column 2

 5:   int<lower=1> J;                     // num groups
 6:   int<lower=1> L;                     // num group predictors
 7:   array[N] int<lower=1, upper=J> jj;  // group for individual
     ^
 8:   matrix[N, K] x;                     // individual predictors

PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type,
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
corr_matrix, cov_matrix,
cholesky_corr, cholesky_cov
or ‘}’ to close variable declarations>
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘60f1e9e0a72498a2488a08151e849344’ due to the above error.

mike-lawrence · February 25, 2022, 7:40pm

You need to update your RStan install; array is a syntax that is newer but should be supported by the latest RStan.

junxu2018 · February 25, 2022, 7:51pm

Thanks a lot for your suggestion. I just updated rstan, and it gives me the same error.

Jun

yizhang · February 25, 2022, 8:34pm

Does rstan even support array? CRAN version is still at 2.21 which is before array was introduced.

yizhang · February 25, 2022, 8:38pm

If none works out, you can fall back to the old syntax.
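For anyone landing here later, a minimal sketch of what falling back looks like, using the declarations from the original post. Only the position of the size changes; the element type and its constraints stay the same. The placeholder prior on beta is an assumption added purely so the sketch is a complete program. Note that newer Stan releases have, as far as I know, dropped the old form entirely, so this is a stopgap for an older RStan rather than a long-term fix.

```stan
data {
  int<lower=0> N;                // num individuals
  int<lower=1> K;                // num ind predictors
  int<lower=1> J;                // num groups
  int<lower=1> L;                // num group predictors

  // new syntax:  array[N] int<lower=1, upper=J> jj;
  int<lower=1, upper=J> jj[N];   // old syntax: the size goes after the name

  // new syntax:  array[J] row_vector[L] u;
  row_vector[L] u[J];
}
parameters {
  // new syntax:  array[J] vector[K] beta;
  vector[K] beta[J];
}
model {
  for (j in 1:J) {
    beta[j] ~ normal(0, 1);      // placeholder prior, not part of the original model
  }
}
```

Indexing is unchanged either way: beta[j] is still a vector of length K, and jj[n] is still an int.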

mike-lawrence · February 25, 2022, 10:23pm

Oops, sorry, yes, RStan on CRAN isn’t up to date with the new array syntax. I had things confused with cmdstanr.

WardBrian · February 26, 2022, 2:37pm

There are ways to use newer versions of RStan, see Rstan Versioning - #2 by rok_cesnovar

junxu2018 · February 26, 2022, 4:13pm

Thank you all for your pointers. I think I will back off from the new coding style. I have two questions. The first is why we need array when we already have matrix. Is array designed for easy looping? I would appreciate any pointer to the difference between the two and to when arrays should be used in Stan. The second is a specific question.

So here is what I did: I changed all new arrays to old ones, including

change

array[N] int<lower=1, upper=J> jj;

to

int<lower=1, upper=J> jj[N];

change

array[J] vector[K] beta;

to

vector[K] beta[J];

change

array[J] row_vector[K] u_gamma;

to

row_vector[K] u_gamma[J];

Now I am getting this error message, and I believe it is related to the array function:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  real ~ bernoulli_logit(real)

Available argument signatures for bernoulli_logit:

  int ~ bernoulli_logit(real)
  int ~ bernoulli_logit(real[ ])
  int ~ bernoulli_logit(vector)
  int ~ bernoulli_logit(row_vector)
  int[ ] ~ bernoulli_logit(real)
  int[ ] ~ bernoulli_logit(real[ ])
  int[ ] ~ bernoulli_logit(vector)
  int[ ] ~ bernoulli_logit(row_vector)

Real return type required for probability function.
 error in 'model148022256544_6b2410f576b17d23cb28601b40359596' at line 33, column 46
  -------------------------------------------------
    31:   }
    32:   for (n in 1:N) {
    33:    y[n] ~ bernoulli_logit(x[n] * beta[jj[n]]);
                                                     ^
    34:   }
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '6b2410f576b17d23cb28601b40359596' due to the above error.

junxu2018 · March 4, 2022, 4:32pm

I’ve now uploaded my data, R file, and error message, and would very much appreciate any pointer. Note that I also updated rstanarm to its most current version. What I am trying to estimate is a two-level logit regression. The binary response variable is heart. The individual-level predictors are age10, male, rBlack, lthischl, and highschl; the group/county-level predictors are ctymdinK (county median household income) and ctygini (Gini coefficient); and the level-2 identifier is ctyfips.

And the error now becomes:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Semantic error in 'string', line 30, column 3 to column 46:

Ill-typed arguments to '~' statement. No distribution 'bernoulli_logit' was found with the correct signature.

mydta.RData (16.3 KB)
stanDiscoursePost20220304.R (2.9 KB)

mike-lawrence · March 6, 2022, 7:42pm

I think bernoulli_logit is another feature that’s newer than your RStan version supports. You’ll have to use inv_logit() manually instead.

junxu2018 · March 10, 2022, 12:17pm

Hi Mike,

Thanks a lot for your pointer. First, I updated rstanarm, and then I also tried the old syntax, as follows:

  for (n in 1:N) {
    real eta;
    real prob;
    eta <- x[n] * beta[jj[n]]; 
    prob <- 1/(1+exp(-1*eta));  
    y[n] ~ bernoulli(prob);
 }

It still doesn’t work.

Jun

mike-lawrence · March 10, 2022, 2:07pm

What’s the error msg now?

junxu2018 · March 10, 2022, 2:35pm

Thanks a lot for your follow-up! The error message is still,

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Semantic error in 'string', line 34, column 4 to column 27:

Ill-typed arguments to '~' statement. No distribution 'bernoulli' was found with the correct signature

mike-lawrence · March 10, 2022, 3:27pm

Oh! Sorry, I failed to notice this way back from your first post. You have:

data{
    …
    vector[N] y ;
}

This declares y as a vector, which means y is a collection of reals. But if y is instead a collection of ones and zeroes (and thereby suitable for modelling as a Bernoulli outcome), you need to declare it as:

data{
    …
    int y[N] ; // old array syntax, new would be: array[N] int y;
}
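To make the type rule concrete, here is a minimal, self-contained sketch of a plain (non-hierarchical) logit model. The single coefficient vector b and its prior are hypothetical simplifications, not the model from this thread. The point is only that the left-hand side of a Bernoulli statement must be an int or an integer array, which matches the signature list in the earlier error message (int[ ] ~ bernoulli_logit(vector) is there; nothing with real or vector on the left is).

```stan
data {
  int<lower=0> N;               // num individuals
  int<lower=1> K;               // num predictors
  matrix[N, K] x;               // predictors
  int<lower=0, upper=1> y[N];   // integer array, so it can be a Bernoulli outcome
  // vector[N] y;               // would hold reals and fail to type-check below
}
parameters {
  vector[K] b;                  // hypothetical single coefficient vector
}
model {
  b ~ normal(0, 5);
  // x * b is a vector of linear predictors; y is int[], matching
  // the signature  int[] ~ bernoulli_logit(vector)
  y ~ bernoulli_logit(x * b);
}
```

The same reasoning applies inside a loop: y[n] is an int only when y is declared as an integer array.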

junxu2018 · March 11, 2022, 1:40pm

Again, thanks a lot for your follow-up. I have now changed the Stan code in the middle to:

data {
  int<lower=0> N;                     // num individuals
  int<lower=1> K;                     // num ind predictors
  int<lower=1> J;                     // num groups
  int<lower=1> L;                     // num group predictors
  array[N] int<lower=1, upper=J> jj;  // group for individual
  matrix[N, K] x;                     // individual predictors
  row_vector[L] u[J];           // group predictors
  // vector[N] y;                        // outcomes
  int<lower=0, upper=1> y[N];
}

parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  array[J] vector[K] beta;     // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
}

model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    array[J] row_vector[K] u_gamma;
    for (j in 1:J) {
      u_gamma[j] = u[j] * gamma;
    }
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N) {
     real eta;
     real prob;
     eta <- x[n] * beta[jj[n]]; 
     prob <- 1/(1+exp(-1*eta));  
     y[n] ~ bernoulli(prob);
    // y[n] ~ bernoulli_logit(x[n] * beta[jj[n]]);
  }
}

Now the error becomes:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Syntax error in 'string', line 10, column 7 to column 8, parsing error:

Expected "generated quantities {" or end of file after end of model block.

It appears the problem is related to 1) how y is defined and enters the simulation and 2) matching brackets. But I checked the brackets and didn’t find any problems. I am also wondering why y can’t be defined as

vector[N] y

since that’s how the example is constructed here on the Stan manual webpage.

I am also attaching my R code and the data again for help.
mydta.RData (16.3 KB)
stanDiscoursePost20220304.R (3.1 KB)

Jun

mike-lawrence · March 11, 2022, 2:42pm

When I copied your code and used RStudio’s built-in syntax checker, which uses RStan’s version, this passed:

data {
  int<lower=0> N;                     // num individuals
  int<lower=1> K;                     // num ind predictors
  int<lower=1> J;                     // num groups
  int<lower=1> L;                     // num group predictors
  int<lower=1, upper=J> jj[N] ;  // group for individual
  matrix[N, K] x;                     // individual predictors
  row_vector[L] u[J];           // group predictors
  // vector[N] y;                        // outcomes
  int<lower=0, upper=1> y[N];
}

parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  vector[K] beta[J];     // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
}

model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    row_vector[K] u_gamma[J];
    for (j in 1:J) {
      u_gamma[j] = u[j] * gamma;
    }
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N) {
     real eta;
     real prob;
     eta <- x[n] * beta[jj[n]];
     prob <- 1/(1+exp(-1*eta));
     y[n] ~ bernoulli(prob);
    // y[n] ~ bernoulli_logit(x[n] * beta[jj[n]]);
  }
}
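For reference, a tidier variant of the same program, offered as a sketch rather than a tested fix: it has not been run against the data attached in this thread. It assumes the old array syntax is still required, replaces the deprecated <- assignment and the hand-written logistic with bernoulli_logit (which should type-check now that y is an integer array), and drops sigma, which the logit likelihood never uses.

```stan
data {
  int<lower=0> N;                 // num individuals
  int<lower=1> K;                 // num ind predictors
  int<lower=1> J;                 // num groups
  int<lower=1> L;                 // num group predictors
  int<lower=1, upper=J> jj[N];    // group for individual
  matrix[N, K] x;                 // individual predictors
  row_vector[L] u[J];             // group predictors
  int<lower=0, upper=1> y[N];     // binary outcomes
}
parameters {
  corr_matrix[K] Omega;           // prior correlation
  vector<lower=0>[K] tau;         // prior scale
  matrix[L, K] gamma;             // group coeffs
  vector[K] beta[J];              // indiv coeffs by group
}
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    row_vector[K] u_gamma[J];
    for (j in 1:J) {
      u_gamma[j] = u[j] * gamma;
    }
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N) {
    // same likelihood as bernoulli(1 / (1 + exp(-eta))),
    // but evaluated on the logit scale
    y[n] ~ bernoulli_logit(x[n] * beta[jj[n]]);
  }
}
```

Here x[n] is a row_vector and beta[jj[n]] is a vector, so their product is a real, which is what bernoulli_logit expects for a single int outcome.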

junxu2018 · March 14, 2022, 12:58pm

Hi Mike,

Thanks a lot for your pointers and for checking. I used this code (with everything else the same as in my R file posted immediately above) to run the model, and still got the following message:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Syntax error in 'string', line 19, column 12 to column 13, parsing error:

Expected "generated quantities {" or end of file after end of model block.

Any suggestion would be very much appreciated!

WardBrian · March 14, 2022, 2:24pm

Can you see if the new version posted here helps? This looks like a recently fixed bug which required R to be restarted

junxu2018 · March 15, 2022, 11:12pm

Hi Brian,

Thanks for your pointer! I will upgrade my rstan package and try again. I will let you know what I find.

Jun

junxu2018 · March 21, 2022, 12:36am

Hi Mike,

Thanks for your pointer. I feel I am getting very close to having the code work. But can you share how you use RStudio’s built-in syntax checker? I tried to find the syntax checker to turn it on, but was clueless even after reading several posts. Could you give me some pointers in that regard? Thanks!

Jun
