Trouble using a model matrix of predictors for logistic regression

I like the idea of using a matrix of predictors to fit my bayesian models, it seems to be a very versatile technique, however I am having trouble getting this one to work. It is a simple bernoulli regression, estimating differences in the proportion of people in each of three groups (the told water group, the told decaf group, and the told caffeine group) who expect to feel better after their beverage. Here are the data

df3_postExp <- data.frame(id = 1:62, group = factor(c(rep("toldWater", 20),
                                                      rep("toldDecaf", 22),
                                                      rep("toldCaffeine", 20)), 
                                                    levels = c("toldWater", "toldDecaf", "toldCaffeine")),
                          expEffect = factor(c(rep("expNoEffect",16), rep("expFeelBetter",4),
                                               rep("expNoEffect",16), rep("expFeelBetter",6),
                                               rep("expNoEffect",4), rep("expFeelBetter",16)),
                                             levels = c("expNoEffect", "expFeelBetter")))

Now I specify a simple model with an intercept/reference level (the told water group) to whom the other two groups are compared.

ms <- "
    int nY;
    int nX;
    vector[nY] y;
    matrix[nY,nX] x;
    vector[nX] coef;
    coef ~ normal( 0,10 ) ;
    y ~ bernoulli_logit( x*coef ) ;
  generated quantities{
    int predY[nY];
    for (i in 1:nY) {
      predY[i] = bernoulli_logit_rng( (x*coef)[i] );

Now I create the matrix, suing the model.matrix() function in R to generate the model matrix.

#  create matrix of predictors
x <- model.matrix(~ group, data = df3_postExp)

# fit the model in rstan
post <- rstan::stan(model_code = ms,
                    data = list(nY = nrow(x),
                                nX = ncol(x),
                                y = as.numeric(df3_postExp$expEffect) - 1,
                                x = x),
                    warmup = 1e3,
                    iter = 3e3)

But the model does not run. Instead I get the error message

No matches for: 

  vector ~ bernoulli_logit(vector)

Available argument signatures for bernoulli_logit:

  int ~ bernoulli_logit(real)
  int ~ bernoulli_logit(real[ ])
  int ~ bernoulli_logit(vector)
  int ~ bernoulli_logit(row_vector)
  int[ ] ~ bernoulli_logit(real)
  int[ ] ~ bernoulli_logit(real[ ])
  int[ ] ~ bernoulli_logit(vector)
  int[ ] ~ bernoulli_logit(row_vector)

Real return type required for probability function.
 error in 'model28d45e3376bd_24b59d47e77c87f1fd544112c3560e88' at line 15, column 32
    13:   model{
    14:     coef ~ normal(0,10);
    15:     y ~ bernoulli_logit(x*coef);
    16:   }

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '24b59d47e77c87f1fd544112c3560e88' due to the above error.

Now this message seems to be telling me that my matrix of predictors is a vector, but I know it is a matrix.

Has anyone else encountered and solved this problem, and can give me some advice?

If I’ve done a stupid mistake I apologise, but it’s late at night and I can’t for the life of me what I’ve done wrong.

The error is saying that y has type vector, but the distribution is only defined for integer outcomes.

Try declaring instead:

int y[nY];

Also, have a look into the bernoulli_logit_glm distribution, it’s a more efficient to express these regressions

1 Like

Ah yes, of course stan would expect integers for a bernoulli. I adapted this from the script for a gaussian. Will look into bernoulli_logit_glm. Thanks @andrjohns.