Trouble using a model matrix of predictors for logistic regression

llewmills · February 9, 2021, 2:35pm

I like the idea of using a matrix of predictors to fit my bayesian models, it seems to be a very versatile technique, however I am having trouble getting this one to work. It is a simple bernoulli regression, estimating differences in the proportion of people in each of three groups (the told water group, the told decaf group, and the told caffeine group) who expect to feel better after their beverage. Here are the data

df3_postExp <- data.frame(id = 1:62, group = factor(c(rep("toldWater", 20),
                                                      rep("toldDecaf", 22),
                                                      rep("toldCaffeine", 20)), 
                                                    levels = c("toldWater", "toldDecaf", "toldCaffeine")),
                          expEffect = factor(c(rep("expNoEffect",16), rep("expFeelBetter",4),
                                               rep("expNoEffect",16), rep("expFeelBetter",6),
                                               rep("expNoEffect",4), rep("expFeelBetter",16)),
                                             levels = c("expNoEffect", "expFeelBetter")))

Now I specify a simple model with an intercept/reference level (the told water group) to whom the other two groups are compared.

ms <- "
  data{
    int nY;
    int nX;
    vector[nY] y;
    matrix[nY,nX] x;
  }
  
  parameters{
    vector[nX] coef;
  }
  
  model{
    coef ~ normal( 0,10 ) ;
    y ~ bernoulli_logit( x*coef ) ;
  }
  
  generated quantities{
    int predY[nY];
    for (i in 1:nY) {
      predY[i] = bernoulli_logit_rng( (x*coef)[i] );
    }
  }
"

Now I create the matrix, suing the model.matrix() function in R to generate the model matrix.

#  create matrix of predictors
x <- model.matrix(~ group, data = df3_postExp)

# fit the model in rstan
post <- rstan::stan(model_code = ms,
                    data = list(nY = nrow(x),
                                nX = ncol(x),
                                y = as.numeric(df3_postExp$expEffect) - 1,
                                x = x),
                    warmup = 1e3,
                    iter = 3e3)

But the model does not run. Instead I get the error message

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  vector ~ bernoulli_logit(vector)

Available argument signatures for bernoulli_logit:

  int ~ bernoulli_logit(real)
  int ~ bernoulli_logit(real[ ])
  int ~ bernoulli_logit(vector)
  int ~ bernoulli_logit(row_vector)
  int[ ] ~ bernoulli_logit(real)
  int[ ] ~ bernoulli_logit(real[ ])
  int[ ] ~ bernoulli_logit(vector)
  int[ ] ~ bernoulli_logit(row_vector)

Real return type required for probability function.
 error in 'model28d45e3376bd_24b59d47e77c87f1fd544112c3560e88' at line 15, column 32
  -------------------------------------------------
    13:   model{
    14:     coef ~ normal(0,10);
    15:     y ~ bernoulli_logit(x*coef);
                                       ^
    16:   }
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '24b59d47e77c87f1fd544112c3560e88' due to the above error.

Now this message seems to be telling me that my matrix of predictors is a vector, but I know it is a matrix.

Has anyone else encountered and solved this problem, and can give me some advice?

If I’ve done a stupid mistake I apologise, but it’s late at night and I can’t for the life of me what I’ve done wrong.

andrjohns · February 9, 2021, 5:19pm

The error is saying that y has type vector, but the distribution is only defined for integer outcomes.

Try declaring instead:

int y[nY];

Also, have a look into the bernoulli_logit_glm distribution, it’s a more efficient to express these regressions

llewmills · February 9, 2021, 9:02pm

Ah yes, of course stan would expect integers for a bernoulli. I adapted this from the script for a gaussian. Will look into bernoulli_logit_glm. Thanks @andrjohns.

Topic		Replies	Views
Ill-typed arguments to '~' statement. No distribution 'bernoulli_logit' was found with the correct signature Modeling rstan	7	995	April 8, 2023
Bernoulli_logit with different number of predictors Modeling rstan	3	985	February 5, 2022
Rstan: logistic regression Modeling	6	586	October 18, 2018
Problem with modelling Logistic Regression in Rstan Modeling specification	4	510	March 9, 2021
Getting an error when writing a Stan program for use with the loo package Modeling rstan , fitting-issues , specification , loo	3	403	November 3, 2020

Trouble using a model matrix of predictors for logistic regression

Related topics