Newbie looking for help with creating ppc in generated quantities block with bernoulli_logit_glm_rng

I am writing the following rstan model:

data {
  int<lower=0> N; // number of observations
  int<lower=1> K; // number of population-level effects
  vector[N] red;
  vector[N] blue;
  vector[N] pid;
  array[N] int<lower=0, upper=1> Y;
}
transformed data {
    // interaction
    vector[N] inter_1 = red .* pid;
    vector[N] inter_2 = blue .* pid;
    matrix[N, K] x = [red', blue', pid', inter_1', inter_2']';
}
parameters {
  real Intercept;
  vector[K] beta;
}
model {
  Y ~ bernoulli_logit_glm(x, Intercept, beta);
}

This should be a standard logistic regression to model binary outcomes with two moderator terms. I am able to successfully run my model, however, I want to do some posterior predictive checks with the bayesplot package.

I am new to stan, so this is probably just something really dumb. But when I adjust my code to include a generated quantities block (see below), I run into some problems:

data {
  int<lower=0> N; // number of observations
  int<lower=1> K; // number of population-level effects
  vector[N] red;
  vector[N] blue;
  vector[N] pid;
  array[N] int<lower=0, upper=1> Y;
}
transformed data {
    // interaction
    vector[N] inter_1 = red .* pid;
    vector[N] inter_2 = blue .* pid;
    matrix[N, K] x = [red', blue', pid', inter_1', inter_2']';
}
parameters {
  real Intercept;
  vector[K] beta;
}
model {
  Y ~ bernoulli_logit_glm(x, Intercept, beta);
}
generated quantities {
    // for ppc
    vector[N] y_rep;
    y_rep = bernoulli_logit_glm_rng(x, Intercept, beta);
}

The error says:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Semantic error in 'string', line 25, column 12 to column 55:

A returning function was expected but an undeclared identifier 'bernoulli_logit_glm_rng' was supplied.

To me, I must be specifying something incorrectly. Is it that Intercept is real rather than a vector? What would I change with my parameters block to make that happen? Am I just using the wrong function period?

According to the documentation, the output of bernouilli_logit_glm_rng needs to be an array of integers and the function is only available since Stan 2.29. These would be the first two things I would check.

So I have cmdstan updated to 2.29. However, I am getting an error that tells me that x (the first argument) should be type row_vector but it got type matrix. Which the documentation says is okay…

data {
  int<lower=0> N; // number of observations
  int<lower=1> K; // number of population-level effects
  vector[N] red;
  vector[N] blue;
  vector[N] pid;
  array[N] int<lower=0, upper=1> Y;
}
transformed data {
    // interaction
    vector[N] inter_1 = red .* pid;
    vector[N] inter_2 = blue .* pid;
    matrix[N, K] x = [red', blue', pid', inter_1', inter_2']';
}
parameters {
  real Intercept;
  vector[K] beta;
}
model {
  Y ~ bernoulli_logit_glm(x, Intercept, beta);
}
generated quantities {
  array[N] int<lower=0, upper = 1> y_rep;
  y_rep = bernoulli_logit_glm_rng(x, Intercept, beta);
}

What would x be in their argument? How would I construct that?

That is a bit unfortunate. This is going beyond my capabilities to help. @adam_haber, @rok_cesnovar , and @WardBrian were involved in exposing the function in Stan. Maybe they have an idea what is going on.

2 Likes

The problem is not x, the problem is Intercept, which is a scalar (real), but should be a vector.

If you change the last line to

y_rep = bernoulli_logit_glm_rng(x, rep_vector(Intercept, ...), beta);

it compiles fine. I didn’t think through the logic of whether ... is N or K, but its one of the two.

1 Like

Ahhhh, okay. That fixes it. Thank you @rok_cesnovar and @stijn!

2 Likes