Question about logistic regression

Why is it that the following works

data{
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  //vector[N] Y;
  int<lower=0,upper=1> y[N];
}

parameters{
  real alpha;
  vector[K] beta;
}

model{
  y ~ bernoulli_logit(alpha + X * beta);
}

but if I use

  vector[N] Y;

instead of

  int<lower=0,upper=1> y[N];

I get the error

No matches for:
vector ~ bernoulli_logit(vector)

The declarations int<lower=0,upper=1> y[N] and vector[N] Y seem similar to me, so I am not sure why one works and the other does not.

For reproducibility here is some simulated data in R:

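library(rstan)
library(magrittr) # provides %>%

n <- 1000 # sample size; assumed here, not given in the original post
x1 <- rnorm(n, 0, 1)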
x2 <- rnorm(n, 0, 1)
e <- rnorm(n, 0, .5^2)

a <- 1
b1 <- 2
b2 <- 3

u <- a + b1*x1 + b2*x2 + e #linear combination
pr <- 1/(1 + exp(-u)) # inverse of the logit function

y <- rbinom(n,1,pr)

x <- as.data.frame(list(x1=x1,x2=x2)) %>% as.matrix()

# logistic regression
fit.logistic <- stan(file = "logistic.stan", 
                 data = list(N=length(y),X=x,Y=y,K=ncol(x)))

A vector in the Stan language contains real numbers and the various bernoulli functions are defined for 0 and 1 as integers.

It only looks that way. int<lower=0,upper=1> y[N] is an integer array, whereas vector[N] Y
is a vector of reals.
Now you might ask why reals are not allowed to stand in for integers. First, what should happen if somebody
specifies values other than 0 or 1?
Second, you may run into problems with rounding, see:

Because no signature is defined for a vector outcome; see p. 509 of the Stan manual. It is
only defined for integers:

real bernoulli_logit_lpmf(ints y | reals alpha)


Is it not possible to define a vector of ints?

No, because vectors contain real numbers. It is possible to define a 1-dimensional int array that only contains 0 and 1, which has essentially the same layout in RAM but does not have linear algebra operations defined for it.
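As a minimal sketch of that split, here is the model from the question with the outcome kept as an integer array and the predictors kept as a matrix. It uses the newer array keyword for the declaration, which assumes a Stan release that supports it (2.26 or later, as far as I know); on older releases the original int<lower=0,upper=1> y[N] form is the equivalent.

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;                    // reals: linear algebra is defined here
  array[N] int<lower=0, upper=1> y;  // integers: what bernoulli_logit expects
}
parameters {
  real alpha;
  vector[K] beta;
}
model {
  // alpha + X * beta is a vector of reals (the log-odds);
  // y is an array of ints, matching bernoulli_logit_lpmf(ints y | reals alpha)
  y ~ bernoulli_logit(alpha + X * beta);
}
```

The name in the R data list has to match the declared name, so with this version the outcome would be passed as y rather than Y.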


For now, int values are 32 bit and real values 64 bit in Stan. But we hope to upgrade to 64-bit integers soon.