Hi,
Recently I have some observed binary data Y and I am interested in the posterior distribution check of the following simple model (this is a simplified version so there might be some mistakes in the Stan code):
data{
vector[N] y;
vector[N] x;
}
model {
p = invlogit(b1 + b2x);
y ~ Bernouli(p);
}
If I am interested in posterior predictive distribution check, my idea is in the generated quantities block adding the following:
generated quantities {
vector[N] y_predcited;
y_predicted ~ Bernouli(p);
}
Then compare with y and y_predicted. I am not sure whether the above idea is correct or not. Also I am not sure how Stan uses HMC in predictive posterior distribution. For example for p we have 1000 simulated draws from the posterior, but how about y_predict? Will we have only N y_predicted values or 1000*N y_predicted values? How could we account for the uncertainty in y_predicted?
Thx!
I edited your model some more. Not quite sure if this runs and works but it should be close:
data {
int y[N]; // The data for a bernoulli must be an integer
// vectors are only for doubles
vector[N] x;
}
parameters {
real b1;
real b2;
}
model {
b1 ~ normal(0, 1); // should always put priors on things
b2 ~ normal(0, 1); // improper priors can lead to unexpected
// nastiness
y ~ bernoulli_logit(b1 + b2 * x); // bernoulli_logit has
// the inverse logit built in and the numerics are
// more stable
}
generated quantities {
int y_predicted[N];
for(n in 1:N) {
y_predicted[n] = bernoulli_logit_rng(b1 + b2 * x[n]);
}
}
There’s info on logistic regressions here: 1.5 Logistic and Probit Regression | Stan User’s Guide
For the posterior predictive, generated quantities is evaluated once for every posterior draw you generate.
So if you generate 4000 posterior draws (the default 1000 draws with 4 chains), then you will get 4000 draws of the length N array y_predicted
(so 4000xN integers).
More info on posterior prediction here: 24 Posterior Predictive Sampling | Stan User’s Guide
The randomness there comes from two places, the uncertainty in b1
and b2
and the randomness in generating a bernoulli
random number.
That make sense?
3 Likes
That’s very helpful and thanks!