I am attempting to learn Bayesian logistic regression using rstan
. I am ok with the procedure for binomial logistic regression and how to calculate waic, but am having more trouble with the bernoulli equivalent.
As an example here is a dataset supplied with the rethinking
package, testing prosocial behaviour in chimpanzees. The model I am fitting is very simple, with a binary outcome variable indicating whether the chimpanzee pulled the left lever (variable pulled_left
; 0 for right lever, 1 for left) as a function of an intercept a
and a single predictor bp
which represents the increase in log-odds of choosing the left lever when it is the prosocial option (indicated by binary variable prosoc_left
, where 0 indictes the right lever is the prosocial option and 1 indicates the left is the prosocial option).
Here is the code to run the binomial logit model
library(rethinking)
library(loo)
data("chimpanzees")
d <- chimpanzees
# data
stanData <- list(N = nrow(d),
pulled_left = d$pulled_left,
prosoc_left = d$prosoc_left)
# model string
ms <- "
data {
int <lower=1> N;
int pulled_left[N];
int prosoc_left[N];
}
parameters{
real a;
real bp;
}
model{
vector[N] p;
bp ~ normal(0, 10);
a ~ normal(0, 10);
for (i in 1:N) {
p[i] = a + bp*prosoc_left[i];
}
pulled_left ~ binomial_logit(1, p);
}
generated quantities{
// outcome vectors
vector[N] p;
vector[N] log_lik;
// deviance
real dev;
dev = 0;
for (i in 1:N) {
p[i] = a + bp*prosoc_left[i];
}
dev = dev + (-2)*binomial_logit_lpmf(pulled_left | 1, p);
// log likelihood for waic
for (n in 1:N) {
log_lik[n] = binomial_logit_lpmf( pulled_left[n] | 1 , p[n] );
}
}
"
sMod <- stan(model_code = ms,
data = stanData,
warmup = 1e3,
iter = 2e3,
cores = 1,
chains = 1,
seed = 659)
This is the output, with mean and 89% HPDI for each parameter and the deviance.
# deviance and parameter estimates with an 89% HPDI
print(sMod, pars = c("a", "bp", "dev"), probs = c(0.055, 0.945))
###output
Inference for Stan model: d0a3dd9d0aa61e78bdc2aa6f9413d841.
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.
mean se_mean sd 5.5% 94.5% n_eff Rhat
a 0.05 0.01 0.13 -0.15 0.25 305 1.00
bp 0.55 0.01 0.19 0.26 0.86 353 1.00
dev 678.53 0.10 2.08 676.61 682.79 399 1.01
Samples were drawn using NUTS(diag_e) at Thu Jan 23 09:01:28 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
And here is the code for calculating the waic for this model
log_lik <- loo::extract_log_lik(sMod, parameter_name = "log_lik")
loo::waic(log_lik)
###output
Computed from 1000 by 504 log-likelihood matrix
Estimate SE
elpd_waic -340.3 4.7
p_waic 2.0 0.0
waic 680.6 9.4
Now I can run this as a bernoulli model.
# model string for bernoulli logit regression
ms_bern <- "
data {
int <lower=1> N;
int pulled_left[N];
int prosoc_left[N];
}
parameters{
real a;
real bp;
}
model{
vector[N] p;
bp ~ normal(0, 10);
a ~ normal(0, 10);
for (i in 1:N) {
pulled_left[i] ~ bernoulli_logit(a + bp*prosoc_left[i]);
}
}
"
sMod_bern <- stan(model_code = ms_bern,
data = stanData,
warmup = 1e3,
iter = 2e3,
cores = 1,
chains = 1,
seed = 659)
# get deviance and parameter estimates with an 89% HPDI
print(sMod_bern, pars = c("a", "bp"), probs = c(0.055, 0.945))
###output
Inference for Stan model: f0b1bc884dd871b780d24998a8a1eeee.
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.
mean se_mean sd 5.5% 94.5% n_eff Rhat
a 0.05 0.01 0.12 -0.13 0.25 391 1
bp 0.56 0.01 0.18 0.27 0.85 434 1
Samples were drawn using NUTS(diag_e) at Thu Jan 23 09:14:50 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The mean and HPDI for the parameter estimates is very similar to the binomial logit model, which is heartening.
However, what I would like to do is obtain deviance and log-likelihood from this bernoulli logit model, similar to the binomial logit model.
But, when I add the following code to the model to calculate the log-likelihood…
generated quantities{
vector[N] log_lik;
// log likelihood for waic
for (n in 1:N) {
log_lik[n] = bernoulli_logit_lpmf( pulled_left[n] | a, prosoc_left[n] );
}
…I get the following error message
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:
bernoulli_logit_lpmf(int, real, int)
Available argument signatures for bernoulli_logit_lpmf:
bernoulli_logit_lpmf(int, real)
bernoulli_logit_lpmf(int, real[ ])
bernoulli_logit_lpmf(int, vector)
bernoulli_logit_lpmf(int, row_vector)
bernoulli_logit_lpmf(int[ ], real)
bernoulli_logit_lpmf(int[ ], real[ ])
bernoulli_logit_lpmf(int[ ], vector)
bernoulli_logit_lpmf(int[ ], row_vector)
error in 'modela122d39bc53_e21074c1ee7391593c1d57619e5e61b2' at line 30, column 75
-------------------------------------------------
28: // log likelihood for waic
29: for (n in 1:N) {
30: log_lik[n] = bernoulli_logit_lpmf( pulled_left[n] | a, prosoc_left[n] );
^
31: }
-------------------------------------------------
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model 'e21074c1ee7391593c1d57619e5e61b2' due to the above error.
And I have even less of an idea how to get the deviance from this model.
Can anyone help me out?