library(tidyverse)
library(rethinking)
Beta-Binomial model from Statistical Rethinking, Second Edition
I’m helping to teach a course using Richard McElreath’s wonderful book Statistical Rethinking. I was working through the code for Chapter 12, which demonstrates a Beta-Binomial approach to handling over-dispersion (Section 12.1.1, model m12.1).
The model, as coded in the book, doesn’t converge for me. I can’t re-create the results from the text, even if I increase the number of samples and raise the maximum treedepth to 18. Even then, the sampling is extremely inefficient, with effective sample sizes in the single or double digits after running for hours.
It’s a relatively simple model, with a very simple dataset, so this surprised me.
The model and code from the book
library(rethinking)
data(UCBadmit)
d <- UCBadmit
d$gid <- ifelse( d$applicant.gender=="male" , 1L , 2L )
dat <- list( A=d$admit , N=d$applications , gid=d$gid )
dat
## $A
## [1] 512 89 353 17 120 202 138 131 53 94 22 24
##
## $N
## [1] 825 108 560 25 325 593 417 375 191 393 373 341
##
## $gid
## [1] 1 2 1 2 1 2 1 2 1 2 1 2
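For context, the raw admission rates vary a lot from row to row, which is exactly the over-dispersion the Beta-Binomial is meant to absorb. A quick base-R look (data values copied from the list above):

```r
# Raw admission rates per row of UCBadmit: the spread (roughly 0.06 to 0.82)
# is far wider than a single binomial probability per gender would produce.
A <- c(512, 89, 353, 17, 120, 202, 138, 131, 53, 94, 22, 24)
N <- c(825, 108, 560, 25, 325, 593, 417, 375, 191, 393, 373, 341)
round(A / N, 2)
```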
m12.1 <- ulam(
  alist(
    A ~ dbetabinom( N , pbar , theta ),
    logit(pbar) <- a[gid],
    a[gid] ~ dnorm( 0 , 1.5 ),
    transpars> theta <<- phi + 2.0,
    phi ~ dexp(1)
  ), data=dat , chains=4 , cores=4 )
precis(m12.1,depth = 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] -0.8880459 0.6090875 -1.8903317 0.07537078 2.914372 2.011188
## a[2] 0.1912690 1.2227676 -1.5853966 1.68761780 2.288172 4.039251
## phi 1.4617581 1.4749620 0.2004959 4.27982375 2.452842 3.863700
## theta 3.4617581 1.4749620 2.2004959 6.27982375 2.452842 3.863700
I did some digging, and I’ve managed to isolate the issue. I’ve re-written the ulam() model from the book in Stan code.
Here’s the original model using the “~” syntax:
data{
    int n;
    int N[n];
    int A[n];
    int gid[n];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[n] pbar;
    phi ~ exponential( 1 );
    a ~ normal( 0 , 1.5 );
    for ( i in 1:n ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    A ~ beta_binomial( N , pbar*theta , (1-pbar)*theta );
}
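As a quick sanity check on the parameterization (a base-R sketch, not from the book): with alpha = pbar\*theta and beta = (1-pbar)\*theta, Stan’s beta_binomial(N, alpha, beta) should describe a distribution whose pmf sums to 1 and whose mean is N\*pbar.

```r
# Check the mean/dispersion parameterization numerically with the
# textbook beta-binomial pmf: choose(N,k) * B(k+a, N-k+b) / B(a,b).
pbar  <- 0.3; theta <- 3; N <- 20
a <- pbar * theta
b <- (1 - pbar) * theta
k <- 0:N
pmf <- choose(N, k) * beta(k + a, N - k + b) / beta(a, b)
sum(pmf)       # 1
sum(k * pmf)   # N * pbar = 6
```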
And here’s exactly the same model, with the likelihood line replaced with the “target+=” syntax.
data{
    int n;
    int N[n];
    int A[n];
    int gid[n];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[n] pbar;
    phi ~ exponential( 1 );
    a ~ normal( 0 , 1.5 );
    for ( i in 1:n ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    target += beta_binomial_lpmf( A | N , pbar*theta , (1-pbar)*theta );
}
I’ve fit the two models, and the results are completely different.
dat$n = length(dat$gid) # add the number of rows (n = 12) to the data list
m12.1_Orig_param <- stan(file = "BBinom_original_syntax.stan", data = dat,
iter = 2000,
control = list(adapt_delta = 0.8,
max_treedepth = 10))
m12.1_Alt_param <- stan(file = "BBinom_target_syntax.stan", data = dat,
iter = 2000,
control = list(adapt_delta = 0.8,
max_treedepth = 10))
If the likelihood line uses the “A ~ beta_binomial()” syntax, the sampling is terrible, just like the example from the book.
print(m12.1_Orig_param)
## Inference for Stan model: BBinom_original_syntax.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## a[1] 0.93 0.73 1.49 -2.17 0.34 1.02 1.75 3.65 4 1.75
## a[2] -0.51 1.02 1.67 -3.85 -1.93 -0.13 0.91 1.70 3 2.71
## phi 0.78 0.18 0.88 0.04 0.24 0.53 0.93 3.77 23 1.17
## theta 2.78 0.18 0.88 2.04 2.24 2.53 2.93 5.77 23 1.17
## lp__ -2.91 0.33 1.09 -5.25 -3.70 -2.83 -2.04 -1.18 11 1.44
##
## Samples were drawn using NUTS(diag_e) at Mon Nov 30 11:08:53 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
However, if I replace the likelihood line with the “target +=” syntax, the sampling is excellent.
print(m12.1_Alt_param)
## Inference for Stan model: cc652ec908dae917b216c658c9e7063d.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## a[1] -0.44 0.01 0.40 -1.26 -0.71 -0.43 -0.18 0.33 2643 1
## a[2] -0.31 0.01 0.41 -1.12 -0.58 -0.31 -0.05 0.49 2651 1
## phi 1.03 0.01 0.78 0.04 0.43 0.88 1.46 2.96 2840 1
## theta 3.03 0.01 0.78 2.04 2.43 2.88 3.46 4.96 2840 1
## lp__ -69.18 0.04 1.36 -72.82 -69.80 -68.83 -68.17 -67.61 1373 1
##
## Samples were drawn using NUTS(diag_e) at Mon Nov 30 11:08:55 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
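One further check I can think of (a sketch, using rstan’s log_prob() on the two fitted objects above; the unconstrained parameter values are made up for illustration): if the two programs really differ only by a data-dependent constant, the gap between their log densities should be the same at any point in parameter space.

```r
# Evaluate both models' log densities at two arbitrary points on the
# unconstrained scale (3 parameters: a[1], a[2], log(phi)). If the "~" and
# "target +=" versions differ only by a constant, gap1 - gap2 should be ~0.
u1 <- c(-0.4, -0.3, 0.0)
u2 <- c( 1.0,  0.5, 0.7)
gap1 <- rstan::log_prob(m12.1_Orig_param, u1) - rstan::log_prob(m12.1_Alt_param, u1)
gap2 <- rstan::log_prob(m12.1_Orig_param, u2) - rstan::log_prob(m12.1_Alt_param, u2)
gap1 - gap2   # ~0 if the two programs are equivalent up to a constant
```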
Is this a bug? I’m relatively new to Stan, so perhaps I’ve misunderstood, but I thought the “~” and “target+=” approaches were largely equivalent.
I’m working on a Windows 10 machine, with R version 4.0.3 and rstan 2.21.2. I’ve tried re-installing rstan; it made no difference.