Very different sampling with ~ beta_binomial than with target += beta_binomial_lpmf

library(tidyverse)
library(rethinking)

Beta Binomial from Statistical Rethinking Second Edition

I’m helping to teach a course using Richard McElreath’s wonderful book Statistical Rethinking. I was working through the code for Chapter 12, which demonstrates a beta-binomial approach to handling over-dispersion (Section 12.1.1, model m12.1).

The model, as coded in the book, doesn’t converge, and I can’t re-create the results from the text even if I increase the number of samples and raise the maximum treedepth to 18. Even then, the sampling is extremely inefficient, with effective sample sizes in the single or double digits after running for hours.

It’s a relatively simple model, with a very simple dataset, so this surprised me.

The model and code from the book

library(rethinking)
data(UCBadmit)
d <- UCBadmit
d$gid <- ifelse( d$applicant.gender=="male" , 1L , 2L )
dat <- list( A=d$admit , N=d$applications , gid=d$gid )
dat
## $A
##  [1] 512  89 353  17 120 202 138 131  53  94  22  24
## 
## $N
##  [1] 825 108 560  25 325 593 417 375 191 393 373 341
## 
## $gid
##  [1] 1 2 1 2 1 2 1 2 1 2 1 2


m12.1 <- ulam(
  alist(
    A ~ dbetabinom( N , pbar , theta ),
    logit(pbar) <- a[gid],
    a[gid] ~ dnorm( 0 , 1.5 ),
    transpars> theta <<- phi + 2.0,
    phi ~ dexp(1)
  ), data=dat , chains=4 , cores=4 )


precis(m12.1, depth = 2)
##             mean        sd       5.5%      94.5%    n_eff    Rhat4
## a[1]  -0.8880459 0.6090875 -1.8903317 0.07537078 2.914372 2.011188
## a[2]   0.1912690 1.2227676 -1.5853966 1.68761780 2.288172 4.039251
## phi    1.4617581 1.4749620  0.2004959 4.27982375 2.452842 3.863700
## theta  3.4617581 1.4749620  2.2004959 6.27982375 2.452842 3.863700
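
To dig into the fit directly, you can pull out the underlying rstan object and run Stan’s own diagnostics. A minimal sketch, assuming the ulam fit keeps the rstan fit in its @stanfit slot (the slot name may differ across rethinking versions):

diag_fit <- m12.1@stanfit                          # underlying rstan fit (assumed slot)
rstan::check_hmc_diagnostics(diag_fit)             # divergences, treedepth, E-BFMI warnings
rstan::stan_trace(diag_fit, pars = c("a", "phi"))  # traceplots of the poorly mixing chains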

I did some digging and managed to isolate the issue.
I’ve re-written the ulam() model from the book in raw Stan code.
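
(As a starting point, rethinking can print the Stan program that ulam() generated:)

stancode(m12.1)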

Here’s the original model using the “~” syntax:

data{
    int n;
    int N[n];
    int A[n];
    int gid[n];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[n] pbar;
    phi ~ exponential( 1 );
    a ~ normal(0 , 1.5 );
    for ( i in 1:n ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    A ~ beta_binomial( N , pbar*theta , (1-pbar)*theta );
}
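
As an aside, the pbar*theta and (1-pbar)*theta arguments are just the usual conversion from a mean/concentration parameterization of the beta distribution to its two shape parameters. A quick sanity check in R (values are arbitrary, for illustration only):

pbar  <- 0.4                  # mean of the beta distribution
theta <- 3                    # concentration, here phi + 2
alpha <- pbar * theta         # 1.2, first shape parameter
beta  <- (1 - pbar) * theta   # 1.8, second shape parameter
alpha / (alpha + beta)        # recovers the mean, 0.4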

And here’s exactly the same model, with the likelihood line replaced by the “target +=” syntax.

data{
    int n;
    int N[n];
    int A[n];
    int gid[n];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[n] pbar;
    phi ~ exponential( 1 );
    a ~ normal(0 , 1.5 );
    for ( i in 1:n ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    target += beta_binomial_lpmf(A | N, pbar * theta, (1-pbar) * theta);
}


I’ve fit the two models, and the results are completely different.


dat$n <- length(dat$gid) # add the number of rows; n = 12

m12.1_Orig_param <- stan(file = "BBinom_original_syntax.stan", data = dat, 
              iter = 2000, 
              control = list(adapt_delta = 0.8,
                             max_treedepth = 10))



m12.1_Alt_param <- stan(file = "BBinom_target_syntax.stan", data = dat, 
              iter = 2000, 
              control = list(adapt_delta = 0.8,
                             max_treedepth = 10))




If the likelihood line uses the ~ beta_binomial() syntax, the sampling is terrible, just like the example from the book.

print(m12.1_Orig_param)
## Inference for Stan model: BBinom_original_syntax.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## a[1]   0.93    0.73 1.49 -2.17  0.34  1.02  1.75  3.65     4 1.75
## a[2]  -0.51    1.02 1.67 -3.85 -1.93 -0.13  0.91  1.70     3 2.71
## phi    0.78    0.18 0.88  0.04  0.24  0.53  0.93  3.77    23 1.17
## theta  2.78    0.18 0.88  2.04  2.24  2.53  2.93  5.77    23 1.17
## lp__  -2.91    0.33 1.09 -5.25 -3.70 -2.83 -2.04 -1.18    11 1.44
## 
## Samples were drawn using NUTS(diag_e) at Mon Nov 30 11:08:53 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

However, if I replace the likelihood line with the target += syntax, the sampling is excellent.

print(m12.1_Alt_param)
## Inference for Stan model: cc652ec908dae917b216c658c9e7063d.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
## a[1]   -0.44    0.01 0.40  -1.26  -0.71  -0.43  -0.18   0.33  2643    1
## a[2]   -0.31    0.01 0.41  -1.12  -0.58  -0.31  -0.05   0.49  2651    1
## phi     1.03    0.01 0.78   0.04   0.43   0.88   1.46   2.96  2840    1
## theta   3.03    0.01 0.78   2.04   2.43   2.88   3.46   4.96  2840    1
## lp__  -69.18    0.04 1.36 -72.82 -69.80 -68.83 -68.17 -67.61  1373    1
## 
## Samples were drawn using NUTS(diag_e) at Mon Nov 30 11:08:55 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
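
To make the gap concrete, here’s a quick way to line up the posterior means of the two fits side by side (a sketch using rstan’s summary() method):

pars <- c("a[1]", "a[2]", "phi", "theta")
cbind(
  tilde_syntax  = summary(m12.1_Orig_param)$summary[pars, "mean"],
  target_syntax = summary(m12.1_Alt_param)$summary[pars, "mean"]
)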


Is this a bug? I’m relatively new to Stan, so perhaps I’ve misunderstood, but I thought the “~” and “target +=” approaches were largely equivalent.

I’m working on a Windows 10 machine:
R version 4.0.3
rstan 2.21.2

I’ve tried re-installing rstan, no difference.


Wow, yeah, I think this is a bug. You definitely didn’t misunderstand. These two Stan programs should give the same inferences; the only difference is that the version with ~ avoids computing unnecessary normalizing constants.
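
The dropped constant is easy to see numerically. The beta-binomial pmf is choose(N, y) * B(y + alpha, N - y + beta) / B(alpha, beta), and the only term that doesn’t involve the parameters is the binomial coefficient, so the two forms should differ by exactly log(choose(N, y)). A minimal check in base R (arbitrary illustrative values):

y <- 5; N <- 10; a <- 2; b <- 3
lpmf  <- lchoose(N, y) + lbeta(y + a, N - y + b) - lbeta(a, b)  # full log pmf
lupmf <- lbeta(y + a, N - y + b) - lbeta(a, b)                  # what ~ actually increments
lpmf - lupmf  # equals lchoose(N, y): constant in a and b, so inference is unaffected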

The good news is that this seems to be fixed in newer versions of Stan. I can reproduce the problem with RStan, but if I use the new CmdStanR interface instead, both versions of the model sample fine. That suggests this was an issue in earlier versions of Stan: unfortunately, the version of RStan on CRAN is still stuck on Stan v2.21, whereas CmdStanR can use Stan v2.25.

@bbbales2 @rok_cesnovar @stevebronder or anyone else, have you seen this before? Is this a bug that people were aware of? I definitely didn’t know about this.

Here’s the code I used to reproduce the problem in rstan with 2.21 and demonstrate that it’s not a problem with cmdstanr and 2.25:

library(tidyverse)
library(rethinking)
library(cmdstanr)
library(rstan)

file1 <- write_stan_file("
data{
    int n;
    int N[n];
    int A[n];
    int gid[n];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[n] pbar;
    phi ~ exponential( 1 );
    a ~ normal(0 , 1.5 );
    for ( i in 1:n ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    A ~ beta_binomial( N , pbar*theta , (1-pbar)*theta );
}
")

file2 <- write_stan_file("
data{
    int n;
    int N[n];
    int A[n];
    int gid[n];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[n] pbar;
    phi ~ exponential( 1 );
    a ~ normal(0 , 1.5 );
    for ( i in 1:n ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    target += beta_binomial_lpmf(A | N, pbar * theta, (1-pbar) * theta);
}
"
)

data(UCBadmit)
d <- UCBadmit
d$gid <- ifelse( d$applicant.gender=="male" , 1L , 2L )
dat <- list(A=d$admit , N=d$applications , gid=d$gid )
dat$n <- length(dat$gid) 

# reproduce the problem using rstan 
fit_rstan1 <- stan(file1, data = dat) # runs very slow, really bad rhat and ess
print(fit_rstan1)
fit_rstan2 <- stan(file2, data = dat) # runs fast, fine rhat and ess
print(fit_rstan2)

# no problem with cmdstan (both versions sample fine)
mod1 <- cmdstan_model(file1)
fit_cmdstan1 <- mod1$sample(data = dat)
print(fit_cmdstan1)

mod2 <- cmdstan_model(file2)
fit_cmdstan2 <- mod2$sample(data = dat)
print(fit_cmdstan2)
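
As a sanity check on which Stan version each interface is actually running:

rstan::stan_version()        # Stan version bundled with rstan (2.21.x here)
cmdstanr::cmdstan_version()  # CmdStan version used by cmdstanr (2.25.x here)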

Yes, this is a known bug. See Rstan works but cmdstan does not for a beta binomial model

This was fixed in the 2.22 release of Stan. As far as I understand, it was introduced somewhere between 2.19 and 2.21.


Thanks Rok!


@AdamCSmithCWS I think the latest versions of rethinking support using CmdStanR as the backend instead of RStan so that might be a good option for getting around this bug if you want to use this model with the rethinking package.
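
Something like this should work; a sketch assuming a recent rethinking version where ulam() accepts a cmdstan argument (check your version’s documentation if it isn’t recognized):

m12.1_cmd <- ulam(
  alist(
    A ~ dbetabinom( N , pbar , theta ),
    logit(pbar) <- a[gid],
    a[gid] ~ dnorm( 0 , 1.5 ),
    transpars> theta <<- phi + 2.0,
    phi ~ dexp(1)
  ), data=dat , chains=4 , cores=4 , cmdstan=TRUE )
precis(m12.1_cmd, depth = 2)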


Brilliant! Thanks @jonah. I’ve been looking for an excuse to dive into CmdStanR. So here goes!
