Differences between neg_binomial_2 and neg_binomial_2_glm

Hi,
I’ve come across this large difference in sampling behavior when fitting:

g_size <- c(7,6,30)
r_nb <- c(1.8, 2, 2.2) - .2
p_nb <- c(.4, .4, .4) + 0.1

obs <- unlist(mapply(FUN = rnbinom, g_size, r_nb, p_nb))
g_id = unlist(mapply(rep, 1:3, g_size))
df <- data.frame(y = obs, id = g_id)
barplot(with(df, table(id, y)))

brm(bf(y ~ id, shape ~ 1, family = negbinomial),
    df, chains = 1)

and

brm(bf(y ~ id, shape ~ 0, family = negbinomial),
    df, chains = 1)

The former uses the neg_binomial_2 likelihood, the latter neg_binomial_2_glm, and the latter will converge without problems, the former will require large max_treedepth and adapt_delta and still return hundreds of divergent transitions. I’d like to eventually fit a scale/location model so I’d like to know how to get the neg_binomial_2 model to work. Any suggestions where to start looking for solutions?

1 Like

Unfortunately I’m not able to replicate the behaviour with the example you posted, see reprex below:

g_size <- c(7,6,30)
r_nb <- c(1.8, 2, 2.2) - .2
p_nb <- c(.4, .4, .4) + 0.1

obs <- unlist(mapply(FUN = rnbinom, g_size, r_nb, p_nb))
g_id = unlist(mapply(rep, 1:3, g_size))
df <- data.frame(y = obs, id = g_id)
barplot(with(df, table(id, y)))

library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.15.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar

former = brm(bf(y ~ id, shape ~ 1, family = negbinomial),
    df, chains = 1)
#> Compiling Stan program...
#> Start sampling
#> 
#> SAMPLING FOR MODEL '627031144159c16cf193fb304e147739' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 3.6e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.36 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.076959 seconds (Warm-up)
#> Chain 1:                0.090228 seconds (Sampling)
#> Chain 1:                0.167187 seconds (Total)
#> Chain 1:

latter = brm(bf(y ~ id, shape ~ 0, family = negbinomial),
    df, chains = 1)
#> Compiling Stan program...
#> Start sampling
#> 
#> SAMPLING FOR MODEL 'd123f231289692866b1610709a7e6d87' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 2.5e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.057027 seconds (Warm-up)
#> Chain 1:                0.053828 seconds (Sampling)
#> Chain 1:                0.110855 seconds (Total)
#> Chain 1:

summary(former)
#>  Family: negbinomial 
#>   Links: mu = log; shape = log 
#> Formula: y ~ id 
#>          shape ~ 1
#>    Data: df (Number of observations: 43) 
#> Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup samples = 1000
#> 
#> Population-Level Effects: 
#>                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept           0.61      0.59    -0.60     1.72 1.00      736      714
#> shape_Intercept     0.72      0.47    -0.09     1.77 1.00      640      418
#> id                  0.09      0.22    -0.33     0.53 1.00      847      719
#> 
#> Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
summary(latter)
#>  Family: negbinomial 
#>   Links: mu = log; shape = log 
#> Formula: y ~ id 
#>          shape ~ 0
#>    Data: df (Number of observations: 43) 
#> Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup samples = 1000
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     0.64      0.64    -0.55     1.92 1.00     1059      727
#> id            0.08      0.24    -0.38     0.54 1.00     1102      517
#> 
#> Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

Created on 2021-06-18 by the reprex package (v2.0.0)

Hi Andri,

I should have kept the seed I used to simulate. But I couldn’t reproduce the problem neither repeating the simulation.
But I managed with a different parametrization:

set.seed(21)
  g_size <- c(7,6,30)
  r_nb <- c(5.4, 8, 10.4) 
  p_nb <- c(.7, .7, .7) 
  
  obs <- unlist(mapply(FUN = rnbinom, g_size, r_nb, p_nb))
  g_id = unlist(mapply(rep, 1:3, g_size))
  df <- data.frame(y = obs, id = g_id)
  # barplot(with(df, table(id, y)))
  
  brm(bf(y ~ id, shape ~ 1, family = negbinomial),
      df, chains = 1)
  brm(bf(y ~ id, shape ~ 0, family = negbinomial),
      df, chains = 1)

It will give post warmup samples that either reach max_treedepth or are divergent transitions. Increasing max_treedepth will eventually lead to a very large number of divergent transitions and large rhats

1 Like

@andrjohns would you mind taking another look at this one? I’m stuck here otherwise.

This behaviour appears to be due to a bug in the older version of the negbinomial family in rstan 2.21. When running with the cmdstanr backend there are no sampling issues

3 Likes

So for anybody reading, options(brms.backend = "cmdstan") does the trick. Thanks @andrjohns