# Negative binomial shape and phi parameters

I am trying to fit a distributional model using the negative binomial family where I can set predictors for the overdispersion parameter phi. (this would correspond to stan’s alternative parameterization neg_binomial_2, described in section 13.2 of the function reference.)

Paul said that I can do this with the existing brms family negbinomial(), however I am still confused. negbinomial() has a “shape” parameter. Is that the same as “phi” in stan’s neg_binomial_2?

• Operating System: Mac OS 10.14.2
• brms Version: 2.4.3

Yes, shape is phi.

1 Like

Before I realized that the brms built-in negative binomial was parameterized the way I needed, I had started to create my own custom family. I found that in some circumstances the fit converged better with the custom family than in the built-in. Below is an example that shows the custom family sampling better than the built-in. I compared the stancode and it differs in a couple of places…that is shown below the R code.

The question is why the methods of fitting perform differently and if I am using the brms built-in negative binomial incorrectly. I hope not to debate whether my simulation, model, or priors make sense (unless that directly pertains to why the methods perform differently), sense they were chosen to illustrate the issue.

## Set up…

``````library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(brms)
library(tidyverse)
library(diffobj)
``````

Create custom family

``````neg_binomial_2 <- custom_family("neg_binomial_2",
dpars = c("mu", "phi"),
type = "int")
``````

Create a simulated data set

``````set.seed(23)
n <- 1000
df <- tibble(
count=rnbinom(n=n,
size=c(1,10), #different shape per treatment
mu=c(10,4)), #different mean per treatment
treatment=rep(c("A","B"), n/2),
group=as.character(rep(1:10,each=n/10))) #some grouping factor

df <- df %>%
group_by(group) %>%
mutate(count =  count + rnbinom(n=1, size=20, mu=1)) # add a grouping effect
# I am not actually going to use the grouping effect in the initial models
# but having the unaccounted for noise in the data helps to illustrate the differences between
# the built-in in and the custom family.  The issue can persist when the grouping effects
# are included in the model.

df %>% group_by(treatment) %>%
summarize(mean=mean(count), variance=var(count),
phi=mean(count)^2 / (var(count) - mean(count)))
``````

## Fit using built-in

``````system.time(fit1 <- brm(bf(count ~ treatment, shape ~ treatment),
family = negbinomial(),
prior=set_prior("student_t(3,0,10)", class="b"),
data=df))

summary(fit1)
``````

The above takes about 1400 seconds on my computer (4core Mac). The effective samples are all less than 10, and Rhat is >> 1.

## Fit using custom family

``````system.time(fit2 <- brm(bf(count ~ treatment, phi ~ treatment),
family = neg_binomial_2,
prior=set_prior("student_t(3,0,10)", class="b"),
data = df))

summary(fit2)
``````

The custom family fit takes about 200 seconds. The effective samples are >1000 for 3 of the 4 parameters and Rhat is 1.00 for those three and 1.04 for the last. So something is clearly different.

## Stancode differences

``````diffObj(stancode(fit1), stancode(fit2))
``````

Apart from the trivial (phi vs shape), the following are different.

### The built-in does not transform mu

• built-in
``````for (n in 1:N) {
shape[n] = exp(shape[n]);
}
``````
• custom
``````for (n in 1:N) {
phi[n] = exp(phi[n]);
mu[n] = exp(mu[n]);
}
``````

### the likelihood function, and the way it is called are different.

• built-in
``````  // likelihood including all constants
if (!prior_only) {
target += neg_binomial_2_log_lpmf(Y | mu, shape);
}
``````
• custom
``````  // likelihood including all constants
if (!prior_only) {
for (n in 1:N) {
target += neg_binomial_2_lpmf(Y[n] | mu[n], phi[n]);
}
}
``````

I assume that mu is not transformed in the built-in because log_lpms if called and that does the transformation internally (?)

So…why does the custom come closer to convergence? Am I calling the built-in incorrectly?

Thanks,

Julin

## Full stan code

### built-in

``````> stancode(fit1)
// generated with brms 2.4.3
functions {
}
data {
int<lower=1> N;  // total number of observations
int Y[N];  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
int<lower=1> K_shape;  // number of population-level effects
matrix[N, K_shape] X_shape;  // population-level design matrix
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, K - 1] Xc;  // centered version of X
vector[K - 1] means_X;  // column means of X before centering
int Kc_shape = K_shape - 1;
matrix[N, K_shape - 1] Xc_shape;  // centered version of X_shape
vector[K_shape - 1] means_X_shape;  // column means of X_shape before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
for (i in 2:K_shape) {
means_X_shape[i - 1] = mean(X_shape[, i]);
Xc_shape[, i - 1] = X_shape[, i] - means_X_shape[i - 1];
}
}
parameters {
vector[Kc] b;  // population-level effects
real temp_Intercept;  // temporary intercept
vector[Kc_shape] b_shape;  // population-level effects
real temp_shape_Intercept;  // temporary intercept
}
transformed parameters {
}
model {
vector[N] mu = temp_Intercept + Xc * b;
vector[N] shape = temp_shape_Intercept + Xc_shape * b_shape;
for (n in 1:N) {
shape[n] = exp(shape[n]);
}
// priors including all constants
target += student_t_lpdf(b | 3,0,10);
target += student_t_lpdf(temp_Intercept | 3, 2, 10);
target += student_t_lpdf(temp_shape_Intercept | 3, 0, 10);
// likelihood including all constants
if (!prior_only) {
target += neg_binomial_2_log_lpmf(Y | mu, shape);
}
}
generated quantities {
// actual population-level intercept
real b_Intercept = temp_Intercept - dot_product(means_X, b);
// actual population-level intercept
real b_shape_Intercept = temp_shape_Intercept - dot_product(means_X_shape, b_shape);
}
``````

### custom

``````> stancode(fit2)
// generated with brms 2.4.3
functions {
}
data {
int<lower=1> N;  // total number of observations
int Y[N];  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
int<lower=1> K_phi;  // number of population-level effects
matrix[N, K_phi] X_phi;  // population-level design matrix
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, K - 1] Xc;  // centered version of X
vector[K - 1] means_X;  // column means of X before centering
int Kc_phi = K_phi - 1;
matrix[N, K_phi - 1] Xc_phi;  // centered version of X_phi
vector[K_phi - 1] means_X_phi;  // column means of X_phi before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
for (i in 2:K_phi) {
means_X_phi[i - 1] = mean(X_phi[, i]);
Xc_phi[, i - 1] = X_phi[, i] - means_X_phi[i - 1];
}
}
parameters {
vector[Kc] b;  // population-level effects
real temp_Intercept;  // temporary intercept
vector[Kc_phi] b_phi;  // population-level effects
real temp_phi_Intercept;  // temporary intercept
}
transformed parameters {
}
model {
vector[N] mu = temp_Intercept + Xc * b;
vector[N] phi = temp_phi_Intercept + Xc_phi * b_phi;
for (n in 1:N) {
phi[n] = exp(phi[n]);
mu[n] = exp(mu[n]);
}
// priors including all constants
target += student_t_lpdf(b | 3,0,10);
target += student_t_lpdf(temp_Intercept | 3, 2, 10);
target += student_t_lpdf(temp_phi_Intercept | 3, 0, 10);
// likelihood including all constants
if (!prior_only) {
for (n in 1:N) {
target += neg_binomial_2_lpmf(Y[n] | mu[n], phi[n]);
}
}
}
generated quantities {
// actual population-level intercept
real b_Intercept = temp_Intercept - dot_product(means_X, b);
// actual population-level intercept
real b_phi_Intercept = temp_phi_Intercept - dot_product(means_X_phi, b_phi);
}
``````

## Session info

``````> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
 en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 bindrcpp_0.2.2     diffobj_0.2.2      forcats_0.3.0      stringr_1.3.1
 tibble_1.4.2       tidyverse_1.2.1    brms_2.4.3         Rcpp_0.12.18

loaded via a namespace (and not attached):
 httr_1.3.1           Brobdingnag_1.2-6    jsonlite_1.5         modelr_0.1.2
 gtools_3.8.1         threejs_0.3.1        shiny_1.1.0          assertthat_0.2.0
 stats4_3.5.1         cellranger_1.1.0     yaml_2.2.0           pillar_1.3.0
 backports_1.1.2      lattice_0.20-35      glue_1.3.0           digest_0.6.16
 promises_1.0.1       rvest_0.3.2          colorspace_1.3-2     htmltools_0.3.6
 httpuv_1.4.5         Matrix_1.2-14        plyr_1.8.4           dygraphs_1.1.1.6
 pkgconfig_2.0.2      broom_0.5.0          haven_1.1.2          xtable_1.8-2
 mvtnorm_1.0-8        scales_1.0.0         later_0.7.3          bayesplot_1.6.0
 DT_0.4               withr_2.1.2          shinyjs_1.0          lazyeval_0.2.1
 mime_0.5             nlme_3.1-137         xml2_1.2.0           xts_0.11-0
 colourpicker_1.0     rsconnect_0.8.8      tools_3.5.1          loo_2.0.0
 hms_0.4.2            matrixStats_0.54.0   munsell_0.5.0        compiler_3.5.1
 rlang_0.2.2.9000     grid_3.5.1           ggridges_0.5.0       rstudioapi_0.7
 htmlwidgets_1.2      crosstalk_1.0.0      igraph_1.2.2         miniUI_0.1.1.1
 base64enc_0.1-3      codetools_0.2-15     gtable_0.2.0         inline_0.3.15
 abind_1.4-5          markdown_0.8         reshape2_1.4.3       R6_2.2.2
 lubridate_1.7.4      gridExtra_2.3        rstantools_1.5.1     zoo_1.8-3
 knitr_1.20           bridgesampling_0.5-2 bindr_0.1.1          shinystan_2.5.0
 shinythemes_1.1.1    stringi_1.2.4        parallel_3.5.1       tidyselect_0.2.4
 coda_0.19-1
>
``````
1 Like

ps to make it easier to run the code, here it is all in one chunk

``````# set-up
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(brms)
library(tidyverse)
library(diffobj)

# custom family
neg_binomial_2 <- custom_family("neg_binomial_2",
dpars = c("mu", "phi"),
type = "int")

# create data set
set.seed(23)
n <- 1000
df <- tibble(
count=rnbinom(n=n,
size=c(1,10), #different shape per treatment
mu=c(10,4)), #different mean per treatment
treatment=rep(c("A","B"), n/2),
group=as.character(rep(1:10,each=n/10))) #some grouping factor

df <- df %>%
group_by(group) %>%
mutate(count =  count + rnbinom(n=1, size=20, mu=1)) # add a grouping effect

# check it

df %>% group_by(treatment) %>%
summarize(mean=mean(count), variance=var(count),
phi=mean(count)^2 / (var(count) - mean(count)))

# fit with built-in
system.time(fit1 <- brm(bf(count ~ treatment, shape ~ treatment),
family = negbinomial(),
prior=set_prior("student_t(3,0,10)", class="b"),
data=df))

summary(fit1)

# fit with custom
system.time(fit2 <- brm(bf(count ~ treatment, phi ~ treatment),
family = neg_binomial_2,
prior=set_prior("student_t(3,0,10)", class="b"),
data = df))

summary(fit2)

# stancode differences

diffObj(stancode(fit1), stancode(fit2))
``````

That’s interesting, I can confirm the custom family is faster in this case but only for some chains it appears.

I would have expected the built-in `neg_binomial_2_log_lpmf` to be more efficient, at least that’s what I usually observe with built-in family-link combinations. But perhaps this is also problem-specific. It appears as if some chains are getting stuck while others are sampling rather fast. @bgoodri I would love to hear your opinion about this if you have time.

The negative binomial is a weird distribution. But this does not make sense even for the negative binomial.

Given that my data set is one where the custom family is fitting better, can anyone provide advice on how to complete the custom family (define the log_lik_, predict_, and fitted_ functions?) The vignette example uses expose_functions() to gain access to user defined stan functions, but that isn’t relevant here (no user-defined functions). I am not sure how to have R access the relevant stan functions (e.g. neg_binomial_2_lpmf, neg_binomial_2_rng)

Have spent some more time looking at brms code. Am I correct that I can just make some modify the log_lik_negbinomial() function defined in brms/R/log_lik.R ? (and equivalent for predict and fitted)? I’ll give it a try.

``````log_lik_neg_binomial_2 <- function(i, draws, data = data.frame()) {
brms:::log_lik_negbinomial(i, draws, data)
}
``````

should do.

almost, but I think I need to change `phi` to `shape` in draws before passing it to `brms:::log_lik_negbinomial`

So

``````log_lik_neg_binomial_2 <- function(i, draws, data = data.frame()) {
draws\$f\$dpars <- "shape"
names(draws\$dpars) <- "shape"
brms:::log_lik_negbinomial(i, draws, data)
}
``````

Not exactly. It’s perhaps easiest to use

``````log_lik_neg_binomial_2 <- function(i, draws, data = data.frame()) {
draws\$dpars\$phi <- draws\$dpars\$shape
draws\$dpars\$shape <- NULL
brms:::log_lik_negbinomial(i, draws, data)
}
``````

Thanks, will do (although mine did seem to work and give equivalent loo() results to a model fit with the built-in)

Also, I think you meant

``````log_lik_neg_binomial_2 <- function(i, draws, data = data.frame()) {
draws\$dpars\$shape <- draws\$dpars\$phi
draws\$dpars\$phi <- NULL
brms:::log_lik_negbinomial(i, draws, data)
}
``````

Yes, sorry, that’s what I ment.