Implementing quasi-poisson error model

I’m interested in testing various models of error for some discrete count data. While i can implement the poisson and negbinomial in brms, I don’t see an easy way to fit an overdispersed poisson (i.e. a quasipoisson) using brms.

My understanding is that x ~ qpoisson(mu, theta) then E(x) = mu and var(x) = theta mu, but if x ~ negbinomial(mu, shape), then E(x) = mu and var(x) =mu + mu^2/shape . Thus, the way to parameterize x ~ qpoisson is with x ~ negbinomial(mu, shape = mu/(theta - 1)), but now theta is the parameter of interest, not shape.

Is there an easy way of implementing this without having to define my own family? If not, is there a reason why this family isn’t included in brms?

1 Like

This is because the quasi-poisson is not a distribution in itself, rather an alternative method of parameter estimation. See this post for more information: overdispersion - Quasi-likelihood/Quasi Poisson - Cross Validated

2 Likes

Ahh… I see the problem. Thanks for the clarification.

Since I’m working with discrete data where the expected value mu varies with the predictor x and variance is proportional to k mu(x), not mu(x) + mu(x)^2/shape as in standard binomial regression. Making the shape parameter a function of k and mu(x) allows me to get this behavior I need shape to vary with mu(x), i.e. shape = mu/(k-1)

  1. Is there a name for such a model? It seems like parametric version of what the quasipoisson is designed to achieve.
  2. Does it already exist in brms?

@martinmodrak is this something you’ve worked with before?

1 Like

I had some time to work on this over the weekend and this seems to work. I’m calling it a qqpoisson (a quasi quasi-poisson). I plan on including some additional helper functions and then submitting it here (GitHub - paul-buerkner/custom-brms-families: R scripts for custom brms families).


library(tidyr)
library(dplyr)
library(brms)

## set up function
# define custom family
qqpoisson <- function(link_mu = "identity", link_disp = "identity")
    custom_family(
 name = "qqpoisson", 
 dpars = c("mu", "disp"), #expectd value and variance scale (>1)
 links = c(link_mu, link_disp), # only used if parameters are predicted
 lb = c(0, 1),
 ub = c(NA, NA),
 type = "int" #category of response variable
)

## Define functions used in stan
## Using `theta` as a variable gives me problems when trying to fit model, so using `disp` instead
##
stan_funs <- "
  real qqpoisson_lpmf(int y, real mu, real disp) {
    return neg_binomial_2_lpmf(y | mu, mu/(disp-1));
  }
  int qqpoisson_rng(real mu, real disp) {
    return neg_binomial_2_rng(mu, mu/(disp-1));
  }
"
## export stan_funs to stan
stanvars <- stanvar(scode = stan_funs, block = "functions")

Simulate data

Define variables

theta ← 3.5 # variance scale
t_vec ← 0:5
n_vec ← 1:5; # samples/t

Create data for linear model

data ← crossing(t = t_vec, n_vec) %>%
mutate(m = 10 + 10*t) %>%
rowwise() %>%
mutate(s = rnbinom(n = 1, mu = m, size = m/(3.5 - 1)))

Fit linear model

fit ← brm(s ~ 1 + t,
family = qqpoisson(),
data = data,
stanvars = stanvars
)

1 Like

Just to complement: there are many ways you can derive (and parametrize) the negative binomial distribution, the one used in brms is just one of the more common approaches. Most related to your approach, in some papers, you see negative binomial parametrized in almost the same way as you implemented: E(x) = mu, Var(X) = mu ( 1 + theta) for theta > 0. This is IMHO more convenient than requiring disp > 1 as you do in your code as with theta > 0 we can just use the standard link functions like log if we want to also predict theta.

The nice properties of this parametrization is that:
a) 1 + theta is the Fano factor
b) If X1 ~ NB(mu1, theta), X2 ~ NB(mu2, theta) then X1 + X2 ~ NB(mu1 + mu2, theta) (in general sum of two neg. bin variables is not negative binomial itself, except when their Fano factor is identical)
c) it has connections with the Dirichlet multinomial (see conditional on the total, what is the distribution of negative binomials - Cross Validated for some of the math)

You might want to submit the distribution also to GitHub - sims1253/bayesfam

Best of luck with your model!

1 Like

Thanks for the input. I agree it would be better to use theta rather than my dispersal parameter of disp = theta-1 given the properties you highlight. However, I have run into a problem with using theta for the dispersal term. It seems like’s a reserved term in brms/stan since when I tried to using theta in my model I got the error

Error: Currently ‘dirichlet’ is the only valid prior for simplex parameters. See help(set_prior) for more details.

So right now I’m stuck using disp = theta rather than theta explicitly.

More important, the link to your post was helpful, I especially appreciate being able to find some documentation of the NB parameterization you present (e.g. from Guimaraes and Lindrooth (2007), “… known in the econometric literature adn NEGBIN type 1 (Cameron and Trivedi, 1998).”

1 Like

Oh, you are right, theta is AFAIK reserved for simplices in brms (not in Stan).