Implementing quasi-poisson error model

mikegee · February 23, 2023, 7:55pm

I’m interested in testing various models of error for some discrete count data. While i can implement the poisson and negbinomial in brms, I don’t see an easy way to fit an overdispersed poisson (i.e. a quasipoisson) using brms.

My understanding is that x ~ qpoisson(mu, theta) then E(x) = mu and var(x) = theta mu, but if x ~ negbinomial(mu, shape), then E(x) = mu and var(x) =mu + mu^2/shape . Thus, the way to parameterize x ~ qpoisson is with x ~ negbinomial(mu, shape = mu/(theta - 1)), but now theta is the parameter of interest, not shape.

Is there an easy way of implementing this without having to define my own family? If not, is there a reason why this family isn’t included in brms?

andrjohns · February 23, 2023, 11:52pm

This is because the quasi-poisson is not a distribution in itself, rather an alternative method of parameter estimation. See this post for more information: overdispersion - Quasi-likelihood/Quasi Poisson - Cross Validated

mikegee · February 24, 2023, 10:21pm

Ahh… I see the problem. Thanks for the clarification.

Since I’m working with discrete data where the expected value mu varies with the predictor x and variance is proportional to k mu(x), not mu(x) + mu(x)^2/shape as in standard binomial regression. Making the shape parameter a function of k and mu(x) allows me to get this behavior I need shape to vary with mu(x), i.e. shape = mu/(k-1)

Is there a name for such a model? It seems like parametric version of what the quasipoisson is designed to achieve.
Does it already exist in brms?

andrjohns · February 25, 2023, 5:59am

@martinmodrak is this something you’ve worked with before?

mikegee · February 27, 2023, 1:29am

I had some time to work on this over the weekend and this seems to work. I’m calling it a qqpoisson (a quasi quasi-poisson). I plan on including some additional helper functions and then submitting it here (GitHub - paul-buerkner/custom-brms-families: R scripts for custom brms families).


library(tidyr)
library(dplyr)
library(brms)

## set up function
# define custom family
qqpoisson <- function(link_mu = "identity", link_disp = "identity")
    custom_family(
 name = "qqpoisson", 
 dpars = c("mu", "disp"), #expectd value and variance scale (>1)
 links = c(link_mu, link_disp), # only used if parameters are predicted
 lb = c(0, 1),
 ub = c(NA, NA),
 type = "int" #category of response variable
)

## Define functions used in stan
## Using `theta` as a variable gives me problems when trying to fit model, so using `disp` instead
##
stan_funs <- "
  real qqpoisson_lpmf(int y, real mu, real disp) {
    return neg_binomial_2_lpmf(y | mu, mu/(disp-1));
  }
  int qqpoisson_rng(real mu, real disp) {
    return neg_binomial_2_rng(mu, mu/(disp-1));
  }
"
## export stan_funs to stan
stanvars <- stanvar(scode = stan_funs, block = "functions")

Simulate data

Define variables

theta ← 3.5 # variance scale
t_vec ← 0:5
n_vec ← 1:5; # samples/t

Create data for linear model

data ← crossing(t = t_vec, n_vec) %>%
mutate(m = 10 + 10*t) %>%
rowwise() %>%
mutate(s = rnbinom(n = 1, mu = m, size = m/(3.5 - 1)))

Fit linear model

fit ← brm(s ~ 1 + t,
family = qqpoisson(),
data = data,
stanvars = stanvars
)

martinmodrak · February 27, 2023, 2:05pm

Just to complement: there are many ways you can derive (and parametrize) the negative binomial distribution, the one used in brms is just one of the more common approaches. Most related to your approach, in some papers, you see negative binomial parametrized in almost the same way as you implemented: E(x) = mu, Var(X) = mu ( 1 + theta) for theta > 0. This is IMHO more convenient than requiring disp > 1 as you do in your code as with theta > 0 we can just use the standard link functions like log if we want to also predict theta.

The nice properties of this parametrization is that:
a) 1 + theta is the Fano factor
b) If X1 ~ NB(mu1, theta), X2 ~ NB(mu2, theta) then X1 + X2 ~ NB(mu1 + mu2, theta) (in general sum of two neg. bin variables is not negative binomial itself, except when their Fano factor is identical)
c) it has connections with the Dirichlet multinomial (see conditional on the total, what is the distribution of negative binomials - Cross Validated for some of the math)

You might want to submit the distribution also to GitHub - sims1253/bayesfam

Best of luck with your model!

mikegee · March 1, 2023, 1:19pm

Thanks for the input. I agree it would be better to use theta rather than my dispersal parameter of disp = theta-1 given the properties you highlight. However, I have run into a problem with using theta for the dispersal term. It seems like’s a reserved term in brms/stan since when I tried to using theta in my model I got the error

Error: Currently ‘dirichlet’ is the only valid prior for simplex parameters. See help(set_prior) for more details.

So right now I’m stuck using disp = theta rather than theta explicitly.

More important, the link to your post was helpful, I especially appreciate being able to find some documentation of the NB parameterization you present (e.g. from Guimaraes and Lindrooth (2007), “… known in the econometric literature adn NEGBIN type 1 (Cameron and Trivedi, 1998).”

martinmodrak · March 2, 2023, 9:19am

Oh, you are right, theta is AFAIK reserved for simplices in brms (not in Stan).

Topic		Replies	Views
Poisson or Neg. Binomial with non-integer response Modeling rstan , specification , brms	2	1096	November 29, 2022
Post-processing com_poisson model in brms brms	5	548	January 1, 2020
Negative binomial shape and phi parameters brms	14	6087	April 4, 2019
Scaling of the overdispersion in negative binomial models brms	23	3873	May 25, 2021
Brms custom family continuous bernoulli Events techniques , specification , brms	2	432	February 20, 2023

Implementing quasi-poisson error model

Simulate data

Define variables

Create data for linear model

Fit linear model

Related Topics