Hierarchical overdispersed count models

biarritz · November 3, 2021, 1:37am

Hello,

I’m new to Stan and Bayesian statistics. I’m trying to build a hierarchical model that regresses each of thousands of variables (K) from a proteomics experiment on a predictor measured in dozens of donors (N).

Each observation in matrix k is well modeled as a binomial, with the corresponding number of attempts in vector n. The predictor variable for each donor is lor.

The end goal is to regress each variable, account for overdispersion, pool information across variables and estimate differences and fold changes across extremes of the predictor variable.

My first attempt is below, and seems to work well. I’m looking for:

Alternatives for modeling overdispersion
Suggestions on how to model covariance
Any other criticism

data {
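  int<lower=1> K;              // number of variables
  int<lower=1> N;              // number of donors
  vector[N] lor;               // predictor value for each donor
  array[K, N] int<lower=0> k;  // observed counts per variable and donor
  array[N] int<lower=0> n;     // number of attempts per donor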
}
parameters {
  real mua;
  real mub;
  real<lower=0> sga;
  real<lower=0> sgb;
  vector[K] alpha;
  vector[K] beta;
}
model {
  // Hierarchical priors, vectorized over all K variables
  alpha ~ normal(mua, sga);
  beta ~ normal(mub, sgb);
  for (i in 1:K) {
    // binomial_logit applies the inverse logit internally; vectorized over donors
    k[i] ~ binomial_logit(n, alpha[i] + beta[i] * lor);
  }
}
generated quantities {
  vector[K] minp;
  vector[K] maxp;
  vector[K] diff;
  vector[K] fold;
  minp = inv_logit(alpha + beta * min(lor));
  maxp = inv_logit(alpha + beta * max(lor));
  diff = maxp - minp;
  fold = maxp ./ minp;
}

biarritz · November 3, 2021, 11:50pm

Replying to myself on modeling covariance: I am planning to use the usual LKJ prior for each correlation matrix $i \in 1:K$. However, I would be interested in hearing how people usually model the population-level prior on $\eta$ to connect it to each LKJ draw.

Furthermore, I am interested in hearing whether there are any interesting alternatives to my overdispersed binomial formulation, where I am simply modeling overdispersion through the latent intercept. I know one alternative is to use a reparametrized beta-binomial into scale and dispersion.

martinmodrak · November 10, 2021, 1:27pm

I might have missed something about your model, but wouldn’t there be a single correlation matrix, with each pair $(\alpha_i, \beta_i)$ drawn from a multivariate normal distribution whose covariance is that single correlation matrix scaled by the standard deviations?
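To make the single-correlation-matrix idea concrete, here is a minimal sketch using a non-centered parameterization with a Cholesky-factored LKJ prior. The names `mu`, `sigma`, `L_Omega`, `z` and `ab` are hypothetical, and the prior scales and the LKJ shape of 2 are assumptions for illustration, not values from this thread. Under this reading the LKJ shape is a fixed constant in the prior rather than something that needs its own population-level model.

```stan
data {
  int<lower=1> K;              // number of variables
  int<lower=1> N;              // number of donors
  vector[N] lor;               // predictor value for each donor
  array[K, N] int<lower=0> k;  // observed counts per variable and donor
  array[N] int<lower=0> n;     // number of attempts per donor
}
parameters {
  vector[2] mu;                     // population means of (alpha, beta)
  vector<lower=0>[2] sigma;         // population standard deviations
  cholesky_factor_corr[2] L_Omega;  // Cholesky factor of the single 2x2 correlation matrix
  matrix[2, K] z;                   // standardized per-variable effects
}
transformed parameters {
  // Non-centered construction: column i holds (alpha_i, beta_i)
  matrix[2, K] ab = rep_matrix(mu, K) + diag_pre_multiply(sigma, L_Omega) * z;
}
model {
  mu ~ normal(0, 5);               // assumed weakly informative prior
  sigma ~ normal(0, 2);            // half-normal because of the lower bound (assumed scale)
  L_Omega ~ lkj_corr_cholesky(2);  // assumed fixed shape
  to_vector(z) ~ std_normal();
  for (i in 1:K) {
    k[i] ~ binomial_logit(n, ab[1, i] + ab[2, i] * lor);
  }
}
generated quantities {
  // Recover the intercept-slope correlation from the Cholesky factor
  matrix[2, 2] Omega = multiply_lower_tri_self_transpose(L_Omega);
  real rho = Omega[1, 2];
}
```

The non-centered form is used here only because it often samples better when there are many groups and few observations per group; a centered `multi_normal_cholesky` statement would express the same model.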

It doesn’t appear that you are modelling overdispersion; you simply have a different binomial distribution for each group of observations. The beta-binomial is definitely a sensible choice.
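As a rough sketch of the beta-binomial option, the likelihood can be written in terms of a mean `p` and a precision `phi`, so that the two shape parameters are `p * phi` and `(1 - p) * phi`. The parameter `phi`, its gamma prior, and the choice to share one `phi` across all variables are assumptions made for illustration; a per-variable or hierarchical `phi` would be an equally plausible design. The flat priors on the hyperparameters mirror the original model.

```stan
data {
  int<lower=1> K;              // number of variables
  int<lower=1> N;              // number of donors
  vector[N] lor;               // predictor value for each donor
  array[K, N] int<lower=0> k;  // observed counts per variable and donor
  array[N] int<lower=0> n;     // number of attempts per donor
}
parameters {
  real mua;
  real mub;
  real<lower=0> sga;
  real<lower=0> sgb;
  vector[K] alpha;
  vector[K] beta;
  real<lower=0> phi;  // precision (hypothetical); smaller values mean more overdispersion
}
model {
  alpha ~ normal(mua, sga);
  beta ~ normal(mub, sgb);
  phi ~ gamma(2, 0.01);  // assumed prior, chosen only for illustration
  for (i in 1:K) {
    // Mean on the probability scale for variable i across donors
    vector[N] p = inv_logit(alpha[i] + beta[i] * lor);
    // Mean/precision form: shape parameters are p * phi and (1 - p) * phi
    k[i] ~ beta_binomial(n, p * phi, (1 - p) * phi);
  }
}
```

With this parameterization the mean stays at `n * p`, while the variance is the binomial variance inflated by a factor of `1 + (n - 1) / (phi + 1)`, so the model approaches the plain binomial as `phi` grows.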

Since it appears the number of trials is shared by all categories, wouldn’t your data be better represented as multinomial (or Dirichlet-multinomial, to model dispersion)? Note that this would imply that you have one fewer degree of freedom and need to treat one category as the reference (or enforce some other constraint to make the model identified; see e.g. Multinomial logistic regression - Wikipedia for some background).
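A minimal sketch of the multinomial version with a reference category might look like the following. It assumes the data are rearranged into a hypothetical array `y` with one row per donor whose counts sum to that donor's total, which may require adding an "other" category; the names `alpha_free` and `beta_free` and the prior scales are also assumptions. The last category is pinned to zero to identify the model.

```stan
data {
  int<lower=2> K;              // number of categories
  int<lower=1> N;              // number of donors
  vector[N] lor;               // predictor value for each donor
  array[N, K] int<lower=0> y;  // counts per donor over all K categories (hypothetical layout)
}
parameters {
  vector[K - 1] alpha_free;  // intercepts for the non-reference categories
  vector[K - 1] beta_free;   // slopes for the non-reference categories
}
transformed parameters {
  // Category K is the reference: its intercept and slope are fixed at zero
  vector[K] alpha = append_row(alpha_free, 0);
  vector[K] beta = append_row(beta_free, 0);
}
model {
  alpha_free ~ normal(0, 5);  // assumed weakly informative priors
  beta_free ~ normal(0, 2);
  for (j in 1:N) {
    // Softmax is applied internally to the K linear predictors
    y[j] ~ multinomial_logit(alpha + beta * lor[j]);
  }
}
```

The hierarchical priors from the original model could be put back on `alpha_free` and `beta_free`; they are left out here only to keep the identification constraint easy to see.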

Best of luck with your model!
