Mixture of generalized Poisson distributions

I’m trying to fit a mixture of generalized Poissons to simulated data. I was able to define an LPMF for a GP and fit a non-mixture dataset fine:

functions {
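    // generalized Poisson pmf (Consul):
    //   P(Y = y) = lambda1 * (lambda1 + lambda2 * y)^(y - 1)
    //              * exp(-lambda1 - lambda2 * y) / y!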
    real genpoiss_lpmf(int y, real lambda1, real lambda2) {
        return log(lambda1) + 
               (y - 1) * log(lambda1 + y * lambda2) - 
               lgamma(y + 1) - 
               y * lambda2 - lambda1;
    }
}

data {
    // data
    int N;
    int y[N];
    vector[N] count;

    // priors
    real<lower=0> mu0;
    real<lower=0> sigma0;
    real<lower=0> phi0;
}

parameters {
    real<lower=0> mu;
    real<lower=0> sigma2;
}

transformed parameters {
    real<lower=0> lambda1;
    real lambda2;

    // invert the GP moments mean = lambda1 / (1 - lambda2) and
    // variance = lambda1 / (1 - lambda2)^3 to get the natural parameters
    lambda1 = mu * sqrt(mu / sigma2);
    lambda2 = 1 - sqrt(mu / sigma2);
}

model {
    real contributions;

    mu ~ normal(mu0, sigma0);
    sigma2 ~ normal(0, phi0);

    // each distinct value y[n] contributes its log-pmf weighted by its frequency count[n]
    for (n in 1:N) {
        contributions = genpoiss_lpmf(y[n] | lambda1, lambda2);
        target += contributions * count[n];
    }
}

However, even an “easy” mixture model (large separation between means, only two components) does not recover the parameters well - most of the post-warmup transitions are divergent despite increasing adapt_delta. Additionally, initializing Stan with the true parameter values throws

Rejecting initial value:
  Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.

until it fails after 100 attempts, so I feel like I must have an error in my Stan model.

Below is the Stan code to fit the mixture model, followed by the R code to simulate the data and run the fit.

functions {
    real genpoiss_lpmf(int y, real lambda1, real lambda2) {
        return log(lambda1) + 
               (y - 1) * log(lambda1 + y * lambda2) - 
               lgamma(y + 1) - 
               y * lambda2 - lambda1;
    }
}

data {
    // data
    int N; // number observations
    int K; // number clusters
    int y[N]; // fragment size
    vector[N] count; // count of fragments of size n

    // priors
    vector<lower=0>[K] mu0;
    real<lower=0> sigma0;
    real<lower=0> phi0;
    real<lower=0> alpha;
}

parameters {
    vector<lower=0>[K] mu;
    vector<lower=0>[K] sigma2;
    simplex[K] theta;
}

transformed parameters {
    vector<lower=0>[K] lambda1;
    vector[K] lambda2;

    lambda1 = mu .* sqrt(mu ./ sigma2);
    lambda2 = 1 - sqrt(mu ./ sigma2);
}

model {
    real contributions[K];

    // prior
    for (k in 1:K) {
        mu ~ normal(mu0[k], sigma0);
    }

    sigma2 ~ normal(0, phi0);
    theta ~ dirichlet(rep_vector(alpha / K, K));

    for (n in 1:N) {
        for (k in 1:K) {
            contributions[k] = log(theta[k]) + genpoiss_lpmf(y[n] | lambda1[k], lambda2[k]);
        }

        target += log_sum_exp(contributions) * count[n];
    }
}

library(rstan)
library(HMMpa)

# simulate data
mu <- c(20, 50)
sigma2 <- c(3, 4)
theta <- c(0.4, 0.6)

lambda1 <- mu * sqrt(mu / sigma2)
lambda2 <- 1 - sqrt(mu / sigma2)

set.seed(1)
n <- 1000
z <- sample(1:length(theta), n, replace=TRUE, prob=theta)
y <- rep(0, n)
for (i in 1:n) {
    y[i] <- rgenpois(1, lambda1[z[i]], lambda2[z[i]])
}

tmp <- table(y)
y.value <- as.numeric(names(tmp))
y.count <- as.numeric(tmp)
plot(y.value, y.count)

# fit model in stan

model <- stan_model("stan/gpd-mixture.stan", model_name="gpd-mixture")

stan_data <- list(N=length(y.value),
                  K=length(mu),
                  y=y.value,
                  count=y.count,
                  mu0=mu,
                  sigma0=1e-2,
                  phi0=2,
                  alpha=1)

initf <- function() {
    list(mu=mu, sigma2=sigma2, theta=theta)
}

results <- sampling(model, data=stan_data,
                    chains=4, cores=4,
                    init=initf,
                    #control=list(adapt_delta=0.999, max_treedepth=15),
                    seed=428, iter=2000)

IMHO, if a generalised Poisson is a distribution over \mathbb{N}^+ then a fractional combination of them (a mixture) will end up ranging over \mathbb{R}^+. Presumably this is not what you want, since your data is in \mathbb{N}^+, and therefore this model is misspecified?

This confuses the random variable with the p.m.f. The random variable is still defined over the non-negative integers. For the generalised Poisson, see this.

Thanks @emiruz and @maxbiostat. I think my issue is that my function for genpoiss is defined everywhere, but the support is limited. Based on the paper by Consul and Shoukri, when lambda2 < 0 the support is restricted to y < -lambda1 / lambda2, which makes sense given the log(lambda1 + y * lambda2) term. I think I need to include an if else statement that returns log(0) when y falls at or beyond that value.
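
For concreteness, here is a minimal sketch of what that guard might look like (following Consul and Shoukri’s truncation rule; note that returning negative infinity still leaves the hard cliff that HMC struggles with, as discussed below):

functions {
    real genpoiss_lpmf(int y, real lambda1, real lambda2) {
        // outside the support the pmf is zero, so the log-pmf is -inf
        if (lambda1 + y * lambda2 <= 0)
            return negative_infinity();
        return log(lambda1) +
               (y - 1) * log(lambda1 + y * lambda2) -
               lgamma(y + 1) -
               y * lambda2 - lambda1;
    }
}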

This is one of the big reasons I keep coming back to this forum; smart folks like you highlight gaps in my knowledge that I probably wouldn’t otherwise have noticed. You’re probably right: I’m probably just confused… but I’m confused in a complicated way :-) Something feels wrong here and I’d be grateful if you’d help me understand.

In the regular Poisson, k \sim Poisson(\theta), k has to be an integer because of the factorial in the Poisson pmf. The range of the pmf is definitely the positive integers because k requires it. Further, the Poisson has an inverse function that gets us from p to k, and I’d guess the mapping is one to one.

I presume these same things are true for the generalised Poisson, but this wouldn’t be true for a mixture of generalised Poissons (or even a mixture of regular Poissons) because its inverse function could have a bigger range than the domain of k… it feels like there is a mismatch of some kind.

Can you help unknot me?

Nope. The factorial can be extended to \mathbb{R}^+ via the Gamma function, a subtle application of analytic continuation. I don’t believe thinking in terms of inverse distribution functions is very helpful here. Without appealing to the Radon-Nikodym theorem, there isn’t much hope of a complete description, but I shall try. Let \bar{\mathbb{N}} := \mathbb{N} \cup \{0\}.

If f_i : \bar{\mathbb{N}} \to (0, \infty), i = 1, 2, \ldots, K are a set of p.m.f.s, then for \alpha_i \geq 0 and \sum_{i=1}^K \alpha_i = 1,

g(x) = \sum_{i=1}^K \alpha_i f_i(x), x \in \bar{\mathbb{N}},

is also a p.m.f. Notice g is not the p.m.f. of

Y = \sum_{i=1}^K \alpha_i X_i,

which is a random variable that lives in \mathbb{R}^+.
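
To make the distinction concrete with the weights from the simulation above, \alpha = (0.4, 0.6): the mixture g only puts mass on \bar{\mathbb{N}}, whereas Y = 0.4 X_1 + 0.6 X_2 can take non-integer values such as 0.4 \cdot 1 + 0.6 \cdot 2 = 1.6.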

I’m wondering if it’s possible to evaluate this mixture model likelihood in Stan where the latent classes are “integrated” over. For y[1], the log likelihood resolves to log(0) with the “true” parameters, which Stan doesn’t like. Any suggested workarounds?

Have you had a look at the relevant chapter of the manual?

Thanks @maxbiostat. I have read that chapter and am basing my model on the last code block of 5.3 (Summing out the Responsibility Parameter).

Does your model converge if you remove the entries with log(0) likelihood?

Stan will sample but not converge.

You say

But we should have \lambda_1, \lambda_2 > 0, no? If so, there’s no problem. Also, this

is impossible. If y came from a distribution with density f(y \mid \theta_0), its likelihood under \theta_0 can’t be zero, at least not outside a set of measure zero. Can you explain why lambda2 isn’t constrained to <lower=0,upper=1>?

Lambda2 controls the dispersion: when lambda2 = 0 the distribution reduces to a Poisson, lambda2 > 0 allows for overdispersion, and lambda2 < 0 allows for underdispersion.

I think the issue is that lambda2 has a lower bound, but it’s not 0. I’ve seen a variety of suggestions, including lambda2 >= -lambda1 / 4.

Is there a suggested way of including such a bound in a custom distribution in Stan?
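
One possibility, as a sketch assuming the max(-1, -lambda1 / 4) bound above: Stan allows a parameter’s bounds to depend on parameters declared before it, and applies the corresponding change-of-variables adjustment automatically, so something like this should work:

parameters {
    real<lower=0> lambda1;
    real<lower=fmax(-1, -lambda1 / 4), upper=1> lambda2;
}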

Hey, I had a look at your code. The problem is with this term here: \log(\theta+\lambda y). The constraints for the GP are \theta>0 and -1<\lambda<1, but when \lambda<0 and -\lambda y>\theta, \log(\theta+\lambda y) evaluates to nan. I skimmed a couple of papers and this is a known issue with this model. HMC needs smoothness, and I think what happens is that the nans throw it off.

The first part of the issue you can fix by constraining \lambda>0. You don’t get the full flexibility of the model, but it removes the degeneracy.

The second issue the model has is label switching, which you can fix by ordering lambda1.

I simplified your model a bit for my own sanity, but here it is working:

functions {
    real genpoiss_lpmf(int y, real theta, real lambda) {
     return log(theta)+
            (y-1)*log(theta+lambda*y)
            -lgamma(y+1)-theta-lambda*y;
    }
}

data {
    int N;
    int y[N];
    int count[N];
}

parameters {
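    // positive_ordered prevents label switching between the two components;
    // the small positive lower bound on lam2 keeps sampling away from the nan region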
    positive_ordered[2] lam1;
    vector<lower=0.0001,upper=1>[2] lam2;
    real<lower=0,upper=1> theta;
}

model {
   lam1~normal(0,100);
   lam2~normal(0,100);
   theta~beta(1,1);
    for (n in 1:N) {
      target += log_mix(theta,
                        genpoiss_lpmf(y[n] | lam1[1], lam2[1]),
                        genpoiss_lpmf(y[n] | lam1[2], lam2[2]))*count[n];
    }
}

Result:

             mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
 lam1[1]    19.81    0.00 0.23    19.35    19.66    19.81    19.96    20.24  2706    1
 lam1[2]    49.85    0.00 0.30    49.26    49.65    49.85    50.05    50.45  4236    1
 lam2[1]     0.00    0.00 0.00     0.00     0.00     0.00     0.00     0.01  3055    1
 lam2[2]     0.00    0.00 0.00     0.00     0.00     0.00     0.00     0.01  3887    1
 theta       0.39    0.00 0.02     0.37     0.38     0.39     0.40     0.42  3599    1
 lp__    -3437.16    0.04 1.62 -3441.34 -3437.96 -3436.83 -3435.98 -3435.00  1617    1

Thank you both for your help. I discovered that there was a typo in my mixture model in Stan:

model {
    real contributions[K];

    // prior
    for (k in 1:K) {
        mu ~ normal(mu0[k], sigma0);
    }

Should be

model {
    real contributions[K];

    // prior
    for (k in 1:K) {
        mu[k] ~ normal(mu0[k], sigma0);
    }

Additionally, I changed the priors to gammas to keep things simpler. Making both of these changes improves the model’s performance on some datasets, but not all.

@emiruz - to your point, there is clearly a limit on how much underdispersion a generalized Poisson can model. I need to allow lambda2 to be less than 0, but marginalizing over the mixture components is going to be an issue when one component has a large mean and small variance while another component has a small mean.

Perhaps another distribution would be a better choice; I’m not sure whether there is a better-behaved count distribution that can accommodate underdispersion.

Trouble is, I’m not sure NUTS will ever be able to identify an underdispersed component if doing so means hitting \log(\theta+\lambda y) = nan. Just IMHO, but you may be at a loss attempting to fit this in Stan. The degeneracy is at the level of the component, not the mixture, so it may be worth reading around how other people fit underdispersed models in the first place. If you know how, or find out, it would be great if you could report back, and we can think about whether there’s some Stan-compatible way of doing it.

@emiruz (and anyone else encountering this problem!) - I have had better luck with a double Poisson distribution, described here and here. This is another count distribution that simplifies to a Poisson and allows for both underdispersion and overdispersion. Some nice properties include: 1) the parameters are the mean and dispersion, so it is much easier to put priors on them, and 2) the support is much simpler: the only requirement is that both parameters are positive.
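
In case it helps anyone, here is a minimal sketch of the double Poisson log-pmf following Efron (1986), dropping his normalizing constant c(mu, phi) (it is close to 1, so this pmf is only approximately normalized):

functions {
    // Efron's double Poisson: mu > 0 is the mean and phi > 0 the dispersion;
    // phi = 1 recovers the Poisson, phi > 1 gives underdispersion,
    // phi < 1 overdispersion (the variance is approximately mu / phi)
    real double_poisson_lpmf(int y, real mu, real phi) {
        real lp = 0.5 * log(phi) - phi * mu - lgamma(y + 1);
        if (y > 0)
            lp += y * (log(y) - 1) + phi * y * (1 + log(mu) - log(y));
        return lp;
    }
}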

It’s good of you to report back, and it’s useful to know!

I’d be very interested in how you fitted the model in Stan. If you have any code you could share, that would be much appreciated!

Best Regards
