Modeling problem when N obeys Poisson distribution in binomial distribution

Flame · September 15, 2021, 10:01am

Dear All,

I now encounter a modeling problem, similar to assuming that the data obeys the binomial distribution, where N obeys the Poisson distribution.
y \sim binomial(N,p),\ \ \ \ N \sim poisson(lambda)

Since stan seems unable to handle the situation where N is positive infinity, I assume that there is an upper bound on N, such as 50.

I tried to write the model code

data {
  int<lower=0> N;
  int<lower=0> y[N];
  int<lower=0> max_N;
}

parameters {
  real<lower=0> lambda;
  real<lower=0, upper=1> p;
}
transformed parameters {
  vector[N] lp;
  for (i in 1:N){
    for  (j in (y[i]+1):max_N){
      lp[i] += poisson_lpmf(j | lambda) + binomial_lpmf(y[i] | j,p);
    }
  }
}

model {
  target += log_sum_exp(lp);
}

But got the following error.

Chain 4 Rejecting initial value:
Chain 4 Log probability evaluates to log(0), i.e. negative infinity.
Chain 4 Stan can’t start sampling from this initial value.

It seems that the calculation of which step got 0.

Thanks for any advice and it would help me a lot.

martinmodrak · September 15, 2021, 11:42am

Unless, I am misunderstanding something about your model, it should be mathematically equivalent to just y \sim Poisson (p \lambda), so I would expect this form to greatly improve the behavior of your model.

More generally, to debug initial value rejected problems it is often helpful to use print statements to inspect intermediate values and see where the negative infinity creeps in.

Best of luck with your model!

Flame · September 16, 2021, 9:16am

Thank you for your reply

In fact, the model I really want to estimate is more complex.

\begin{equation} \begin{aligned} N &\sim possion(\lambda) \\ p &\sim zipf(s,N) = \frac{(1/k^s)}{\sum_{i=1}^{N}1/i^s}\\ q &= 1-(1-p)^M\\ n &\sim poisson\_binomial(q) \end{aligned} \end{equation}

Given M,n, estimation of \lambda and S

It is indeed a good point to transform the parameters.

I will try it again. Thank you very much!

maxbiostat · September 18, 2021, 10:16pm

What is k in your model ? The way you wrote this, it seems p is an integer-valued random variable, with support in [0, N].

Flame · September 19, 2021, 6:04am

Oops, sorry, I made a mistake in my expression.
p is a vector of length N. The value taken is determined by the zipf distribution.

So the second and third lines should probably be written as

\begin{equation} \begin{aligned} p_k&\sim zipf(k|s,N)\ \ (k=1,2,...,N)\\ q_k&=1-(1-p_k)^M \end{aligned} \end{equation}

Thank you for the correction!

martinmodrak · September 24, 2021, 8:46am

Hi,
sorry, I probably still don’t understand your model:

Is the Zipf distribution you refer to the same as what the Wikipedia has at Zipf's law - Wikipedia ? And does your Poisson Binomial match the formula at Poisson binomial distribution - Wikipedia ? If yes then p_k \geq 0 is an integer, right? But you need 0 < p_k < 1 for the Poisson binomial. But that is a contradiction, as with p_k with non-negative integer support, q_k can only be exactly 0, exactly 1 or smaller than zero… So I must be missing something.

Could you clarify where I am wrong?

Flame · September 25, 2021, 5:16pm

Thank you for your reply

I am not very proficient in statistical symbols. I may have misled you. I am very sorry.

Please allow me to give an example

Assume \lambda=3, s=1, M=2

First get N from the Poisson distribution, if it is 2

p is the probability of the zipf distribution, so

\begin{equation} \begin{aligned} p &= [\frac{1/1^1}{\sum_{i=1}^2 1/i^1},\ \ \frac{1/2^1}{\sum_{i=1}^ 2 1/i^1}]=[2/3,\ \ 1/3]\\ q &= [1-[1-2/3]^2,\ \ 1-[1-1/3]^2] = [8/9,\ \ 5/9] \end{aligned} \end{equation}

Then get the data \sim Poisson\_binomial([8/9,\ \ 5/9]), can be 2

Given a plurality of M and the matching data, \lambda and s should be estimable, I hope to get these two parameters

Looking forward to your suggestions

maxbiostat · September 28, 2021, 1:20pm

This model is still missing something. If you can describe the problem you’re trying to solve, we can help you fix the model so it makes statistical sense. As it stands, you’re plugging the value of a probability mass function (pmf) into another pmf (Poisson-binomial), which is weird to say the least.

martinmodrak · September 28, 2021, 5:05pm

I agree with @maxbiostat that the model as you describe it is a bit weird so maybe there is a better solution if you describe the problem more fully. However, provided you are sure this is a model you want to implement, it doesn’t appear there is some nice analytical solution to marginalize N out completely, so some sort of partial-sum as you’ve proposed might be the best one can do.

Unfortunately, the partial sum will likely be costly unless \lambda (and hence the sensible range for N) is small. The Poisson-binomial is itself a distribution that is quite costly (at least quadratic in N) to compute (see Poisson-binomial distribution - any existing Stan implementation? - #7 by Bob_Carpenter if you haven’t already), so the model might be very slow…

It should however be possible to use something like simulation-based calibration (e.g. via the SBC package I am helping to develop) against your full simulator to understand to what extent does the partial sum approximation skew your results away from the exact solution.

Best of luck with your model!

Flame · September 30, 2021, 10:09am

Thank you for your replies!

Actually, I am now trying to reproduce the paper “General type-token distribution” (https://arxiv.org/pdf/1305.0328.pdf).

In short, this paper try to infer author’s vocabulary through his book “Alice’s Adventures in Wonderland” use the type-token relationship.

Generally, texts obey the zipf distribution. So that’s the reason for the zipf distribution here.

About the Poisson distribution. In the end of the Supplementary material, the authors mention that
"As the difficulty of optimization stems from the discrete nature of the parameter N for the family of Zipf distributions, we introduced a smooth parameter \lambda which determines N. We did this by assuming that N is a Poisson random variable with parameter \lambda".

The original paper used the expectation-maximization algorithm for estimation. I would like to try to carry out whether it is possible to perform Bayesian estimation. (In fact I am confused about how to perform estimation using the expectation-maximization algorithm.)

I think the model I described should be correct. Did I miss something?

Indeed, I am not very familiar with stan either. If you have any reference for the partial sum you mentioned, please let me know.

Thank you all for your help!

Topic		Replies	Views
Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity Modeling	2	412	February 19, 2021
Initialization failed, initial values rejected Modeling	10	2381	October 17, 2018
Negative_binomial_2: should i be worried about metropolis proposal rejection in warmup phase? Modeling	2	1241	August 28, 2018
Count data modeling with discrete parameters Modeling specification	7	1476	July 13, 2017
Random Poisson variable is a negative but must be nonnegative! Modeling fitting-issues , poisson	6	400	February 13, 2024

Modeling problem when N obeys Poisson distribution in binomial distribution

Related topics