I now encounter a modeling problem, similar to assuming that the data obeys the binomial distribution, where N obeys the Poisson distribution. y \sim binomial(N,p),\ \ \ \ N \sim poisson(lambda)

Since stan seems unable to handle the situation where N is positive infinity, I assume that there is an upper bound on N, such as 50.

I tried to write the model code

data {
int<lower=0> N;
int<lower=0> y[N];
int<lower=0> max_N;
}
parameters {
real<lower=0> lambda;
real<lower=0, upper=1> p;
}
transformed parameters {
vector[N] lp;
for (i in 1:N){
for (j in (y[i]+1):max_N){
lp[i] += poisson_lpmf(j | lambda) + binomial_lpmf(y[i] | j,p);
}
}
}
model {
target += log_sum_exp(lp);
}

But got the following error.

Chain 4 Rejecting initial value:
Chain 4 Log probability evaluates to log(0), i.e. negative infinity.
Chain 4 Stan canâ€™t start sampling from this initial value.

It seems that the calculation of which step got 0.

Unless, I am misunderstanding something about your model, it should be mathematically equivalent to just y \sim Poisson (p \lambda), so I would expect this form to greatly improve the behavior of your model.

More generally, to debug initial value rejected problems it is often helpful to use print statements to inspect intermediate values and see where the negative infinity creeps in.

Hi,
sorry, I probably still donâ€™t understand your model:

Is the Zipf distribution you refer to the same as what the Wikipedia has at Zipf's law - Wikipedia ? And does your Poisson Binomial match the formula at Poisson binomial distribution - Wikipedia ? If yes then p_k \geq 0 is an integer, right? But you need 0 < p_k < 1 for the Poisson binomial. But that is a contradiction, as with p_k with non-negative integer support, q_k can only be exactly 0, exactly 1 or smaller than zeroâ€¦ So I must be missing something.

This model is still missing something. If you can describe the problem youâ€™re trying to solve, we can help you fix the model so it makes statistical sense. As it stands, youâ€™re plugging the value of a probability mass function (pmf) into another pmf (Poisson-binomial), which is weird to say the least.

I agree with @maxbiostat that the model as you describe it is a bit weird so maybe there is a better solution if you describe the problem more fully. However, provided you are sure this is a model you want to implement, it doesnâ€™t appear there is some nice analytical solution to marginalize N out completely, so some sort of partial-sum as youâ€™ve proposed might be the best one can do.

Unfortunately, the partial sum will likely be costly unless \lambda (and hence the sensible range for N) is small. The Poisson-binomial is itself a distribution that is quite costly (at least quadratic in N) to compute (see Poisson-binomial distribution - any existing Stan implementation? - #7 by Bob_Carpenter if you havenâ€™t already), so the model might be very slowâ€¦

It should however be possible to use something like simulation-based calibration (e.g. via the SBC package I am helping to develop) against your full simulator to understand to what extent does the partial sum approximation skew your results away from the exact solution.

In short, this paper try to infer authorâ€™s vocabulary through his book â€śAliceâ€™s Adventures in Wonderlandâ€ť use the type-token relationship.

Generally, texts obey the zipf distribution. So thatâ€™s the reason for the zipf distribution here.

About the Poisson distribution. In the end of the Supplementary material, the authors mention that
"As the difficulty of optimization stems from the discrete nature of the parameter N for the family of Zipf distributions, we introduced a smooth parameter \lambda which determines N. We did this by assuming that N is a Poisson random variable with parameter \lambda".

The original paper used the expectation-maximization algorithm for estimation. I would like to try to carry out whether it is possible to perform Bayesian estimation. (In fact I am confused about how to perform estimation using the expectation-maximization algorithm.)

I think the model I described should be correct. Did I miss something?

Indeed, I am not very familiar with stan either. If you have any reference for the partial sum you mentioned, please let me know.