As @sonicking says, a binomial distribution `y ~ binomial(n, theta)` whose number of trials is Poisson distributed, `n ~ poisson(lambda)`, marginalizes to a Poisson distribution `y ~ poisson(theta * lambda)`.
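For completeness, the marginalization is the standard Poisson thinning identity:

$$
\begin{aligned}
p(y) &= \sum_{n=y}^{\infty} \operatorname{Poisson}(n \mid \lambda)\,\operatorname{Binomial}(y \mid n, \theta)
= \sum_{n=y}^{\infty} \frac{e^{-\lambda}\lambda^{n}}{n!}\binom{n}{y}\theta^{y}(1-\theta)^{n-y} \\
&= \frac{e^{-\lambda}(\lambda\theta)^{y}}{y!}\sum_{m=0}^{\infty}\frac{\big(\lambda(1-\theta)\big)^{m}}{m!}
= \frac{e^{-\lambda\theta}(\lambda\theta)^{y}}{y!}
= \operatorname{Poisson}(y \mid \lambda\theta).
\end{aligned}
$$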
However, the likelihood, whether written in two steps as `y ~ binomial(n, theta); n ~ poisson(lambda)` or in one step as `y ~ poisson(theta * lambda)`, contains absolutely no information about `theta` or `lambda` individually, only about their product. You will not get inference about `theta` or `lambda` except via the prior, and therefore you will not get inference about `n` either except via the prior (and the knowledge that `n` must be at least as large as `y`). The intuition here is that if you observe `y = 10`, you have nothing other than the prior to distinguish between, say, `lambda = 10, theta = 1` versus `lambda = 100, theta = 0.1`.
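To make this concrete, here is a minimal sketch of the one-observation model (the priors here are placeholders of my own choosing, not from anyone's original code); the data inform only the product, so the joint posterior is a ridge along which `theta * lambda` is constant, and its shape along that ridge comes entirely from the priors:

```stan
data {
  int<lower=0> y;                  // a single observed count
}
parameters {
  real<lower=0> lambda;
  real<lower=0, upper=1> theta;
}
model {
  // placeholder priors, assumed purely for illustration; they are the
  // only source of information separating theta from lambda
  lambda ~ exponential(0.1);
  theta ~ beta(2, 2);
  // the data see only the product theta * lambda
  y ~ poisson(theta * lambda);
}
```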
@pwsheppard, your original code contains a data-dependent prior on `theta` (your `p`): `p ~ beta(y + 1, ystar - y + 1);`. I don't understand this choice of prior, but there may be domains where it is reasonable for reasons I'm not seeing, and where it gives meaningful inference about `theta` and `n`.
The well-known degeneracy of this likelihood, i.e. the impossibility of identifying `theta` and `lambda`, is the raison d'être for the N-mixture model, which solves the problem by collecting multiple datapoints `Y` (a vector) that are all binomial-Poisson mixtures sharing a specific realization of the Poisson model. That is, `int n ~ poisson(lambda); array[S] int Y ~ binomial(n, theta);`, where all `S` realizations of `y` share a single value of `n`.
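As a sketch of that data-generating process (with assumed values for `S`, `lambda`, and `theta`), you can simulate such data in a standalone generated quantities block and run it with Stan's fixed-parameter sampler:

```stan
generated quantities {
  int S = 5;                        // number of replicate counts (assumed)
  real lambda = 10;                 // Poisson rate (assumed)
  real theta = 0.5;                 // detection probability (assumed)
  int n = poisson_rng(lambda);      // ONE shared latent abundance
  array[S] int Y;
  for (s in 1:S) {
    Y[s] = binomial_rng(n, theta);  // every replicate thins the same n
  }
}
```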
The intuition here is that, for example, the observation `Y = [10, 10, 10, 10, 10]` is overwhelmingly more likely if `lambda = 10, theta = 1` than if `lambda = 100, theta = 0.1`.
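A quick back-of-the-envelope check: with `theta = 1` detection is perfect, so the only `n` consistent with the data is `n = 10`, and the marginal likelihood collapses to the single term `Poisson(10 | 10) ≈ 0.125`. With `lambda = 100, theta = 0.1`, each factor `Binomial(10 | n, 0.1)` never exceeds roughly `0.13` (its value near `n = 100`), so the product over the five replicates is at most about `4e-5` before averaging over `n`; the first scenario is thousands of times more likely.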
Although the elements of `Y` are all marginally Poisson distributed, they are not jointly independent (the shared `n` induces dependence), so we cannot simply write the N-mixture likelihood as `Y ~ poisson(theta * lambda)`. There are several techniques for writing this likelihood. A closed form exists that avoids the need to truncate the sum in a marginalization, but it is based on heinous functions that themselves can be computed only via numerical approximation and would be difficult to implement in Stan. Alternatively, you can just marginalize over `n` as @prototaxites suggests, choosing the upper bound of the marginalization to be large enough that the truncated probability mass is negligible (see the sketch below). @maxbiostat has worked on a robust technique to adaptively choose the upper truncation bound so that the result is accurate to within a desired tolerance. Any of these marginalization techniques can be sped up by the fact that the terms in the sum can be computed efficiently via recursion.
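Here is a minimal sketch of that truncated marginalization, along the lines @prototaxites suggests; the priors and `N_max` are assumptions for illustration, and `N_max` should be increased until the posterior stops changing:

```stan
data {
  int<lower=1> S;                  // number of replicate counts
  array[S] int<lower=0> Y;         // observed counts sharing one latent n
  int<lower=1> N_max;              // truncation bound; must satisfy N_max >= max(Y)
}
parameters {
  real<lower=0> lambda;
  real<lower=0, upper=1> theta;
}
model {
  // placeholder priors, assumed for illustration
  lambda ~ exponential(0.1);
  theta ~ beta(2, 2);
  // marginalize the latent n out of the joint likelihood:
  // p(Y) = sum_n Poisson(n | lambda) * prod_s Binomial(Y[s] | n, theta)
  {
    int n_min = max(Y);            // n cannot be smaller than the largest count
    vector[N_max - n_min + 1] lp;
    for (n in n_min:N_max) {
      lp[n - n_min + 1] = poisson_lpmf(n | lambda)
                          + binomial_lpmf(Y | n, theta);  // vectorized over s
    }
    target += log_sum_exp(lp);
  }
}
```

Note that consecutive terms in the sum differ only by simple multiplicative factors (e.g. `poisson_lpmf(n + 1 | lambda) = poisson_lpmf(n | lambda) + log(lambda / (n + 1))`), which is what the recursion mentioned above exploits; the naive loop shown here is simpler but does redundant work.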
Hope some of this helps!