Here’s a more efficient Stan version that
-
works on the log scale for arithmetic stability
-
is quadratic in the size of the vector of probabilities
-
avoids doing (most) unnecessary calculations
-
only returns a single value
It could be easily modified to produce an efficient lcdf or lccdf.
functions {
/**
* Return the Poisson-binomial log probability mass for the specified
* count y and vector of probabilities theta. The result is the log
* probability of y successes in N = size(theta) independent
* trials with probabilities of success theta[1], ..., theta[N].
*
* See: https://en.wikipedia.org/wiki/Poisson_binomial_distribution
*
* @param y number of successes
* @param theta vector of probabilities
* @return Poisson-binomial log probability mass
*/
real poisson_binomial_lpmf(int y, vector theta) {
int N = rows(theta);
matrix[N + 1, N + 1] alpha;
vector[N] log_theta = log(theta);
vector[N] log1m_theta = log1m(theta);
if (y < 0 || y > N)
reject("poisson binomial variate y out of range, y = ", y,
" N = ", N);
for (n in 1:N)
if (theta[n] < 0 || theta[n] > 1)
reject("poisson binomial parameter out of range,",
" theta[", n, "] =", theta[n]);
if (N == 0)
return y == 0 ? 0 : negative_infinity();
// dynamic programming with cells
// alpha[n + 1, tot + 1] = log prob of tot successes in first n trials
alpha[1, 1] = 0;
for (n in 1:N) {
// tot = 0
alpha[n + 1, 1] = alpha[n, 1] + log1m_theta[n];
// 0 < tot < n
for (tot in 1:min(y, n - 1))
alpha[n + 1, tot + 1]
= log_sum_exp(alpha[n, tot] + log_theta[n],
alpha[n, tot + 1] + log1m_theta[n]);
// tot = n
if (n > y) continue;
alpha[n + 1, n + 1] = alpha[n, n] + log_theta[n];
}
return alpha[N + 1, y + 1];
}
}
transformed data {
print("*************************************");
print("poisson_binomial_lpmf(0 | []) = ",
exp(poisson_binomial_lpmf(0 | rep_vector(1, 0))));
print("poisson_binomial_lpmf(0 | [0.3]') = ",
exp(poisson_binomial_lpmf(0 | [0.3]')));
print("poisson_binomial_lpmf(1 | [0.3]') = ",
exp(poisson_binomial_lpmf(1 | [0.3]')));
print("poisson_binomial_lpmf(0 | [0.3, 0.6]') = ",
exp(poisson_binomial_lpmf(0 | [0.3, 0.6]')));
print("poisson_binomial_lpmf(1 | [0.3, 0.6]') = ",
exp(poisson_binomial_lpmf(1 | [0.3, 0.6]')));
print("poisson_binomial_lpmf(2 | [0.3, 0.6]') = ",
exp(poisson_binomial_lpmf(2 | [0.3, 0.6]')));
for (n in 0:10)
print("poisson_binomial_lpmf(", n, " | 0.5) = ",
exp(poisson_binomial_lpmf(n | rep_vector(0.5, 10))));
print("*************************************");
}
parameters {
real<lower = 0, upper = 1> z;
}
Here’s what it prints (which is wrong—it’s printing the exponentiation of those):
poisson_binomial_lpmf(0 | []) = 1
poisson_binomial_lpmf(0 | [0.3]') = 0.7
poisson_binomial_lpmf(1 | [0.3]') = 0.3
poisson_binomial_lpmf(0 | [0.3, 0.6]') = 0.28
poisson_binomial_lpmf(1 | [0.3, 0.6]') = 0.54
poisson_binomial_lpmf(2 | [0.3, 0.6]') = 0.18
poisson_binomial_lpmf(0 | 0.5) = 0.000976562
poisson_binomial_lpmf(1 | 0.5) = 0.00976563
poisson_binomial_lpmf(2 | 0.5) = 0.0439453
poisson_binomial_lpmf(3 | 0.5) = 0.117188
poisson_binomial_lpmf(4 | 0.5) = 0.205078
poisson_binomial_lpmf(5 | 0.5) = 0.246094
poisson_binomial_lpmf(6 | 0.5) = 0.205078
poisson_binomial_lpmf(7 | 0.5) = 0.117188
poisson_binomial_lpmf(8 | 0.5) = 0.0439453
poisson_binomial_lpmf(9 | 0.5) = 0.00976563
poisson_binomial_lpmf(10 | 0.5) = 0.000976562
So it looks OK. I didn’t test the error handling.
If you have a lot of y
with the same theta
, it would help to just build this whole array once with the max of the y
as the bound. If that’s too opaque, I can help code that, too. This should be efficient, even for large N
.