# Poisson-binomial distribution - any existing Stan implementation?

#1

I came across a neat modeling problem earlier today that turns out to be solved by the Poisson-binomial distribution.

Before I write a new Stan function for it, I thought I’d ask whether anyone here has already done so. The n is pretty small in my use case (n=10 maximum), so the exact formula should work fine. My instinct is that it’s just a matter of generating all the combinations to loop over.

#2

I made a quick Stan implementation of the Poisson binomial probability distribution. I tested it against the R package poisbinom. Seems to work.

This is not elegant. I started with a recursive algorithm for the combinations, but couldn’t seem to get it to work. So for the sake of time, I fell back on a lexicographic strategy (like the R function combn). Only halfway through did I realize that I needed only the bits, and should have just done a bit list. So this could be more efficient, but that shouldn’t matter much (I hope) unless n is large.

Here’s the code for RStan, with test at bottom:

mc <- '
functions{
  // need a function to compute poissonbinomial log-probability
  //   log-prob of k successes from n items
  //   where prob of success of the n items is given by vector pp
  // this mainly requires looping over combinations of n-choose-k
  // for each combination, log-prob is straightforward, given vector of probs pp

  // function for log-prob of each term, given combination of size k
  // called by lpmf below
  real lppoibin( int k, int n, int[] elements , vector pp ) {
    real lp;
    lp = 0;
    //print( elements );

    // for each possible element, accumulate probability
    for ( i in 1:n ) {
      int bang;
      bang = 0;
      for ( j in 1:k ) if ( elements[j]==i ) bang = 1;
      if ( bang==1 ) {
        // term in elements
        lp = lp + log(pp[i]);
      } else {
        // term not in elements
        lp = lp + log1m(pp[i]);
      }
    }

    return lp;
  }

  real poissonbinomial_lpmf( int k, int n, vector pp ) {
    int x[n];
    int a[k];
    int e;
    int h;
    int r[k];
    vector[ choose( n , k ) ] terms; // holds each log-prob

    for ( i in 1:n ) x[i] = i;
    for ( i in 1:k ) a[i] = i;

    // first sequence
    r = a;
    terms[1] = lppoibin( k , n , r , pp );

    // loop to lexicographically generate remaining (as in R's combn)
    if ( k > 0 ) {
      int i;
      int j;
      int nmmp1;
      i = 2;
      e = 0; // e and h must be initialized before the loop, as in combn
      h = k;
      nmmp1 = n - k + 1;
      while ( a[1] != nmmp1 ) {
        if ( e < n - h ) {
          h = 1;
          e = a[k];
          j = 1;
          a[ k - h + j ] = e + j;
        } else {
          e = a[k-h];
          h = h + 1;
          j = h;
          for ( ii in 1:j ) a[ k - h + ii ] = e + ii;
        }
        for ( ii in 1:k ) r[ii] = x[a[ii]];
        terms[i] = lppoibin( k , n , r , pp );
        i = i + 1;
      }
    }
    return log_sum_exp(terms);
  }
}
'

cppcode <- stanc( model_code = mc , model_name = "testccp" )
expose_stan_functions( cppcode )

k <- 5
n <- 20
pp <- runif( n )
poissonbinomial_lpmf( k , n , pp )

library(poisbinom)
dpoisbinom( k , pp , log=TRUE )


#3

I quickly tried the recursive version once, maybe a year ago, and couldn’t get it to work right. This is definitely a better approach. You’re right that there’s some potential for speeding it up, but that wouldn’t extend the tractable values of n significantly. Unless there’s a clever trick to reduce the complexity that I’m not aware of (I’m not aware of most things).

#4

Summing over subsets is pretty much the definition of “intractable” in computer science. But it also hints at the solution—dynamic programming.

The elements p[i] are already ordered, so the dynamic programming cells will be (i, n), where n is the number of successes. So you just need a dynamic programming algorithm like the forward algorithm for HMMs. It needs to deal with the constraint that you need to get the right number of successes. The transitions are linear rather than quadratic as you can only go two ways from a state—adding a success or failure.
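Concretely, writing \alpha(i, t) for the probability of t successes among the first i trials, the forward recursion described above is:

```latex
\alpha(0, 0) = 1, \qquad \alpha(0, t) = 0 \quad \text{for } t > 0,
\alpha(i, t) = (1 - p_i)\,\alpha(i - 1, t) + p_i\,\alpha(i - 1, t - 1).
```

Each cell costs a constant amount of work, with \mathcal{O}(n^2) cells in total.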

I think the complexity should be quadratic in the size of p, i.e., \mathcal{O}(n^2).

This is like the TopCoder hard tier problems I used to practice on.

#5

Here’s an R version of the dynamic programming algorithm on the linear scale. To make it stable, you’d need to convert it to log-sum-exp.

The dynamic programming variable alpha[n + 1, tot + 1] is the probability that the first n Bernoulli trials yielded tot successes. I had to add one to each index to accommodate the base case of the algorithm (n = 0) and zero total successes (tot = 0). It makes everything a bit more challenging to read.

I tried to make the code as clear as possible by heavy use of local variables. n corresponds to the position in theta and tot to the total number of successes observed (so far). It works left to right.

It’s easy to see that it’s quadratic and computes the answer for every possible outcome. So this can also compute the cdf and ccdf by just summing over that final row.

# Return vector of probabilities for all outcomes in increasing order of count.
# theta : vector of probability values in (0, 1)
poisson_binomial <- function(theta) {
  N <- length(theta)
  if (N == 0) return(c(1));
  alpha <- matrix(-Inf, N + 1, N + 1);
  alpha[1, 1] <- 1;
  for (n in 1:N) {
    tot = 0;
    alpha[n + 1, tot + 1] = alpha[n, tot + 1] * (1 - theta[n]);

    if (n > 1) {
      for (tot in 1:(n - 1)) {
        alpha[n + 1, tot + 1] =
          alpha[n, tot] * theta[n] + alpha[n, tot + 1] * (1 - theta[n]);
      }
    }

    tot = n;
    alpha[n + 1, tot + 1] = alpha[n, tot] * theta[n];
  }
  return(alpha[N + 1, 1:(N + 1)]);
}
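For readers following along in another language, here is an illustrative Python port of the same linear-scale dynamic program (the function names are mine, not from the thread); it also shows the cdf-by-summation idea as a running sum over the returned pmf:

```python
# Hypothetical Python port of the R dynamic program above; returns the
# vector of probabilities for counts 0..N, in increasing order of count.
def poisson_binomial_pmf(theta):
    alpha = [1.0]  # zero trials: zero successes with probability 1
    for p in theta:
        new = [0.0] * (len(alpha) + 1)
        for tot, a in enumerate(alpha):
            new[tot] += a * (1 - p)   # this trial fails
            new[tot + 1] += a * p     # this trial succeeds
        alpha = new
    return alpha

# The cdf is then just a running sum over the pmf:
def poisson_binomial_cdf(theta):
    acc, out = 0.0, []
    for p in poisson_binomial_pmf(theta):
        acc += p
        out.append(acc)
    return out
```

For example, poisson_binomial_pmf([0.3, 0.6]) gives [0.28, 0.54, 0.18], matching the R output below.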


As far as I can tell, it returns the right values:

> poisson_binomial(c())
[1] 1

> poisson_binomial(c(0.6))
[1] 0.4 0.6

> poisson_binomial(c(0.3, 0.6))
[1] 0.28 0.54 0.18

> poisson_binomial(rep(0.5, 10))
[1] 0.00098 0.00977 0.04395 0.11719 0.20508 0.24609 0.20508 0.11719 0.04395 0.00977 0.00098

> for (n in 0:10) print(dbinom(n, 10, 0.5))
[1] 0.00098
[1] 0.0098
[1] 0.044
[1] 0.12
[1] 0.21
[1] 0.25
[1] 0.21
[1] 0.12
[1] 0.044
[1] 0.0098
[1] 0.00098


#6

Here’s a recursive version. The boundary conditions are somewhat trickier than I was expecting:

dpois_binom <- function(y, theta) {
  N <- length(theta);
  if (N == 0) return(ifelse(y == 0, 1, 0));
  if (N == 1) return(ifelse(y == 1, theta[1], 1 - theta[1]));
  if (y == 0) return(prod(1 - theta));
  # y >= 1, N >= 2
  tail <- theta[2:N];
  return((1 - theta[1]) * dpois_binom(y, tail)
         + theta[1] * dpois_binom(y - 1, tail));
}


This one just calculates a single value. But it’s still exponential in max(y, size(theta)).
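As an aside, memoizing that recursion recovers the dynamic program: each distinct (y, suffix) pair is computed once, so the exponential blowup collapses to O(N * y) work. A Python sketch (illustrative, not from the thread):

```python
from functools import lru_cache

# Memoized variant of the recursion above: caching on (y, start index)
# means each subproblem is solved once, rather than exponentially often.
def dpois_binom_memo(y, theta):
    theta = tuple(theta)
    N = len(theta)

    @lru_cache(maxsize=None)
    def rec(y, i):
        # probability of y successes among trials theta[i:]
        if i == N:
            return 1.0 if y == 0 else 0.0
        if y == 0:
            out = 1.0
            for p in theta[i:]:
                out *= 1.0 - p
            return out
        return (1.0 - theta[i]) * rec(y, i + 1) + theta[i] * rec(y - 1, i + 1)

    return rec(y, 0)
```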

#7

Here’s a more efficient Stan version that

• works on the log scale for arithmetic stability

• is quadratic in the size of the vector of probabilities

• avoids doing (most) unnecessary calculations

• only returns a single value

It could be easily modified to produce an efficient lcdf or lccdf.

functions {

  /**
   * Return the Poisson-binomial log probability mass for the specified
   * count y and vector of probabilities theta.  The result is the log
   * probability of y successes in N = size(theta) independent
   * trials with probabilities of success theta[1], ..., theta[N].
   *
   * See:  https://en.wikipedia.org/wiki/Poisson_binomial_distribution
   *
   * @param y number of successes
   * @param theta vector of probabilities
   * @return Poisson-binomial log probability mass
   */
  real poisson_binomial_lpmf(int y, vector theta) {
    int N = rows(theta);
    matrix[N + 1, N + 1] alpha;
    vector[N] log_theta = log(theta);
    vector[N] log1m_theta = log1m(theta);

    if (y < 0 || y > N)
      reject("poisson binomial variate y out of range, y = ", y,
             " N = ", N);
    for (n in 1:N)
      if (theta[n] < 0 || theta[n] > 1)
        reject("poisson binomial parameter out of range,",
               " theta[", n, "] =", theta[n]);

    if (N == 0)
      return y == 0 ? 0 : negative_infinity();

    // dynamic programming with cells
    // alpha[n + 1, tot + 1] = log prob of tot successes in first n trials
    alpha[1, 1] = 0;
    for (n in 1:N) {
      // tot = 0
      alpha[n + 1, 1] = alpha[n, 1] + log1m_theta[n];

      // 0 < tot < n
      for (tot in 1:min(y, n - 1))
        alpha[n + 1, tot + 1]
          = log_sum_exp(alpha[n, tot] + log_theta[n],
                        alpha[n, tot + 1] + log1m_theta[n]);

      // tot = n
      if (n > y) continue;
      alpha[n + 1, n + 1] = alpha[n, n] + log_theta[n];
    }
    return alpha[N + 1, y + 1];
  }

}
transformed data {
  print("*************************************");

  print("poisson_binomial_lpmf(0 | []) = ",
        exp(poisson_binomial_lpmf(0 | rep_vector(1, 0))));

  print("poisson_binomial_lpmf(0 | [0.3]') = ",
        exp(poisson_binomial_lpmf(0 | [0.3]')));
  print("poisson_binomial_lpmf(1 | [0.3]') = ",
        exp(poisson_binomial_lpmf(1 | [0.3]')));

  print("poisson_binomial_lpmf(0 | [0.3, 0.6]') = ",
        exp(poisson_binomial_lpmf(0 | [0.3, 0.6]')));
  print("poisson_binomial_lpmf(1 | [0.3, 0.6]') = ",
        exp(poisson_binomial_lpmf(1 | [0.3, 0.6]')));
  print("poisson_binomial_lpmf(2 | [0.3, 0.6]') = ",
        exp(poisson_binomial_lpmf(2 | [0.3, 0.6]')));

  for (n in 0:10)
    print("poisson_binomial_lpmf(", n, " | 0.5) = ",
          exp(poisson_binomial_lpmf(n | rep_vector(0.5, 10))));

  print("*************************************");
}
parameters {
  real<lower = 0, upper = 1> z;
}


Here’s what it prints (the labels are misleading: each line actually shows the exponentiated log probability, i.e., the probability itself):

poisson_binomial_lpmf(0 | []) = 1
poisson_binomial_lpmf(0 | [0.3]') = 0.7
poisson_binomial_lpmf(1 | [0.3]') = 0.3
poisson_binomial_lpmf(0 | [0.3, 0.6]') = 0.28
poisson_binomial_lpmf(1 | [0.3, 0.6]') = 0.54
poisson_binomial_lpmf(2 | [0.3, 0.6]') = 0.18
poisson_binomial_lpmf(0 | 0.5) = 0.000976562
poisson_binomial_lpmf(1 | 0.5) = 0.00976563
poisson_binomial_lpmf(2 | 0.5) = 0.0439453
poisson_binomial_lpmf(3 | 0.5) = 0.117188
poisson_binomial_lpmf(4 | 0.5) = 0.205078
poisson_binomial_lpmf(5 | 0.5) = 0.246094
poisson_binomial_lpmf(6 | 0.5) = 0.205078
poisson_binomial_lpmf(7 | 0.5) = 0.117188
poisson_binomial_lpmf(8 | 0.5) = 0.0439453
poisson_binomial_lpmf(9 | 0.5) = 0.00976563
poisson_binomial_lpmf(10 | 0.5) = 0.000976562


So it looks OK. I didn’t test the error handling.

If you have a lot of y with the same theta, it would help to just build this whole array once with the max of the y as the bound. If that’s too opaque, I can help code that, too. This should be efficient, even for large N.
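The build-once idea could be sketched like this in Python (names and structure are mine, under the assumption that all the observed counts share one theta):

```python
import math

# Sketch of the vectorization idea: build the log-scale dynamic
# programming table once, up to the largest observed count, then read
# off the log pmf for every y.  Every y must share the same theta.
def poisson_binomial_lpmf_vec(ys, theta):
    N = len(theta)
    y_max = max(ys)
    log_t = [math.log(p) for p in theta]
    log1m_t = [math.log1p(-p) for p in theta]

    # alpha[tot] = log prob of tot successes so far; only 0..y_max needed
    alpha = [0.0] + [-math.inf] * y_max
    for n in range(N):
        new = [-math.inf] * (y_max + 1)
        new[0] = alpha[0] + log1m_t[n]
        for tot in range(1, min(n + 1, y_max) + 1):
            a = alpha[tot - 1] + log_t[n]   # trial n succeeds
            b = alpha[tot] + log1m_t[n]     # trial n fails
            if a == -math.inf and b == -math.inf:
                continue
            m = max(a, b)                   # stable log-sum-exp
            new[tot] = m + math.log(math.exp(a - m) + math.exp(b - m))
        alpha = new
    return [alpha[y] for y in ys]
```

The inner table is built once regardless of how many y are queried, which is the savings over calling a scalar lpmf repeatedly.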

#8

This is awesome, Bob. I will try to find some time this week to merge this into my model implementation and compare timings. I have a working analysis with my clunky algorithm, but it doesn’t scale well at all. So your algorithm will be a big help.

#9

Do you have a lot of observed counts y[n] for the same theta, or do the theta vary with n? If many share a theta, the vectorized form can be made much more efficient than just calling this multiple times. The whole dynamic programming array only needs to get built for the largest y.

I created a stan-dev/math issue for the Poisson-binomial.

#10

Right now, many have same theta. I’ve quasi-vectorized my implementation by caching the log-probabilities and reusing them in the model block. Once I start adding more covariates, it’s going to get much slower, with lots of unique thetas.

#11

Indeed, thanks @Bob_Carpenter!