Poisson-binomial distribution - any existing Stan implementation?


#1

I came across a neat modeling problem earlier today that actually is solved using the Poisson-binomial distribution.

Before I write a new Stan function for it, I thought I’d ask whether anyone here has already done so. The n is pretty small in my use case (n=10 maximum), so the exact formula should work fine. My instinct is that it’s just a matter of generating all the combinations to loop over.


#2

I made a quick Stan implementation of the Poisson binomial probability distribution. I tested it against the R package poisbinom. Seems to work.

This is not elegant. I started with a recursive algorithm for the combinations, but couldn’t seem to get it to work. So for sake of time, I fell back on a lexicographic strategy (like the R function combn). Only half-way through, I realized I needed only the bits, and should have just done a bit list. So this could be more efficient, but that shouldn’t matter much (I hope) unless n is large.

Here’s the code for RStan, with test at bottom:

mc <- '
functions{
    // need a function to compute poissonbinomial log-probability
    //   log-prob of k successes from n items 
    //   where prob of success of the n items is given by vector pp
    // this mainly requires looping over combinations of n-choose-k
    // for each combination, log-prob is straight-forward, given vector of probs pp

    // function for log-prob of each term, given combination of size k
    // called by lpmf below
    real lppoibin( int k, int n, int[] elements , vector pp ) {
        real lp;
        lp = 0;
        //print( elements );

        // for each possible element, accumulate probability
        for ( i in 1:n ) {
            int bang;
            bang = 0;
            for ( j in 1:k ) if ( elements[j]==i ) bang = 1;
            if ( bang==1 ) {
                // term in elements
                lp = lp + log(pp[i]);
            } else {
                // term not in elements
                lp = lp + log1m(pp[i]);
            }
        }

        return lp;
    }

    real poissonbinomial_lpmf( int k, int n, vector pp ) {
        int x[n];
        int a[k];
        int e;
        int h;
        int r[k];
        vector[ choose( n , k ) ] terms; // holds each log-prob

        for ( i in 1:n ) x[i] = i;
        for ( i in 1:k ) a[i] = i;

        // first sequence
        r = a;
        terms[1] = lppoibin( k , n , r , pp );

        // loop to lexicographically generate remaining
        if ( k > 0 ) {
            int i;
            int j;
            int nmmp1;
            i = 2;
            nmmp1 = n - k + 1;
            while ( a[1] != nmmp1 ) {
                if ( e < n - h ) {
                    h = 1;
                    e = a[k];
                    j = 1;
                    a[ k - h + j ] = e + j;
                } else {
                    e = a[k-h];
                    h = h + 1;
                    j = h;
                    for ( ii in 1:j ) a[ k - h + ii ] = e + ii;
                }
                for ( ii in 1:k ) r[ii] = x[a[ii]];
                terms[i] = lppoibin( k , n , r , pp );
                i = i + 1;
            }
        }
        return log_sum_exp(terms);
    }
}
'

cppcode <- stanc( model_code = mc , model_name = "testccp" )
expose_stan_functions( cppcode )

k <- 5
n <- 20
pp <- runif( n )
poissonbinomial_lpmf( k , n , pp )

library(poisbinom)
dpoisbinom( k , pp , log=TRUE )

#3

I quickly tried the recursive version once, maybe a year ago, and couldn’t get it to work right. This is definitely a better approach. You’re right that there’s some potential for speeding it up, but that wouldn’t extend the tractable values of n significantly. Unless there’s a clever trick to reduce the complexity that I’m not aware of (I’m not aware of most things).