Poisson-binomial distribution - any existing Stan implementation?

Here’s a more efficient Stan version that

  • works on the log scale for arithmetic stability

  • is quadratic in the size of the vector of probabilities

  • avoids doing (most) unnecessary calculations

  • only returns a single value

It could easily be modified to produce an efficient lcdf or lccdf; there’s a sketch of that after the output below.

functions {

  /**
   * Return the Poisson-binomial log probability mass for the specified
   * count y and vector of probabilities theta.  The result is the log
   * probability of y successes in N = size(theta) independent
   * trials with probabilities of success theta[1], ..., theta[N].
   *
   * See:  https://en.wikipedia.org/wiki/Poisson_binomial_distribution
   *
   * @param y number of successes
   * @param theta vector of probabilities
   * @return Poisson-binomial log probability mass
   */
  real poisson_binomial_lpmf(int y, vector theta) {
    int N = rows(theta);
    matrix[N + 1, N + 1] alpha;
    vector[N] log_theta = log(theta);
    vector[N] log1m_theta = log1m(theta);

    if (y < 0 || y > N)
      reject("poisson binomial variate y out of range, y = ", y,
             " N = ", N);
    for (n in 1:N)
      if (theta[n] < 0 || theta[n] > 1)
        reject("poisson binomial parameter out of range,",
               " theta[", n, "] =", theta[n]);

    if (N == 0)
      return y == 0 ? 0 : negative_infinity();

    // dynamic programming with cells
    // alpha[n + 1, tot + 1] = log prob of tot successes in first n trials
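    // only cells with at most y successes can contribute to the answer
    // alpha[N + 1, y + 1], so the loops below cap tot at y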
    alpha[1, 1] = 0;
    for (n in 1:N) {
      // tot = 0
      alpha[n + 1, 1] = alpha[n, 1] + log1m_theta[n];

      // 0 < tot < n
      for (tot in 1:min(y, n - 1))
        alpha[n + 1, tot + 1]
            = log_sum_exp(alpha[n, tot] + log_theta[n],
                          alpha[n, tot + 1] + log1m_theta[n]);

      // tot = n
      if (n > y) continue;
      alpha[n + 1, n + 1] = alpha[n, n] + log_theta[n];
    }
    return alpha[N + 1, y + 1];
  }

}
transformed data {
  print("*************************************");

  print("poisson_binomial_lpmf(0 | []) = ",
        exp(poisson_binomial_lpmf(0 | rep_vector(1, 0))));

  print("poisson_binomial_lpmf(0 | [0.3]') = ",
        exp(poisson_binomial_lpmf(0 | [0.3]')));
  print("poisson_binomial_lpmf(1 | [0.3]') = ",
        exp(poisson_binomial_lpmf(1 | [0.3]')));

  print("poisson_binomial_lpmf(0 | [0.3, 0.6]') = ",
        exp(poisson_binomial_lpmf(0 | [0.3, 0.6]')));
  print("poisson_binomial_lpmf(1 | [0.3, 0.6]') = ",
        exp(poisson_binomial_lpmf(1 | [0.3, 0.6]')));
  print("poisson_binomial_lpmf(2 | [0.3, 0.6]') = ",
        exp(poisson_binomial_lpmf(2 | [0.3, 0.6]')));

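  // with all probabilities equal to 0.5, the Poisson binomial reduces to
  // binomial(10, 0.5), so these values should match that pmf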
  for (n in 0:10)
    print("poisson_binomial_lpmf(", n, " | 0.5) = ",
          exp(poisson_binomial_lpmf(n | rep_vector(0.5, 10))));

  print("*************************************");
}
parameters {
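  // unused dummy parameter so the program can be run end to end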
  real<lower = 0, upper = 1> z;
}

Here’s what it prints (note the labels are misleading: each value shown is exponentiated, so it’s the probability rather than the log probability):

poisson_binomial_lpmf(0 | []) = 1
poisson_binomial_lpmf(0 | [0.3]') = 0.7
poisson_binomial_lpmf(1 | [0.3]') = 0.3
poisson_binomial_lpmf(0 | [0.3, 0.6]') = 0.28
poisson_binomial_lpmf(1 | [0.3, 0.6]') = 0.54
poisson_binomial_lpmf(2 | [0.3, 0.6]') = 0.18
poisson_binomial_lpmf(0 | 0.5) = 0.000976562
poisson_binomial_lpmf(1 | 0.5) = 0.00976563
poisson_binomial_lpmf(2 | 0.5) = 0.0439453
poisson_binomial_lpmf(3 | 0.5) = 0.117188
poisson_binomial_lpmf(4 | 0.5) = 0.205078
poisson_binomial_lpmf(5 | 0.5) = 0.246094
poisson_binomial_lpmf(6 | 0.5) = 0.205078
poisson_binomial_lpmf(7 | 0.5) = 0.117188
poisson_binomial_lpmf(8 | 0.5) = 0.0439453
poisson_binomial_lpmf(9 | 0.5) = 0.00976563
poisson_binomial_lpmf(10 | 0.5) = 0.000976562

So it looks OK. I didn’t test the error handling.
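
Regarding the lcdf or lccdf mentioned above, here’s a minimal, untested sketch of what the lcdf might look like (it would slot into the functions block alongside the lpmf; the theta range check is omitted for brevity). It runs exactly the same dynamic program; the only real change is that the result is the log of the summed probabilities of 0, ..., y successes, read off the final row.

  real poisson_binomial_lcdf(int y, vector theta) {
    int N = rows(theta);
    matrix[N + 1, N + 1] alpha;
    vector[N] log_theta = log(theta);
    vector[N] log1m_theta = log1m(theta);

    if (y < 0 || y > N)
      reject("poisson binomial variate y out of range, y = ", y,
             " N = ", N);

    if (N == 0)
      return 0;

    // same dynamic program as in poisson_binomial_lpmf
    alpha[1, 1] = 0;
    for (n in 1:N) {
      alpha[n + 1, 1] = alpha[n, 1] + log1m_theta[n];
      for (tot in 1:min(y, n - 1))
        alpha[n + 1, tot + 1]
            = log_sum_exp(alpha[n, tot] + log_theta[n],
                          alpha[n, tot + 1] + log1m_theta[n]);
      if (n > y) continue;
      alpha[n + 1, n + 1] = alpha[n, n] + log_theta[n];
    }

    // lcdf: sum P(Y = 0), ..., P(Y = y) on the log scale
    return log_sum_exp(alpha[N + 1, 1:(y + 1)]);
  }

The lccdf would then just be log1m_exp of that.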

If you have a lot of y values with the same theta, it would help to build this whole array just once, using the maximum of the y values as the bound, and then read each y off the final row; a rough sketch of that follows. If that’s still too opaque, I can help code it up properly, too. This should be efficient, even for large N.
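
For instance (again just an untested sketch; the helper name, M, and the array y are placeholders, and the range checks are omitted), a function that runs the dynamic program once up to max_y and returns the final row of log probabilities:

  vector poisson_binomial_log_probs(int max_y, vector theta) {
    int N = rows(theta);
    matrix[N + 1, N + 1] alpha;
    vector[N] log_theta = log(theta);
    vector[N] log1m_theta = log1m(theta);

    // same dynamic program as above, bounded by max_y instead of a single y
    alpha[1, 1] = 0;
    for (n in 1:N) {
      alpha[n + 1, 1] = alpha[n, 1] + log1m_theta[n];
      for (tot in 1:min(max_y, n - 1))
        alpha[n + 1, tot + 1]
            = log_sum_exp(alpha[n, tot] + log_theta[n],
                          alpha[n, tot + 1] + log1m_theta[n]);
      if (n > max_y) continue;
      alpha[n + 1, n + 1] = alpha[n, n] + log_theta[n];
    }
    // entry tot + 1 holds the log probability of tot successes
    return alpha[N + 1, 1:(max_y + 1)]';
  }

With M counts y[1], ..., y[M] sharing the same theta, the model block would then do something like

  vector[max(y) + 1] log_probs = poisson_binomial_log_probs(max(y), theta);
  for (m in 1:M)
    target += log_probs[y[m] + 1];

so the quadratic work is done once rather than once per observation.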
