Hi all,
Putting this out there in case it’s useful to someone else. I’ve implemented the Extended Hypergeometric Distribution (also known as Fisher’s Noncentral Hypergeometric Distribution) in Stan:
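P(y \mid m_1, m_2, n, \alpha) = \frac{\binom{m_1}{y} \binom{m_2}{n - y} \, e^{\alpha y}}{\sum_{u = \max(0,\, n - m_2)}^{\min(n,\, m_1)} \binom{m_1}{u} \binom{m_2}{n - u} \, e^{\alpha u}}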
where, for an urn containing N balls, m_1 is the number of red balls, m_2 is the number of white balls, n is the total number of balls drawn, y is the number of red balls drawn, and \alpha is the log odds ratio for drawing a red ball versus a white ball.
I adapted it from the R code here, and I'm sure it could be made more efficient: it isn't vectorised, and there are no doubt more idiomatic ways of writing some of the operations in Stan. Still, it works well enough for my small dataset of 196 data points, fitting in roughly 8 to 15 seconds depending on the exact model, and I'm not seeing any divergences. Hope it's useful to someone else, in any case!
real extended_hypergeometric_lpmf(int y, int mA, int mB, int n, real alpha) {
  // support of the distribution: L <= y <= U
  int L = max(0, n - mB);
  int U = min(n, mA);
  // all possible values of the count over that support
  int length_x = U - L + 1;
  array[length_x] int x = linspaced_int_array(length_x, L, U);
  // lookup table of log-factorials: lsum[j] = lgamma(j) = log((j - 1)!),
  // sized so that every index needed below is covered
  int k = min(max(mA, mB), max(n, mA + mB - n));
  vector[k + 1] k_seq;
  for (i in 1:(k + 1)) k_seq[i] = i; // k_seq = [1, 2, ..., k + 1]
  vector[k + 1] lsum = lgamma(k_seq);
  // 1-based indices into lsum for the four factorials that appear in
  // choose(mA, x) * choose(mB, n - x)
  array[length_x] int x_xplus1;
  array[length_x] int x_mAx1;
  array[length_x] int x_nx1;
  array[length_x] int x_mBnx1;
  for (i in 1:length_x) {
    x_xplus1[i] = x[i] + 1;         // -> log(x!)
    x_mAx1[i] = mA - x[i] + 1;      // -> log((mA - x)!)
    x_nx1[i] = n - x[i] + 1;        // -> log((n - x)!)
    x_mBnx1[i] = mB - n + x[i] + 1; // -> log((mB - n + x)!)
  }
  // g = log denominator of the two binomial coefficients; the constant
  // lgamma(mA + 1) + lgamma(mB + 1) numerator terms cancel in the
  // normalisation below, so they are dropped
  vector[length_x] g = lsum[x_xplus1] + lsum[x_mAx1] + lsum[x_nx1] + lsum[x_mBnx1];
  // unnormalised log weights over the support
  vector[length_x] t = (to_vector(x) * alpha) - g;
  // normalise on the log scale (log-softmax of t)
  vector[length_x] h1 = t - max(t);
  vector[length_x] h2 = h1 - log_sum_exp(h1);
  // return the log pmf of the observed count
  return h2[y - L + 1];
}
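Since the function isn't vectorised, here's a minimal sketch of how it can be called in a loop over observations from the model block. Everything outside the function is a placeholder for illustration (the data names N_obs, mA, mB, n_draws, y, the normal(0, 2.5) prior, and the assumption that the function above is saved in extended_hypergeometric.stan), not the actual model:

functions {
  // the extended_hypergeometric_lpmf() definition above, saved to its own file
  #include extended_hypergeometric.stan
}
data {
  int<lower=1> N_obs;                // number of observations
  array[N_obs] int<lower=0> mA;      // red balls in each urn
  array[N_obs] int<lower=0> mB;      // white balls in each urn
  array[N_obs] int<lower=0> n_draws; // balls drawn from each urn
  array[N_obs] int<lower=0> y;       // red balls drawn
}
parameters {
  real alpha; // log odds ratio for red vs white
}
model {
  alpha ~ normal(0, 2.5); // placeholder prior
  for (i in 1:N_obs)
    y[i] ~ extended_hypergeometric(mA[i], mB[i], n_draws[i], alpha);
}

The sampling statement works because Stan recognises the _lpmf suffix; the equivalent target += extended_hypergeometric_lpmf(y[i] | mA[i], mB[i], n_draws[i], alpha) form also works.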