Confluent hypergeometric functions to estimate population size and detection probability from spatially replicated counts

I am modeling spatial population dynamics which consist of two layers. The inner layer is the population abundance of the species which is modeled using an a spatial population dynamics model. The outer layer is the observation process.

Here is the brief description of the model

Assuming count data y_{x,t,j}\sim \text{Binomial}(n_{x,t,j}, \theta), were x is the patch location, t is time, j is the replicate sampling ID, and \theta is the detection probability. We assume \theta \sim \text{Beta}(\alpha , \beta) and n_{x,t,j} \sim \text{Poisson} (\lambda _{x,t}), where \lambda _{x,t} comes from the population dynamics model e.g. \lambda _{x,t+1} = \int^{\infty}_{- \infty} k(x-y) f(\lambda _{y,t}) dy.

Royle (2004) suggests that if we have replicate counts, we can estimate p, n_{x,t,j}, and \lambda _{x,t}. However, this problem cannot be coded directly in Stan because n_{x,t,j} is a discrete parameter and, therefore, has to be marginalized out. Mathematically, we can marginalize \theta and n_{x,t,j} in the following manner

p(Y= y_{x,t,j} | \alpha, \beta, \lambda_{x,t}) = \sum_{n_{x,t,j} =0}^{\infty} \int_{0}^{1} \underbrace{p(y_{x,t,j} | \theta, n_{x,t,j} )}_\text{Binomial dist.} \underbrace{p(\theta | \alpha, \beta)}_\text{Beta dist.} \underbrace{p( n_{x,t,j}| \lambda_{x,t} )}_\text{Poisson dist.} d\theta.

This gives us
p(Y= y_{x,t,j} | \alpha, \beta, \lambda_{x,t}) = \text{Poisson}(y_{x,t,j} |\lambda_{x,t}) \frac{\text{Beta}(\alpha + y_{x,t,j}, \beta)}{\text{Beta}(\alpha, \beta)} \ce{_{1}F_{1}} (\beta, \alpha + \beta + y_{x,t,j}, \lambda_{x,t}),

where \ce{_{1}F_{1}} is the confluent hypergeometric function of the first kind. We have successfully marginalized the \theta and n_{x,t,j}, which can be recovered in the generated quantities block. Additionally, there might be computational advantages in terms of both accuracy and speed (not tested yet).

I am stuck trying to evaluate \ce{_{1}F_{1}}. Currently Stan does not support confluent hypergeometric functions. There are several infinite summation and integral methods to estimate \ce{_{1}F_{1}} but they may not be stable and scalable. Is the Stan development team working on implementing a scalable and reliable method/function to calculate confluent hypergeometric functions anytime soon?

I believe that the technique I presented in this post could be very useful in patch occupancy modeling.

1 Like

Hypergeometric functions are in Stan math and should be exposed in the Stan language with the next release. See math/hypergeometric_pFq.hpp at develop · stan-dev/math · GitHub. @andrjohns implemented this and will know more.

The interesting parts are in the derivative s that he implemented at math/grad_pFq.hpp at develop · stan-dev/math · GitHub.

2 Likes

@GuidoAMoreira and I have some Stan code to do adaptive truncation, described in detail in our preprint. @jsocolar, this might be the push we need to get this implemented for occupancy models, uh? Happy to collaborate!

1 Like

I’m currently working on the confluent 1F1 and its gradients over on this math branch (since it’s the foundation of a bunch of numerical stability improvements for the log-CDFs).

So this should be available for you in the next release

4 Likes

I had a similar problem a couple of months ago:
https://discourse.mc-stan.org/t/marginalizing-a-double-binomial-poisson-hierarchical-distribution/27801
I had just two replicate counts, although it sounds as though you may have more. I didn’t marginalize out the detection probabilities, which may affect whether my approach is relevant to you. I initially looked at ways of evaluating the confluent hypergeometric function, including the adaptive truncation suggested by @maxbiostat, but in the end found that there was a relatively easy exact solution if you work with the probability generating functions. There is Stan code at the end of the post in the link above (I may have made a few minor changes since then, but that version was working). I recommend first having a look at section 36.8 in Johnson, Kotz and Balakrishnan, Discrete Multivariate Distributions, which is quite general, and then Hamdan and Tsokos (1971), which is less general but a bit easier to follow.

2 Likes

Hi @andrjohns, was 1F1 release in the last version of stan? I could not find it in the function manual.

2 Likes

I wrote a Stan function to calculate 1F1 on log scale based on this paper. Posting it here in case it may be useful to someone. This may become obsolete once pFq are implemented in Stan.

  real Log_1F1(real a, real b, real z) {
    int j = 0;
    real toll = 1e-9;
    real log_Aj = 0;
    real log_Sj = log_Aj;
    
    // z > 0 ? Log_1F1(a, b, z) : z + Log_1F1(b-a, b, -z) (for numericall accuracy)
    
    if (z >= 0){
      while (log_Aj-log_Sj > log(toll)){
        j += 1;
        log_Aj += log(((a+j-1)/(b+j-1))*(z/j));
        log_Sj = log_sum_exp([log_Aj, log_Sj]);}
      return log_Sj;
    } else {
        while (log_Aj-log_Sj > log(toll)){
          j += 1;
          log_Aj += log(((b-a+j-1)/(b+j-1))*(-z/j));
          log_Sj = log_sum_exp([log_Aj, log_Sj]);}
        return z + log_Sj;
    }
  }

So to this date there is no native function right? I could not find it ether and would like to test a model which needs it.

@luchinoprince I dont think there is one but it seems to be under works. I also implemented pFq function.
Here is the code that I wrote. I have only tested this for a_p, b_q and Re(z) > 0

  real log_pFq(vector a_p, vector b_q, real z) {
    int j = 0;
    real toll = log(machine_precision());
    real log_Aj = 0;
    real log_Sj = log_Aj;
    real log_z = log(z);
    
    while (log_Aj-log_Sj > toll && j< 1e6){
      j += 1;
      log_Aj += sum(log(a_p+j-1))-sum(log(b_q+j-1))+log_z-log(j);
      log_Sj = log_sum_exp([log_Aj, log_Sj]);}
    return log_Sj;
  }

@nikunj410 thanks for the answer, I saw you implementation. I am just concerned about the speed convergence of such an implementation and its potential instability. Also not sure how the gradient would work if the pdf is an iterative function. In my case I have a tensor where the entries would be a density which uses pFq, hence I would have to calculate a lot of those at every HMC step, which I guess would be too costly. Maybe parallelising within chain could help, I will try to test it,

@luchinoprince I am pretty sure that this is not a stable implementation when a,b,z <0. For, example, when I tried implementing 1F1 for z<0, I was getting convergence issues. For positive coefficient values, I was able to achieve similar numerical accuracy as the pFq function in Julia (because I was working on log scale).

You are right the speed is a major limiting factor. Since there are no analytical gradients, we have to rely on auto diff which is usually slower. But the work on gradients is stills in progress here.

Maybe you want to try and implement the code I shared to see if your Stan code works with synthetic data and when the native implementation of pFq is made available in Stan, you can make appropriate changes.

hey @nikunj410 , I am sure that in my case a,b would be greater than 0, while z could have ether sign(not sure). I will see if I can find other distributions which allow a better treatment, as the main issue with approximate likelihood from my perspective is that it is hard to justify/specify from what distribution we are then sampling from.

Best,
Luca