Using integration to fit the difference of gamma distributed random variable

bnicenboim · June 18, 2020, 1:56pm

Hi,

I’m trying to fit data that represents the difference between latent random variable that are gamma distributed (with the same rate). I’m obviously not succeeding.

I generate my data like this in r. Basically I want to fit Y, assuming I know alpha2, and beta.

rgammadiff <- function(N, alpha1=1, alpha2=1, beta=1){
  rgamma(N, alpha1, beta= beta) - rgamma(N, alpha2, beta = beta)
}

N <- 10
alpha1 <- 500
alpha2 <- 10
beta <- 1
Y<- rgammadifft(N = N, alpha1, alpha2, beta)

I’m following equation (4) from Klar (2014), where the density at z is:

f(z) = c \cdot e^{\beta z} \int_z^\infty x^{\alpha_1-1} (x-z)^{\alpha_2-1} e^{-2\beta x} dx if z >0

where c = \beta^{\alpha_1+\alpha_2}/ (\Gamma(\alpha_1) \Gamma(\alpha2))

I’m ignoring z<0 because in my real data, z is always positive (I guess I need to fit a truncated difference of gammas eventually, but I’m postponing that problem).

And my code in stan is as follows:

functions {
  real pos(real x,          // Function argument
           real xc,         // Complement of function argument
           real[] theta,    // parameters
           real[] x_r,      // data (real)
           int[] x_i) {     // data (integer)
    real alpha1 = theta[1];
    real alpha2 = theta[2];
    real beta = theta[3];
    real z = x_r[1];
    return  x^(alpha1 - 1) * (x - z)^(alpha2 - 1)*exp(-2 * beta * x);
}
  real gammadifft_lpdf(data real y, real alpha1, real alpha2, real beta) {
    int x_i[0];
    real logc = log(beta)*(alpha1 + alpha2) - lgamma(alpha1) - lgamma(alpha2);
    real logint = log(integrate_1d(pos,
                                   y,
                                   positive_infinity(),
                                   {alpha1, alpha2, beta},
                                   {y}, x_i));
    return logc + beta * y + logint ;
  }
}
data {
  int N;
  vector[N] Y;
  real alpha2;
  real beta;
}
parameters {
  real<lower=0> alpha1;
}
model {
  target += lognormal_lpdf(alpha1 | log(500),1);
  for(i in 1:N)
    target += gammadifft_lpdf(Y[i] | alpha1, alpha2, beta);
}

I’m just trying to make it work, so as I said I assume I know everything except alpha1:

lsdata <- list(Y=Y, alpha2 = alpha2, beta = beta, N = length(Y) )
library(cmdstanr)
cmdstan_version()

cmd <- cmdstan_model("gammadifft_lpdf.stan")
fit_cmd <- cmd$sample(
                 data = lsdata,
                 seed = 123,
                 chains = 1,
                 cores = 4)

And I get the following error Exception: Error in function boost::math::quadrature::exp_sinh<double>::integrate: The exp_sinh quadrature evaluated your function at a singular point and returned -nan. Please ensure your function evaluates to a finite number over its entire domain. (in '/tmp/RtmpzVP2ah/model-4a7f71da3259.stan', line 13, column 4 to column 57).
(Line 13 is the return of function pos.)

Is it even possible to do this in Stan? The paper of Klar (2014) also mentions an alternative formula by means of Whittaker’s W function, a confluent hypergeometric function (Olver et al., 2010, 13.14.3). Would it make more sense to use this function (somehow from some library from C++)?

(By the way, the R function dcoga2dim (package coga) uses the C++ function gsl_sf_hyperg_1F1 to calculate the convolution of two gammas, but I couldn’t find enough documentation to understand it.)

Thanks!!
Bruno

bbbales2 · June 18, 2020, 2:16pm

The exp_sinh quadrature evaluated your function at a singular point and returned -nan.

This error means that something is blowing up in the function.

Looking at the function, it seems like this could happen if x = 0, \alpha_1 < 1 or x - z = 0, \alpha_2 < 1. Any chance either of those conditions are true?

Also if the true value of \alpha_1 is 500, it’s gonna be really difficult for the quadrature to integrate \int_z^\infty x^{499} e^{-2 x} dx. I don’t think you’re gonna be able to do that in double precision.

bnicenboim · June 18, 2020, 2:42pm

Thanks for the fast answer!!

In this simulation, both alphas are not close to 1, they are 500 and 10. But regarding x, that is being integrated, from z, the data, until infinity. So x is never zero, since I’m fitting values that look like this:
494.9901 445.3105 506.1266 492.5745 446.5357 463.2748 454.8301 464.4533 528.8234 503.9851
but x-z is zero at the beginning of the integration, I don’t think I can avoid that).

Oh, ok, but it is going to be a value between 100-600 in my real application. So what should I do? I can’t do it in double precision, but I can with some approximation? or something like that?

(And do you have any idea about the Whittaker’s W to avoid integration?)

nhuurre · June 18, 2020, 2:50pm

You might be able to get away with

data {
  int N;
  vector[N] Y;
  real alpha2;
  real beta;
}
parameters {
  real<lower=0> alpha1;
  vector<lower=0>[N] g2;
}
transformed parameters {
  vector[N] g1 = g2 + Y; // Y = g1 - g2
}
model {
  target += lognormal_lpdf(alpha1 | log(500),1);
  target += gamma_lpdf(g1| alpha1, beta);
  target += gamma_lpdf(g2| alpha2, beta);
}

Assuming you have to integrate, doing as much as possible on log scale helps stability.

real pos(real x,          // Function argument
         real xc,         // Complement of function argument
         real[] theta,    // parameters
         real[] x_r,      // data (real)
         int[] x_i) {     // data (integer)
  real alpha1 = theta[1];
  real alpha2 = theta[2];
  real beta = theta[3];
  real z = x_r[1];
  return  exp((alpha1 - 1)*log(x) + (alpha2 - 1)*log(x - z) - 2 * beta * x);
}

bbbales2 · June 18, 2020, 3:00pm

return  exp((alpha1 - 1)*log(x) + (alpha2 - 1)*log(x - z) - 2 * beta * x);

Lol I was quite stumped on what to do here. This is so simple.

bnicenboim · June 18, 2020, 5:59pm

Regarding the first case with no integration, I don’t need a Jacobian adjustment because I’m summing a constant, right?
I was worried of doing something like that, because I’ll end up with 1000 g1 and g2 in my real application. But maybe doing integrals is more costly than the 2000 latent variables…

The second case still didn’t work, I get the following:

Chain 1 Rejecting initial value:
Chain 1   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1   Stan can't start sampling from this initial value.
Chain 1 Initialization failed.
Warning: Chain 1 finished unexpectedly!

Any ideas?

nhuurre · June 18, 2020, 6:15pm

Yes, the Jacobian is constant.

So at least the integrator thinks it can solve the integral. Hmm. Maybe it still returns zero? You say the true alpha1 is ~500. The model has no idea where to to start and tries to initialize near alpha1=1. That’s quite a bad guess, near-zero probability is expected. You could reparametrize so it initializes closer to plausible values

parameters {
  real<lower=0> alpha1_raw;
}
transformed parameters {
  real alpha1 = 500*alpha1_raw;
}

bnicenboim · June 18, 2020, 9:11pm

why near alpha1=1? Give the prior that I had originally, where most of the prob mass was near 500.

In any case I tried reparametrizing alpha1 but I still get Chain 1 Exception: Exception: Error in function boost::math::quadrature::exp_sinh<double>::integrate: The exp_sinh quadrature evaluated your function at a singular point and returned inf. Please ensure your function evaluates to a finite number over its entire domain.

(I also changed the prior accordingly). The model now looks like this:

functions {
  real pos(real x,          // Function argument
           real xc,         // Complement of function argument
           real[] theta,    // parameters
           real[] x_r,      // data (real)
           int[] x_i) {     // data (integer)
    real alpha1 = theta[1];
    real alpha2 = theta[2];
    real beta = theta[3];
    real z = x_r[1];
    /* return  x^(alpha1 - 1) * (x - z)^(alpha2 - 1)*exp(-2 * beta * x); */
    return  exp((alpha1 - 1)*log(x) + (alpha2 - 1)*log(x - z) - 2 * beta * x);
  }
  real gammadifft_lpdf(data real y, real alpha1, real alpha2, real beta) {
    int x_i[0];
    real logc = log(beta)*(alpha1 +alpha2) - lgamma(alpha1) - lgamma(alpha2);
    real logint = log(integrate_1d(pos,
                                   y,
                                   positive_infinity(),
                                   {alpha1, alpha2, beta},
                                   {y}, x_i));
    return logc + beta * y + logint ;
  }
}
data {
  int N;
  vector[N] Y;
  real alpha2;
  real beta;
}
parameters {
  real<lower=0> alpha1_raw;
}
transformed parameters {
  real alpha1 = 500*alpha1_raw;
}
model {
  target += lognormal_lpdf(alpha1_raw | log(1),1);
  for(i in 1:N)
    target += gammadifft_lpdf(Y[i]| alpha1, alpha2, beta);
}

new ideas?

nhuurre · June 18, 2020, 9:50pm

Yeah, it’s not obvious. The HMC algorithm doesn’t inspect the priors. It just picks a random point consistent with the constraints declared in the parameters block and then follows the gradient of the posterior log probability.

Aha! Instead of underflowing to zero the integrand now overflows to infinity. Progress!
I think it might be possible to handle the integral the same log_sum_exp avoids overflowing: subtract the maximum value.

functions {
  real pos(real x,          // Function argument
           real xc,         // Complement of function argument
           real[] theta,    // parameters
           real[] x_r,      // data (real)
           int[] x_i) {     // data (integer)
    real alpha1 = theta[1];
    real alpha2 = theta[2];
    real beta = theta[3];
    real zeta = theta[4];
    real z = x_r[1];
    return  exp(zeta + (alpha1 - 1)*log(x) + (alpha2 - 1)*log(x - z) - 2 * beta * x);
  }
  real gammadifft_lpdf(data real y, real alpha1, real alpha2, real beta) {
    int x_i[0];
    real logc = log(beta)*(alpha1 +alpha2) - lgamma(alpha1) - lgamma(alpha2);

    // here peak needs to be the x where the integrand is maximal...
    // taking derivative of (alpha1 - 1)*log(x) + (alpha2 - 1)*log(x - z) - 2 * beta * x
    // and solving for zeros probably works?
    real peak = ...;
    real zeta = (alpha1 - 1)*log(peak) + (alpha2 - 1)*log(peak - z) - 2 * beta * peak;

    real logint = log(integrate_1d(pos,
                                   y,
                                   positive_infinity(),
                                   {alpha1, alpha2, beta, -zeta},
                                   {y}, x_i));
    return zeta + logc + beta * y + logint ;
  }
}

Mathematically the value of zeta cancels out. Numerically a well-chosen zeta prevents overflow and underflow errors. I dunno, this is complicated. The could peak be very narrow.

bnicenboim · June 19, 2020, 7:17am

It actually worked!!! Thanks! I also learned a lot!

It takes longer than your first easier solution and with less ESS though. I wonder which will scale better to my actual model. There’s still a third option using some external c++ for WhittakerW that depends on the confluent hypergeometric function U, which I see it exists in gsl, I’ll see if I can manage or open a different thread otherwsie

In any case, complete code of a difference of gamma distributed RV with the same rate for future travelers:

functions {
  real pos(real x,          // Function argument
           real xc,         // Complement of function argument
           real[] theta,    // parameters
           real[] x_r,      // data (real)
           int[] x_i) {     // data (integer)
    real alpha1 = theta[1];
    real alpha2 = theta[2];
    real beta = theta[3];
    real zeta = theta[4];
    real z = x_r[1];
    return  exp(zeta + (alpha1 - 1)*log(x) + (alpha2 - 1)*log(x - z) - 2 * beta * x);
  }
  real gammadifft_lpdf(data real y, real alpha1, real alpha2, real beta) {
    int x_i[0];
    real logc = log(beta)*(alpha1 +alpha2) - lgamma(alpha1) - lgamma(alpha2);

    // here peak needs to be the x where the integrand is maximal...
    // taking derivative of (alpha1 - 1)*log(x) + (alpha2 - 1)*log(x - z) - 2 * beta * x
    // and solving for zeros probably works?
    real peak =(alpha1 + alpha2 + 2*beta*y - 2)/(4*beta) + sqrt(alpha1^2 + 2*alpha1*alpha2 - 4*alpha1*beta*y - 4*alpha1 + alpha2^2 + 4*alpha2*beta*y - 4*alpha2 + 4*beta^2*y^2 + 4)/(4*beta);
    real zeta = (alpha1 - 1)*log(peak) + (alpha2 - 1)*log(peak - y) - 2 * beta * peak;

    real logint = log(integrate_1d(pos,
                                   y,
                                   positive_infinity(),
                                   {alpha1, alpha2, beta, -zeta},
                                   {y}, x_i));
    return zeta + logc + beta * y + logint ;
  }
}
data {
  int N;
  vector[N] Y;
  real alpha2;
  real beta;
}
parameters {
  real<lower=0> alpha1_raw;
}
transformed parameters {
  real alpha1 = 500*alpha1_raw;
}
model {
  target += lognormal_lpdf(alpha1_raw | log(1),1);
  for(i in 1:N)
    target += gammadifft_lpdf(Y[i]| alpha1, alpha2, beta);
}

bnicenboim · June 19, 2020, 11:30am

A quick follow up question about both options. Say that I want this distribution restricted to positive values, that is a truncated in zero difference of gammas.

For the first version that you provided, this shouldn’t matter right? Because if all the Y are positive, the transformed parameter is already saying that g1 > g2, right? Or am I getting something wrong?

For the second version, I would need to derive the CDF, that would mean to integrate my pdf which already includes integration. Would that even be possible?

nhuurre · June 19, 2020, 11:52am

Right.

Yeah, looks really difficult.

I don’t really have experience fitting complicated models but here’s a thought.
If the data is far from zero then the < 0 part may have negligible probability mass anyway. If the posterior predictive distribution is all positive values then that is some assurance that the sampler did not find it necessary to explore areas where the truncation matters.

bnicenboim · July 1, 2020, 2:53pm

Sorry, a new issue related to the gamma difference without integration, how would I turn it into a function?

This (obviously) doesn’t work:

functions {
  real gamma_diff_lpdf(vector Y , real alpha1, real alpha2, real beta){
    int N = num_elements(Y);
    vector<lower=0>[N] g2; 
    vector[N] g1 = g2 + Y; // Y = g1 - g2
    return  gamma_lpdf(g1| alpha1, beta) + gamma_lpdf(g2| alpha2, beta);
  }
}
data {
  int N;
  vector[N] Y;
  real alpha2;
  real beta;
}
parameters {
  real<lower=0> alpha1;
}
model {
  target += lognormal_lpdf(alpha1 | log(500),1);
  target += gamma_diff_lpdf(Y| alpha1, alpha2, beta);
}

is there a way for Stan to treat g2 as a parameter even if it’s inside the function?

I really don’t want to have g2 in my actual model because it would be a ragged array, any way to avoid defining it outside the function?
Thanks!

nhuurre · July 1, 2020, 3:36pm

I don’t think that’s possible.

Bob_Carpenter · July 4, 2020, 12:33am

Definitely not possible. This was the motivation for Maria Gorinova’s (@mgorinova) blockless version of Stan, SlicStan. We may change the language to that some day (while remaining backward compatible), but it’s a long-term project.

bnicenboim · July 4, 2020, 6:36am

ok, good to know anyways!

Topic		Replies	Views
Difference of gamma distributions failing to start (many times) Modeling fitting-issues , simulation-based-calibration	7	516	October 28, 2020
Computing pdf values of a distribution in STAN Modeling	4	849	August 3, 2017
Divergent transition fitting truncated gamma Modeling	10	1545	January 28, 2020
Fake data - _rng where is the error from? Modeling rstan	7	289	April 18, 2024
Generalized Gamma distribution with absolute value on a parameter Modeling fitting-issues	0	658	September 21, 2022

Using integration to fit the difference of gamma distributed random variable

Related topics