Difference of gamma distributions failing to start (many times)

bnicenboim · October 27, 2020, 3:18pm

I’m continuing the model from Using integration to fit the difference of gamma distributed random variable here, because this is not about integration anymore. I’m trying to follow @nhuurre’s suggestion to fit the difference between random variables with a gamma distribution, and I was trying to do SBC on that. But I’m having problems to just fit the model:

This is the model:

functions {
   real gammadiffs_lpdf(real y, real alpha1, real alpha2, real beta, real g2) {
    real g1 = g2 + y; // Y = g1 - g2
    return  gamma_lpdf(g1| alpha1, beta) + gamma_lpdf(g2| alpha2, beta);
  }
}
data {
  int<lower = 0> J;
}
transformed data {
  real<lower = 0> alpha_1_ = lognormal_rng(log(4), .1);
  real<lower = 0> alpha_2_ = lognormal_rng(log(.5), .1);
  real<lower = 0> beta_ = lognormal_rng(log(.5), .1);
  real y[J];
  for (j in 1:J){
    real X1 = 0;
    real X2 = 0;
      X1 = gamma_rng(alpha_1_, beta_);
      X2 = gamma_rng(alpha_2_, beta_);
      y[j] = X1 - X2;
  }
}
parameters {
  real<lower=0> alpha_1;
  real<lower=0> alpha_2;
  real<lower=0> beta;
  vector<lower=0>[J] g2;
}
transformed parameters {
}
model {
  target +=  lognormal_lpdf(alpha_1 | log(4), .1);
  target +=  lognormal_lpdf(alpha_2 | log(.5), .1);
  target +=  lognormal_lpdf(beta | log(.5), .1);
  for(j in 1:J)
    target += gammadiffs_lpdf(y[j]| alpha_1, alpha_2, beta, g2[j]);
}
generated quantities {
  real y_[J] = y;
  vector[3] pars_;
  int ranks_[3] = {alpha_1 > alpha_1_,
                   alpha_2 > alpha_2_,
                   beta > beta_};
  vector[J] log_lik;
  pars_[1] = alpha_1_;
  pars_[2] = alpha_2_;
  pars_[3] = beta_;
  for(j in 1:J)
    log_lik[j] = gammadiffs_lpdf(y[j]| alpha_1, alpha_2, beta, g2[j]);
}

I’m having problems with the initial values, 3 out of 10 fits they get rejected with

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.

But sometimes they do work.

What might be going on here? I think I didn’t forget any lower, did I?
This is not even the model that I want to fit, this was the very simple one that was supposed to work :/

nhuurre · October 27, 2020, 4:48pm

I think the problem is sometimes X2 is larger than X1 and y comes out as negative.
Negative y requires a nonzero lower bound on g2 because g1 must be positive.

The latest CmdStanR has vectorized lower bounds:

transformed data {
  ...
  vector[J] g2_min;
  for (i in 1:J) {
    g2_min[i] = (y[i] > 0.0) ? 0.0 : -y[i] ;
  }
}
parameters {
  ...
  vector<lower=g2_min>[J] g2;
}

bnicenboim · October 27, 2020, 8:07pm

I don’t get that, if y < 0, then g1 < g2, otherwise g2>g1, but they can still be both positive

nhuurre · October 27, 2020, 8:09pm

Yes, but you must surely have g_2 \geq |y|.

bnicenboim · October 27, 2020, 8:10pm

ohh, right!

bnicenboim · October 28, 2020, 2:16pm

A quick follow up, your suggestion works and I did SBC, and it looks good:

Now say that I want to fit only the truncated distribution of difference of gammas, so that y > 0.

transformed data {
  real<lower = 0> alpha_1_ = lognormal_rng(log(4), .1);
  real<lower = 0> alpha_2_ = lognormal_rng(log(.5), .1);
  real<lower = 0> beta_ = lognormal_rng(log(.5), .1);
  vector[J] y;
  vector[J] g2_min;

  for (j in 1:J){
    real X1 = 0;
    real X2 = 0;
   while (X1 - X2 <= 0){
      X1 = gamma_rng(alpha_1_, beta_);
      X2 = gamma_rng(alpha_2_, beta_);
      y[j] = X1 - X2;
    }
  }
}

How should I change the likelihood? I thought it will work fine without any changes, because I would need the complementary of the cdf of the gamma difference evaluated at 0, and this is a constant, and it shouldn’t affect the fitting of the parameters. Or am I wrong here? (I’m must be wrong here).
But when I use the the same likelihood, I get the following SBC:

And this is obviously bad. alpha_1 and beta get overestimated, and alpha_2 gets underestimated. What am I missing here?

nhuurre · October 28, 2020, 3:00pm

It’s not constant; it depends on the parameters alpha_1, alpha_2 and beta. Unfortunately the probability of a negative y is quite complex function of the parameters.

bnicenboim · October 28, 2020, 4:04pm

Ohh of course!

In that case, I do need a cdf, right? There is no way around…

Ok, I’ll see if I can figure out a way to get a CDF with the saddle point approximation then.

Topic		Replies	Views
Using integration to fit the difference of gamma distributed random variable Modeling fitting-issues , integrator , user-defined-functions	15	1574	July 4, 2020
Minimal censored Gamma with VERY bad performance General fitting-issues	27	1287	July 8, 2022
Possible Issue with gamma_lccdf Developers	12	446	December 25, 2025
Divergent transition fitting truncated gamma Modeling	10	1627	January 28, 2020
Troubles in "rejecting initial value" Modeling	7	3411	November 21, 2017

Difference of gamma distributions failing to start (many times)

Related topics