Difference of gamma distributions failing to start (many times)

I’m continuing the model from Using integration to fit the difference of gamma distributed random variable here, because this is not about integration anymore. I’m trying to follow @nhuurre’s suggestion to fit the difference between random variables with a gamma distribution, and I was trying to do SBC on that. But I’m having problems to just fit the model:

This is the model:

functions {
   real gammadiffs_lpdf(real y, real alpha1, real alpha2, real beta, real g2) {
    real g1 = g2 + y; // Y = g1 - g2
    return  gamma_lpdf(g1| alpha1, beta) + gamma_lpdf(g2| alpha2, beta);
  }
}
data {
  int<lower = 0> J;
}
transformed data {
  real<lower = 0> alpha_1_ = lognormal_rng(log(4), .1);
  real<lower = 0> alpha_2_ = lognormal_rng(log(.5), .1);
  real<lower = 0> beta_ = lognormal_rng(log(.5), .1);
  real y[J];
  for (j in 1:J){
    real X1 = 0;
    real X2 = 0;
      X1 = gamma_rng(alpha_1_, beta_);
      X2 = gamma_rng(alpha_2_, beta_);
      y[j] = X1 - X2;
  }
}
parameters {
  real<lower=0> alpha_1;
  real<lower=0> alpha_2;
  real<lower=0> beta;
  vector<lower=0>[J] g2;
}
transformed parameters {
}
model {
  target +=  lognormal_lpdf(alpha_1 | log(4), .1);
  target +=  lognormal_lpdf(alpha_2 | log(.5), .1);
  target +=  lognormal_lpdf(beta | log(.5), .1);
  for(j in 1:J)
    target += gammadiffs_lpdf(y[j]| alpha_1, alpha_2, beta, g2[j]);
}
generated quantities {
  real y_[J] = y;
  vector[3] pars_;
  int ranks_[3] = {alpha_1 > alpha_1_,
                   alpha_2 > alpha_2_,
                   beta > beta_};
  vector[J] log_lik;
  pars_[1] = alpha_1_;
  pars_[2] = alpha_2_;
  pars_[3] = beta_;
  for(j in 1:J)
    log_lik[j] = gammadiffs_lpdf(y[j]| alpha_1, alpha_2, beta, g2[j]);
}

I’m having problems with the initial values, 3 out of 10 fits they get rejected with

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.

But sometimes they do work.

What might be going on here? I think I didn’t forget any lower, did I?
This is not even the model that I want to fit, this was the very simple one that was supposed to work :/

I think the problem is sometimes X2 is larger than X1 and y comes out as negative.
Negative y requires a nonzero lower bound on g2 because g1 must be positive.

The latest CmdStanR has vectorized lower bounds:

transformed data {
  ...
  vector[J] g2_min;
  for (i in 1:J) {
    g2_min[i] = (y[i] > 0.0) ? 0.0 : -y[i] ;
  }
}
parameters {
  ...
  vector<lower=g2_min>[J] g2;
}

I don’t get that, if y < 0, then g1 < g2, otherwise g2>g1, but they can still be both positive

Yes, but you must surely have g_2 \geq |y|.

1 Like

ohh, right!

A quick follow up, your suggestion works and I did SBC, and it looks good:

Now say that I want to fit only the truncated distribution of difference of gammas, so that y > 0.

transformed data {
  real<lower = 0> alpha_1_ = lognormal_rng(log(4), .1);
  real<lower = 0> alpha_2_ = lognormal_rng(log(.5), .1);
  real<lower = 0> beta_ = lognormal_rng(log(.5), .1);
  vector[J] y;
  vector[J] g2_min;

  for (j in 1:J){
    real X1 = 0;
    real X2 = 0;
   while (X1 - X2 <= 0){
      X1 = gamma_rng(alpha_1_, beta_);
      X2 = gamma_rng(alpha_2_, beta_);
      y[j] = X1 - X2;
    }
  }
}

How should I change the likelihood? I thought it will work fine without any changes, because I would need the complementary of the cdf of the gamma difference evaluated at 0, and this is a constant, and it shouldn’t affect the fitting of the parameters. Or am I wrong here? (I’m must be wrong here).
But when I use the the same likelihood, I get the following SBC:

And this is obviously bad. alpha_1 and beta get overestimated, and alpha_2 gets underestimated. What am I missing here?

It’s not constant; it depends on the parameters alpha_1, alpha_2 and beta. Unfortunately the probability of a negative y is quite complex function of the parameters.

Ohh of course!

In that case, I do need a cdf, right? There is no way around…

Ok, I’ll see if I can figure out a way to get a CDF with the saddle point approximation then.