Possible Issue with gamma_lccdf

s.maskell · September 22, 2025, 4:33pm

We’re having problems with fitting a Gamma distribution to right censored data. Similar Stan files appear to work if we use a log-Normal in place of the Gamma distribution and/or we only use left censored data. However, if we try to estimate the parameters of a Gamma from right censored data, we get peculiar results. Using BridgeStan with our own Metropolis-Hastings sampler also appears to work, implying to us that the issue is something to do with the combination of gamma_lccdf and using gradients.

There is a related issue (here: Numerically Stable gamma_lcdf Gradients by andrjohns · Pull Request #2767 · stan-dev/math) which fixed something related to gamma_lcdf (as needed for left censored data) but does not appear to have considered gamma_lccdf (as needed for right censored data).

I’m keen to know how others would seek to pin down what is happening and/or fix it. Any suggestions would be welcome.

caesoma · September 29, 2025, 3:55pm

Could you manually compute the complementary distribution from the cdf, or write a custom log gamma ccdf to test whether is the specific implementation of gamma_lccdf or an intrinsic mathematical hurdle of the distribution and HMC?

s.maskell · September 29, 2025, 7:45pm

Thanks. I think implementing our own log ccdf using Stan’s primitives is a good idea. It seems the same idea was useful for the cdf (here: Minimal censored Gamma with VERY bad performance). My guess is that we need a mycdf(.) that calls gamma_q(.).

caesoma · September 30, 2025, 9:53am

I never used it myself, but I’m assuming target += 1 - gamma_lcdf(y | alpha, beta) will work straightforwardly for reals, and a simple wrapper function can vectorize it as needed. That plus the fully customized lccdf written (nearly) from scratch will give you two different options to find the potential issue.

LucC · September 30, 2025, 12:20pm

You probably mean to use the log1m_exp() function, which first exponentiates the log-cumulative probability (from your_lcdf function), takes the complement, and then returns the log of that in a numerically stable fashion:

target += log1m_exp(gamma_lcdf(y | alpha, beta))

caesoma · September 30, 2025, 1:41pm

You’re right. I intended it to be the log of one minus the complement of the CDF, not one minus its log [log(1-F(y)) as opposed to 1-log(F(y))]. But since we want the log-probability, I guess it would be log1m(gamma_cdf(y | alpha, beta)), no need to exponentiate it back. As I said, I haven’t used this myself, so it’s good practice to that the quantity is correct, but the general idea should be correct.

spinkney · September 30, 2025, 1:50pm

I created a PR that fixes this is at fix-gamma-lccdf by spinkney · Pull Request #3241 · stan-dev/math · GitHub

mjcarter · November 13, 2025, 10:44am

I compiled cmdStan with the fix-gamma-lccdf branch of Stan math but am still running into issues when sampling from a Gamma distribution with left-censored data. I consider the following models:

censored_stan_lccdf.stan

data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  vector<lower=0>[N_obs] obs;
  vector<lower=0>[N_cens] cens;
}
parameters {
  real log_alpha_raw;
  real log_beta_raw;
}
transformed parameters {
  real<lower=0> alpha = exp(log(30) + log_alpha_raw);
  real<lower=0> beta  = exp(log(1.0 / 3.0) + log_beta_raw);
}
model {
  target += normal_lpdf(log_alpha_raw | 0, 1);
  target += normal_lpdf(log_beta_raw  | 0, 1);

  target += gamma_lpdf(obs | alpha, beta);
  target += gamma_lccdf(cens | alpha, beta);
}

censored_manual_lccdf.stan

functions {
  real gamma_lccdf_manual(real y, real alpha, real beta) {
    if (y <= 0) return 0;
    return log1m_exp(gamma_lcdf(y | alpha, beta));
  }
  real gamma_lccdf_manual(vector y, real alpha, real beta) {
    real s = 0;
    for (n in 1:num_elements(y)) {
      s += gamma_lccdf_manual(y[n], alpha, beta);
    }
    return s;
  }
}
data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  vector<lower=0>[N_obs] obs;
  vector<lower=0>[N_cens] cens;
}
parameters {
  real log_alpha_raw;
  real log_beta_raw;
}
transformed parameters {
  real<lower=0> alpha = exp(log(30) + log_alpha_raw);
  real<lower=0> beta  = exp(log(1.0 / 3.0) + log_beta_raw);
}
model {
  target += normal_lpdf(log_alpha_raw | 0, 1);
  target += normal_lpdf(log_beta_raw  | 0, 1);

  target += gamma_lpdf(obs | alpha, beta);
  target += gamma_lccdf_manual(cens, alpha, beta);
}

and simulate data from a Gamma distribution with $$\alpha=30.0$$ and $$\beta=1/3$$ with a censoring threshold of $$T_{censor} = 100.0$$

rng = np.random.default_rng(42)
samples = rng.gamma(shape=30.0, scale=3.0, size=2_000)
is_obs  = samples <= 100.0
obs     = samples[is_obs].astype(float)
cens    = np.full(np.sum(~is_obs), 100.0)
data = {
    "N_obs":  int(obs.size),
    "N_cens": int(cens.size),
    "obs":    obs,
    "cens":   cens,
}

I then sample from each model using the default NUTS hyperparameters and the same initial seed. As you can see below, only censored_manual_lccdf.stan generates samples close to the true values of alpha and beta. The model that uses gamma_lccdf from Stan reaches max tree depth 90% of the time.

I have run this over multiple seeds (for Stan and data generation) and repeatedly see the same behaviour. Implementing a basic MH sampler in Python and using BridgeStan to call log_probyields decent results for both models.

To try and isolate the issue, I created two separate models that are parameterised as:

data {
  real<lower=0> alpha;
  real<lower=0> beta;
}
parameters {
  real<lower=0> y;
}
model {
 ...
}

And use BridgeStan to evaluate log_prob and log_prob_grad for each model across y=0 to y=200, but the values agree with one another, so I’m a bit stumped where the discrepancy is coming from. Any suggestions on how to isolate this?

spinkney · November 13, 2025, 1:05pm

That is weird! Can you test the following three things?

Can you confirm that cmdstan is picking up the correct branch of Stan math when using cmdstanpy?
Can you expose the functions and test in Python to find specific values that fail? I think in Python you can use GitHub - WardBrian/pybind_expose_stan_fns: Exposing Stan functions in Python. You can then test with the current release vs the branch fix to see if the same values fail. Or if some range is fixed but there are still values in the fix thatl fail.
Test the fix-gamma-lccdf branch again but loop over gamma_lccdf elementwise instead of the vectorized version. Let’s see if the vectorization is the issue.

mjcarter · November 21, 2025, 7:34pm

Some responses below:

I believe that Stan math is pointing to the correct branch:

(base) mjc@dacdt-condor-gpu2:~/Documents/cmdstan/stan/lib/stan_math$ git status
On branch develop
Your branch is up-to-date with 'origin/develop'.

nothing to commit, working tree clean
(base) mjc@dacdt-condor-gpu2:~/Documents/cmdstan/stan/lib/stan_math$ git branch -v
* develop b82d68ced2 Merge pull request #3251 from stan-dev/dependabot/github_actions/actions/upload-artifact-5
(base) mjc@dacdt-condor-gpu2:~/Documents/cmdstan/stan/lib/stan_math$ git log --oneline
b82d68ced2 (HEAD -> develop, origin/develop, origin/HEAD) Merge pull request #3251 from stan-dev/dependabot/github_actions/actions/upload-artifact-5
15c59c085f Bump actions/upload-artifact from 4 to 5
5e698970d5 Merge pull request #3248 from stan-dev/tweak-positive-index-wording
53973441c7 Update wording of validate_positive_index
b9167895db Merge pull request #3246 from stan-dev/fix/cleanup-rev-tests
c89ca16f78 Merge pull request #3245 from stan-dev/fix/inline
28c7048a5e [Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1
1ca4162bfa fix cpplint
40920f5427 [Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1
29ea240604 Merge commit '407dcfeb7bcdc5cbea1ff427e4945161fc47a276' into HEAD
bd4af2fe0f use AGradRev so that all tests clean themselves up in the rev testing folder
407dcfeb7b Merge pull request #3236 from lingium/feature/issue-3235-yule-simon-lccdf
021bac4cc0 fix cpplint
a8a0014e32 [Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1
d4c03d88b6 fix cpplint
fd75aaf64a fix double inline keywords
340e343d8c Remove test of deprecated forward_as
a099940a59 Merge pull request #3244 from stan-dev/more-metaprograms
94dda8f755 Update stan/math/prim/prob/yule_simon_lccdf.hpp
2f4fe90141 Merge pull request #3242 from stan-dev/cleanup/remove-forward_as
bf46263fc2 [Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1
9d1072913e Merge commit '1e1dfc91e8264116565d8f944c016c68ecb4cfb7' into HEAD
c5c4eb1d0d add inline to all functions so that the definition of functions is used only in on translation unit
1e1dfc91e8 Merge pull request #3241 from stan-dev/fix-gamma-lccdf
(base) mjc@dacdt-condor-gpu2:~/Documents/cmdstan/stan/lib/stan_math$

I point cmdStanPy to this version of cmdStan:

set_cmdstan_path("/home/mjc/Documents/cmdstan")
print(f"CmdStan version: {cmdstan_version()}")
print("Using CmdStan at:", cmdstan_path())

in terminal I see:

CmdStan version: (2, 37)
Using CmdStan at: /home/mjc/Documents/cmdstan

I’ll have a go at this and follow up.
I revised the lccdf function to use:

for (n in 1:num_elements(y)) {
            acc += gamma_lccdf(y[n] | alpha, beta);
}

but still see the same issues

spinkney · November 22, 2025, 10:52pm

I’ve confirmed that it does result in numerical instability for alpha approximately greater than 30. It’s due to gamma_q or grad_reg_inc_gamma. I can rewrite this to use gamma_p and grad_reg_lower_inc_gamma, which avoids potential instability in a tgamma(alpha) and digamma(alpha) call. It seems to sample a bit faster than just wrapping gamma_lcdf with log1m_exp. I’ll put a pr in the next week or so with Thanksgiving.

spinkney · December 18, 2025, 2:31pm

You can follow the PR at fix gamma lccdf for alpha > 30 and overflow by spinkney · Pull Request #3264 · stan-dev/math · GitHub

spinkney · December 25, 2025, 3:04pm

@mjcarter @s.maskell test this pr super stable gamma_lccdf by spinkney · Pull Request #3266 · stan-dev/math · GitHub

I tested in various regimes and found it to be faster and handle extreme values quite well. I cannot guarantee all values with finite precision computation but I believe it should work over an extended range relative to the log1m_exp version.

Topic		Replies	Views
Minimal censored Gamma with VERY bad performance General fitting-issues	27	1281	July 8, 2022
Gamma distribution quantile and inverse quantile function General rstan , techniques , fitting-issues , specification , math	46	5621	June 18, 2021
Gamma regression in Stan vs Frequentist approach Modeling specification	11	3356	March 26, 2024
Gamma(.., ..)^2 -> generalized_gamma(?, ?, ?) Modeling fitting-issues	23	4677	February 7, 2018
Divergent transition fitting truncated gamma Modeling	10	1619	January 28, 2020

Possible Issue with gamma_lccdf

Related topics