Minimal censored Gamma with VERY bad performance

Hi everyone,

I’m just trying to adapt the example code in the Stan manual to fit a censored Gamma model:

// simple censored gamma
data {
  int<lower=0> Nseen;
  array[Nseen] real<lower=11> y;
  int<lower=0> Nmiss;
}
parameters {
  real mu;
  real sigma;
}
transformed parameters {
  real<lower=0> a;
  real<lower=0> b;
  // convert mean/sd to the gamma's shape/rate
  a = square(mu / sigma);
  b = mu / square(sigma);
}
model {
  mu ~ gamma(10, .6);
  sigma ~ exponential(0.3);
  y ~ gamma(a, b);  // observed (uncensored) values
  target += gamma_lcdf(11 | a, b) * Nmiss;  // integrate out the censored values
}
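
For context, the transformed parameters are just the usual mean/standard-deviation parameterization converted to the gamma’s shape and rate:

$$a = \frac{\mu^2}{\sigma^2}, \qquad b = \frac{\mu}{\sigma^2}, \qquad \text{so that} \quad \mathbb{E}[y] = \frac{a}{b} = \mu, \quad \operatorname{Var}[y] = \frac{a}{b^2} = \sigma^2.$$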

I generate some fake data like this:


gg <- rgamma(800, 15^2/2.5^2, 15/2.5^2)  # shape and rate for mean 15, sd 2.5
gg_trunc <- gg[gg>11]

hist(gg_trunc, xlim = c(0, 100))

And then sample it like this:


censor_model_vec <- cmdstan_model(stan_file = "truncated/gamma_censor_vec.stan")
censor_model_vec_samp <- censor_model_vec$sample(
  data = list(
    Nseen = length(gg_trunc),
    y = gg_trunc,
    Nmiss = length(gg) - length(gg_trunc)
  ),
  parallel_chains = 4
)


The resulting samples have VERY bad diagnostics in terms of Rhat and ESS:

variable     mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
    lp__  -1898.03 -1898.40 4.09 5.23 -1903.53 -1891.77 2.98        4       33
    mu       15.02    15.00 0.45 0.63    14.54    15.62 3.51        4       11
    sigma     3.44     3.43 0.07 0.08     3.36     3.53 3.58        4       11
    a        19.09    19.15 0.46 0.65    18.46    19.67 2.74        4       23
    b         1.27     1.27 0.01 0.01     1.25     1.29 2.73        4       11

Have I done something wrong? I suspect I’ve missed some obvious point about how to correctly work with censored data. Thanks for any advice!

Created on 2022-06-22 by the reprex package (v2.0.0)

I think you found a bug. Everything looks right. If I code up the gamma_lcdf myself (assuming I did that right), I get:

  variable     mean   median     sd    mad       q5      q95  rhat ess_bulk ess_tail
  <chr>       <dbl>    <dbl>  <dbl>  <dbl>    <dbl>    <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -1759.   -1759.   1.03   0.741  -1762.   -1758.    1.00    1855.    2254.
2 mu          15.2     15.2  0.0991 0.0967    15.0     15.3   1.00    3134.    2595.
3 sigma        2.68     2.68 0.0700 0.0693     2.57     2.80  1.00    3431.    2776.
4 a           32.1     32.0  1.62   1.64      29.5     34.7   1.00    3819.    2743.
5 b            2.11     2.11 0.108  0.107      1.94     2.29  1.00    3618.    2621.

Here’s the Stan code:

functions {
  real my_gamma_lcdf(real x, real a, real b) {
    return -lgamma(a) + log(gamma_p(a, b * x));
  }
}
data {
  int<lower=0> Nseen;
  array[Nseen] real<lower=11> y;
  int<lower=0> Nmiss;
}
parameters {
  real mu;
  real sigma;
}
transformed parameters {
  real<lower=0> a;
  real<lower=0> b;
  a = square(mu / sigma);
  b = mu / square(sigma);
}
model {
  mu ~ gamma(10, .6);
  sigma ~ exponential(0.3);
  y ~ gamma(a, b);
  // target += gamma_lcdf(11 | a, b) * Nmiss;
  target += my_gamma_lcdf(11 | a, b);
}

@andrjohns @rok_cesnovar I’ll post an issue in math.

Stan’s LCDF appears to match R:

> rstan::stanFunction("gamma_lcdf",y=1,a=1,b=1)
[1] -0.4586751
> log(pgamma(1,1,1))
[1] -0.4586751

So I’m not sure that it is a bug.

There’s definitely something going on: the original model samples really slowly, while using my UDF is something like 100x faster. That is weird.

I did forget to multiply by Nmiss above. The results I’m getting now are:

# A tibble: 5 × 10
  variable      mean    median     sd    mad        q5       q95  rhat ess_bulk ess_tail
  <chr>        <dbl>     <dbl>  <dbl>  <dbl>     <dbl>     <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -2358.    -2358.    1.01   0.726  -2360.    -2357.     1.00    1905.    2092.
2 mu          14.9      14.9   0.227  0.229     14.6      15.3    1.00    1847.    2229.
3 sigma        6.34      6.33  0.157  0.162      6.09      6.60   1.00    1849.    2204.
4 a            5.56      5.56  0.221  0.220      5.20      5.92   1.00    3394.    2618.
5 b            0.372     0.373 0.0157 0.0160     0.347     0.398  1.00    2309.    2617.

which doesn’t recover the parameters.


OK, I don’t know what’s happening, but fitting this in R using MLE does recover the parameters. First the full data, then the censored data:

library(cmdstanr)
library(fitdistrplus)
library(data.table)
gg <- rgamma(800, shape = 36, rate = 2.5)
gg_trunc <- gg[gg > 11]

Nmiss <- length(gg) - length(gg_trunc)
dat <- data.table(left = c(rep(NA, Nmiss), gg_trunc),
                  right = c(rep(11, Nmiss), gg_trunc))

> fitdist(gg, "gamma")
Fitting of the distribution ' gamma ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape 35.401801  1.7618030
rate   2.485492  0.1245713

> fitdistcens(dat, "gamma")
Fitting of the distribution ' gamma ' on censored data by maximum likelihood 
Parameters:
       estimate
shape 35.519115
rate   2.493473

Now the Stan model is giving:

> censor_model_vec_samp$summary()
# A tibble: 5 × 10
  variable      mean    median     sd    mad        q5       q95  rhat ess_bulk ess_tail
  <chr>        <dbl>     <dbl>  <dbl>  <dbl>     <dbl>     <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -2348.    -2348.    1.02   0.712  -2350.    -2347.     1.00    1338.    1664.
2 mu          14.0      14.0   0.246  0.242     13.6      14.4    1.00    1260.    1279.
3 sigma        6.87      6.87  0.177  0.177      6.58      7.16   1.00    1279.    1541.
4 a            4.17      4.17  0.158  0.158      3.92      4.44   1.00    3298.    2741.
5 b            0.298     0.297 0.0124 0.0123     0.278     0.318  1.00    1726.    2032.

I could be off-base here, but I think that where you have:

y ~ gamma(a, b);

you instead need to retain the normalizing constant (i.e. target += gamma_lpdf(y | a, b);). Otherwise you’re arbitrarily re-weighting the observed versus censored components of the likelihood, right?

As I understand the manual, we’re treating the censored data like missing values and integrating over them. So for $y > 11$ (in my example 11 is the censoring value) we use $\Pr(y) = \text{gamma}(y \mid a, b)$, and for $y < 11$ we use the probability of having missed that many values of $y$ given $a$ and $b$. That probability is $\text{gamma\_CDF}(11 \mid a, b)^n$ if you have $n$ missing values.
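
In symbols, the target being incremented is

$$\log p(y \mid a, b) \;=\; \sum_{i=1}^{N_\text{seen}} \log \text{gamma}(y_i \mid a, b) \;+\; N_\text{miss} \cdot \text{gamma\_lcdf}(11 \mid a, b),$$

which is what the y ~ gamma(a, b); statement plus the target += gamma_lcdf(11 | a, b) * Nmiss; line compute, up to an additive constant from the ~ statement.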

Is that the same as what you are describing? I’m not sure what you mean by “retain” in this context.

Unfortunately, changing to target += gamma_lpdf(y | a, b); does not solve the issue.

The way you generate the data (dropping every value below 11) implies the data are truncated, not censored. You should use the truncation code: 4.2 Truncated data | Stan User’s Guide

For censoring, you should replace the values less than 11 with exactly 11.

The model is written with the censored values integrated out, since both the number of censored points and the censoring point are known. This seems like a straight copy of the manual’s model with normal replaced by gamma.


@spinkney that was indeed the intention: just take the manual’s censoring example and change the distribution. @hhau, you’re correct that censored data would have every $y$ less than 11 measured as exactly 11. But then, if I understand correctly, we would count all the 11s and then apply this same model for Nseen values of $y > 11$ and Nmiss values of $y \leq 11$, as in the sketch below.
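
A minimal sketch of that preprocessing in R (assuming the censored measurements were recorded as exactly 11 in a hypothetical vector y_rec):

# split the recorded data into fully observed values and a censored count
y_obs <- y_rec[y_rec > 11]   # fully observed measurements
Nmiss <- sum(y_rec <= 11)    # values recorded at the censoring point

stan_data <- list(Nseen = length(y_obs), y = y_obs, Nmiss = Nmiss)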

Yeah, it’s the integrating out using the gamma_lcdf function that is the problem. The gamma likelihood itself appears fine when you don’t integrate.

I simplified the example a bit to mess around, but this is the model without integrating out the censored values:

data {
  int<lower=0> Nseen;
  array[Nseen] real y;
  int<lower=0> Nmiss;
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
  // censored observations as latent values bounded above by the censoring point
  array[Nmiss] real<lower=0, upper=11> y_cens;
}
model {
  a ~ normal(20, 10);
  b ~ std_normal();
  y ~ gamma(a, b);
  y_cens ~ gamma(a, b);  // no CDF evaluation needed
}

Is the current best assumption that the gamma_lcdf has a bug at some extreme values?

Yes, and I see that the lcdf function calls gamma_p, so I think it’s this function that is problematic, given that I can write the lcdf as a UDF calling gamma_p explicitly and the problem persists.


Yes, I now see that this is equivalent.

For completeness, the truncated version of this model is considerably slower than the censoring version:

data {
  int<lower=0> Nseen;
  array[Nseen] real y;
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
}
model {
  a ~ normal(20, 10);
  b ~ std_normal();
  for (ii in 1 : Nseen) {
    y[ii] ~ gamma(a, b) T[11, ];
  }
}

presumably because the T[11, ] gets translated into a gamma_lccdf term (which relies on the same incomplete-gamma machinery as the LCDF) and is evaluated inside a loop.
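
Since the truncation point is constant here, an equivalent sketch that pulls the normalizing term out of the loop would be:

model {
  a ~ normal(20, 10);
  b ~ std_normal();
  // each T[11, ] subtracts gamma_lccdf(11 | a, b), so with a constant
  // truncation point the correction can be applied once, scaled by Nseen
  target += gamma_lpdf(y | a, b) - Nseen * gamma_lccdf(11 | a, b);
}

though that still hits the incomplete-gamma gradient, just once per density evaluation rather than Nseen times.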

It is also interesting to look at the log_prob values directly:

param_grid <- expand.grid(
  mu = seq(from = 10, to = 60, length.out = 2e3),
  sigma = seq(from = 0.01, to = 25, length.out = 2e3)
)

# look at the log-posterior surface (cens_res is the rstan fit of the censored model)
log_probs <- sapply(1 : nrow(param_grid), function(ii) {
  rstan::log_prob(
    cens_res,
    upars = rstan::unconstrain_pars(
      cens_res,
      list(
        mu = param_grid$mu[ii],
        sigma = param_grid$sigma[ii]
      )
    )
  )
})

Yields:

Error in object@.MISC$stan_fit_instance$log_prob(upars, adjust_transform, : 
Exception: grad_reg_inc_gamma: k (internal counter) exceeded 100000 iterations, gamma function gradient did not converge. (in 'anon_model', line 21, column 2 to column 42)

Nice find! This function is used in the rev-mode derivatives for gamma_p.

Quick update, all: the gamma_p function is already regularized, which I missed originally. The issue is resolved if you use

real my_gamma_lcdf(real x, real a, real b) {
  return log(gamma_p(a, b * x));
}
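
That is, gamma_p(a, z) is already the regularized lower incomplete gamma function

$$P(a, z) \;=\; \frac{\gamma(a, z)}{\Gamma(a)} \;=\; \frac{1}{\Gamma(a)} \int_0^z t^{a-1} e^{-t}\, dt,$$

so the gamma CDF is simply $F(y \mid a, b) = P(a, b\,y)$, and the extra -lgamma(a) term in my earlier UDF was dividing by $\Gamma(a)$ a second time.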

For example,

functions {
  real my_gamma_lcdf(real x, real a, real b) {
    return log(gamma_p(a, b * x));
  }
}
data {
  int<lower=0> Nseen;
  array[Nseen] real y;
  int<lower=0> Nmiss;
  real<lower=0> L;
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
}
model {
  y ~ gamma(a, b);
  target += my_gamma_lcdf(L | a, b) * Nmiss;
}

In R:

library(cmdstanr)
gg <- rgamma(800, shape = 32, 2.5)
L <- 11
gg_trunc <- gg[gg > L]

stan_data = list(
  Nseen = length(gg_trunc),
  y = gg_trunc,
  Nmiss = length(gg) - length(gg_trunc),
  L = L
)

mod <- cmdstan_model(stan_file = "gamma_cens.stan")
mod_out <- mod$sample(
  data = stan_data,
  iter_warmup = 2000,
  iter_sampling = 400,
  parallel_chains = 4)
mod_out$summary()

> mod_out$summary()
# A tibble: 3 × 10
  variable     mean   median    sd   mad       q5     q95  rhat ess_bulk ess_tail
  <chr>       <dbl>    <dbl> <dbl> <dbl>    <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -1593.   -1593.   1.14  0.786 -1596.   -1.59e3  1.01     286.     365.
2 a           31.6     31.6  2.09  2.10     28.2   3.51e1  1.01     168.     173.
3 b            2.47     2.47 0.162 0.162     2.20  2.73e0  1.01     169.     179.

@andrjohns @rok_cesnovar we should just use this for gamma_lcdf. I think it’s much more stable.

Math issue is at: change `gamma_lcdf` to just use `gamma_p` and the rev-mode derivatives · Issue #2763 · stan-dev/math · GitHub


I think that’s already the case:


const T_partials_return Pn = gamma_p(alpha_dbl, beta_dbl * y_dbl);

P += log(Pn);

Here: math/gamma_lcdf.hpp at 882cd229970f098ca078cb5b87873967dc94698f · stan-dev/math · GitHub