Minimal censored Gamma with VERY bad performance

Hi everyone,

I’m just trying to adapt the example code in the Stan manual to fit a censored Gamma model:

// simple censored gamma
data {
  int<lower=0> Nseen;
  array[Nseen] real<lower=11> y;
  int<lower=0> Nmiss;
}
parameters {
  real mu;
  real sigma;
}
transformed parameters {
  real<lower=0> a;
  real<lower=0> b;
  // convert mean/sd to the gamma's shape/rate
  a = square(mu / sigma);
  b = mu / square(sigma);
}
model {
  mu ~ gamma(10, .6);
  sigma ~ exponential(0.3);
  y ~ gamma(a, b);  // observed (uncensored) values
  target += gamma_lcdf(11 | a, b) * Nmiss;  // integrate out the censored values
}
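
For context, the transformed parameters are just the usual mean/standard-deviation parameterization converted to the gamma’s shape and rate:

$$a = \frac{\mu^2}{\sigma^2}, \qquad b = \frac{\mu}{\sigma^2}, \qquad \text{so that} \quad \mathbb{E}[y] = \frac{a}{b} = \mu, \quad \operatorname{Var}[y] = \frac{a}{b^2} = \sigma^2.$$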

I generate some fake data like this:


gg <- rgamma(800, 15^2/2.5^2, 15/2.5^2)  # shape and rate for mean 15, sd 2.5
gg_trunc <- gg[gg>11]

hist(gg_trunc, xlim = c(0, 100))

And then sample it like this:


censor_model_vec <- cmdstan_model(stan_file = "truncated/gamma_censor_vec.stan")
censor_model_vec_samp <- censor_model_vec$sample(
  data = list(
    Nseen = length(gg_trunc),
    y = gg_trunc,
    Nmiss = length(gg) - length(gg_trunc)
  ),
  parallel_chains = 4
)


The resulting samples have VERY bad diagnostics in terms of Rhat and ESS:

variable     mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
    lp__  -1898.03 -1898.40 4.09 5.23 -1903.53 -1891.77 2.98        4       33
    mu       15.02    15.00 0.45 0.63    14.54    15.62 3.51        4       11
    sigma     3.44     3.43 0.07 0.08     3.36     3.53 3.58        4       11
    a        19.09    19.15 0.46 0.65    18.46    19.67 2.74        4       23
    b         1.27     1.27 0.01 0.01     1.25     1.29 2.73        4       11

Have I done something wrong? I suspect I’ve missed some obvious point about how to correctly work with censored data. Thanks for any advice!

Created on 2022-06-22 by the reprex package (v2.0.0)

I think you found a bug. Everything looks right. If I code up the gamma_lcdf myself (assuming I did that right), I get:

  variable     mean   median     sd    mad       q5      q95  rhat ess_bulk ess_tail
  <chr>       <dbl>    <dbl>  <dbl>  <dbl>    <dbl>    <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -1759.   -1759.   1.03   0.741  -1762.   -1758.    1.00    1855.    2254.
2 mu          15.2     15.2  0.0991 0.0967    15.0     15.3   1.00    3134.    2595.
3 sigma        2.68     2.68 0.0700 0.0693     2.57     2.80  1.00    3431.    2776.
4 a           32.1     32.0  1.62   1.64      29.5     34.7   1.00    3819.    2743.
5 b            2.11     2.11 0.108  0.107      1.94     2.29  1.00    3618.    2621.

Here’s the Stan code:

functions {
  real my_gamma_lcdf(real x, real a, real b) {
    return -lgamma(a) + log(gamma_p(a, b * x));
  }
}
data {
  int<lower=0> Nseen;
  array[Nseen] real<lower=11> y;
  int<lower=0> Nmiss;
}
parameters {
  real mu;
  real sigma;
}
transformed parameters {
  real<lower=0> a;
  real<lower=0> b;
  a = square(mu / sigma);
  b = mu / square(sigma);
}
model {
  mu ~ gamma(10, .6);
  sigma ~ exponential(0.3);
  y ~ gamma(a, b);
  // target += gamma_lcdf(11 | a, b) * Nmiss;
  target += my_gamma_lcdf(11 | a, b);
}

@andrjohns @rok_cesnovar I’ll post an issue in math.

Stan’s LCDF appears to match R:

> rstan::stanFunction("gamma_lcdf",y=1,a=1,b=1)
[1] -0.4586751
> log(pgamma(1,1,1))
[1] -0.4586751

So I’m not sure that it is a bug.

There’s definitely something going on: the original model samples really slowly, while using my UDF is something like 100x faster. That is weird.

I did forget to multiply by Nmiss above. The results I’m getting now are:

# A tibble: 5 × 10
  variable      mean    median     sd    mad        q5       q95  rhat ess_bulk ess_tail
  <chr>        <dbl>     <dbl>  <dbl>  <dbl>     <dbl>     <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -2358.    -2358.    1.01   0.726  -2360.    -2357.     1.00    1905.    2092.
2 mu          14.9      14.9   0.227  0.229     14.6      15.3    1.00    1847.    2229.
3 sigma        6.34      6.33  0.157  0.162      6.09      6.60   1.00    1849.    2204.
4 a            5.56      5.56  0.221  0.220      5.20      5.92   1.00    3394.    2618.
5 b            0.372     0.373 0.0157 0.0160     0.347     0.398  1.00    2309.    2617.

which doesn’t recover the parameters.


OK, I don’t know what’s happening, but fitting this in R using MLE does recover the parameters. First the full data, then the censored data:

library(cmdstanr)
library(fitdistrplus)
library(data.table)
gg <- rgamma(800, shape = 36, rate = 2.5)
gg_trunc <- gg[gg > 11]

Nmiss <- length(gg) - length(gg_trunc)
dat <- data.table(left = c(rep(NA, Nmiss), gg_trunc),
                  right = c(rep(11, Nmiss), gg_trunc))

> fitdist(gg, "gamma")
Fitting of the distribution ' gamma ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape 35.401801  1.7618030
rate   2.485492  0.1245713

> fitdistcens(dat, "gamma")
Fitting of the distribution ' gamma ' on censored data by maximum likelihood 
Parameters:
       estimate
shape 35.519115
rate   2.493473

Now the Stan model is giving:

> censor_model_vec_samp$summary()
# A tibble: 5 × 10
  variable      mean    median     sd    mad        q5       q95  rhat ess_bulk ess_tail
  <chr>        <dbl>     <dbl>  <dbl>  <dbl>     <dbl>     <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -2348.    -2348.    1.02   0.712  -2350.    -2347.     1.00    1338.    1664.
2 mu          14.0      14.0   0.246  0.242     13.6      14.4    1.00    1260.    1279.
3 sigma        6.87      6.87  0.177  0.177      6.58      7.16   1.00    1279.    1541.
4 a            4.17      4.17  0.158  0.158      3.92      4.44   1.00    3298.    2741.
5 b            0.298     0.297 0.0124 0.0123     0.278     0.318  1.00    1726.    2032.

I could be off-base here, but I think that where you have:

y ~ gamma(a, b);

you instead need to retain the normalizing constant (i.e. target += gamma_lpdf(y | a, b);). Otherwise you’re arbitrarily re-weighting the observed versus censored components of the likelihood, right?

As I understand the manual, we’re treating the censored data like missing values and integrating over them. So for $y > 11$ (in my example 11 is the censoring value) we use $\Pr(y) = \text{gamma}(y \mid a, b)$, and for $y < 11$ we use the probability of having missed that many values of $y$ given $a$ and $b$. That probability is $\text{gamma\_CDF}(11 \mid a, b)^n$ if you have $n$ missing values.
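
In symbols, the target being incremented is

$$\log p(y \mid a, b) \;=\; \sum_{i=1}^{N_\text{seen}} \log \text{gamma}(y_i \mid a, b) \;+\; N_\text{miss} \cdot \text{gamma\_lcdf}(11 \mid a, b),$$

which is what the y ~ gamma(a, b); statement plus the target += gamma_lcdf(11 | a, b) * Nmiss; line compute, up to an additive constant from the ~ statement.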

Is that the same as what you are describing? I’m not sure what you mean by “retain” in this context.

Unfortunately, changing to target += gamma_lpdf(y | a, b); does not solve the issue.

The way you generate the data (dropping every value below 11) implies the data are truncated, not censored. You should use the truncation code: 4.2 Truncated data | Stan User’s Guide

For censoring, you should replace the values less than 11 with exactly 11.

The model is written with the censored values integrated out, since both the number of censored points and the censoring point are known. This seems like a straight copy of the manual’s model with normal replaced by gamma.


@spinkney that was indeed the intention: just take the manual’s censoring example and change the distribution. @hhau, you’re correct that censored data would have every $y$ less than 11 measured as exactly 11. But then, if I understand correctly, we would count all the 11s and then apply this same model for Nseen values of $y > 11$ and Nmiss values of $y \leq 11$, as in the sketch below.
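
A minimal sketch of that preprocessing in R (assuming the censored measurements were recorded as exactly 11 in a hypothetical vector y_rec):

# split the recorded data into fully observed values and a censored count
y_obs <- y_rec[y_rec > 11]   # fully observed measurements
Nmiss <- sum(y_rec <= 11)    # values recorded at the censoring point

stan_data <- list(Nseen = length(y_obs), y = y_obs, Nmiss = Nmiss)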

Yeah, it’s the integrating out using the gamma_lcdf function that is the problem. The gamma likelihood itself appears fine when you don’t integrate.

I simplified the example a bit to mess around, but this is the model without integrating out the censored values:

data {
  int<lower=0> Nseen;
  array[Nseen] real y;
  int<lower=0> Nmiss;
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
  // censored observations as latent values bounded above by the censoring point
  array[Nmiss] real<lower=0, upper=11> y_cens;
}
model {
  a ~ normal(20, 10);
  b ~ std_normal();
  y ~ gamma(a, b);
  y_cens ~ gamma(a, b);  // no CDF evaluation needed
}

Is the current best assumption that the gamma_lcdf has a bug at some extreme values?

Yes, and I see that the lcdf function calls gamma_p, so I think it’s this function that is problematic, given that I can write the lcdf as a UDF calling gamma_p explicitly and the problem persists.


Yes, I now see that this is equivalent.

For completeness, the truncated version of this model is considerably slower than the censoring version:

data {
  int<lower=0> Nseen;
  array[Nseen] real y;
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
}
model {
  a ~ normal(20, 10);
  b ~ std_normal();
  for (ii in 1 : Nseen) {
    y[ii] ~ gamma(a, b) T[11, ];
  }
}

presumably because the T[11, ] gets translated into a gamma_lccdf term (which relies on the same incomplete-gamma machinery as the LCDF) and is evaluated inside a loop.
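
Since the truncation point is constant here, an equivalent sketch that pulls the normalizing term out of the loop would be:

model {
  a ~ normal(20, 10);
  b ~ std_normal();
  // each T[11, ] subtracts gamma_lccdf(11 | a, b), so with a constant
  // truncation point the correction can be applied once, scaled by Nseen
  target += gamma_lpdf(y | a, b) - Nseen * gamma_lccdf(11 | a, b);
}

though that still hits the incomplete-gamma gradient, just once per density evaluation rather than Nseen times.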

It is also interesting to look at the log_prob values directly:

param_grid <- expand.grid(
  mu = seq(from = 10, to = 60, length.out = 2e3),
  sigma = seq(from = 0.01, to = 25, length.out = 2e3)
)

# look at the log-posterior surface (cens_res is the rstan fit of the censored model)
log_probs <- sapply(1 : nrow(param_grid), function(ii) {
  rstan::log_prob(
    cens_res,
    upars = rstan::unconstrain_pars(
      cens_res,
      list(
        mu = param_grid$mu[ii],
        sigma = param_grid$sigma[ii]
      )
    )
  )
})

Yields:

Error in object@.MISC$stan_fit_instance$log_prob(upars, adjust_transform, : 
Exception: grad_reg_inc_gamma: k (internal counter) exceeded 100000 iterations, gamma function gradient did not converge. (in 'anon_model', line 21, column 2 to column 42)

Nice find! This function is used in the rev-mode derivatives for gamma_p.

Quick update, all: the gamma_p function is already regularized, which I missed originally. The issue is resolved if you use

real my_gamma_lcdf(real x, real a, real b) {
  return log(gamma_p(a, b * x));
}
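
That is, gamma_p(a, z) is already the regularized lower incomplete gamma function

$$P(a, z) \;=\; \frac{\gamma(a, z)}{\Gamma(a)} \;=\; \frac{1}{\Gamma(a)} \int_0^z t^{a-1} e^{-t}\, dt,$$

so the gamma CDF is simply $F(y \mid a, b) = P(a, b\,y)$, and the extra -lgamma(a) term in my earlier UDF was dividing by $\Gamma(a)$ a second time.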

For example,

functions {
  real my_gamma_lcdf(real x, real a, real b) {
    return log(gamma_p(a, b * x));
  }
}
data {
  int<lower=0> Nseen;
  array[Nseen] real y;
  int<lower=0> Nmiss;
  real<lower=0> L;
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
}
model {
  y ~ gamma(a, b);
  target += my_gamma_lcdf(L | a, b) * Nmiss;
}

In R:

library(cmdstanr)
gg <- rgamma(800, shape = 32, 2.5)
L <- 11
gg_trunc <- gg[gg > L]

stan_data = list(
  Nseen = length(gg_trunc),
  y = gg_trunc,
  Nmiss = length(gg) - length(gg_trunc),
  L = L
)

mod <- cmdstan_model(stan_file = "gamma_cens.stan")
mod_out <- mod$sample(
  data = stan_data,
  iter_warmup = 2000,
  iter_sampling = 400,
  parallel_chains = 4)
mod_out$summary()

> mod_out$summary()
# A tibble: 3 × 10
  variable     mean   median    sd   mad       q5     q95  rhat ess_bulk ess_tail
  <chr>       <dbl>    <dbl> <dbl> <dbl>    <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -1593.   -1593.   1.14  0.786 -1596.   -1.59e3  1.01     286.     365.
2 a           31.6     31.6  2.09  2.10     28.2   3.51e1  1.01     168.     173.
3 b            2.47     2.47 0.162 0.162     2.20  2.73e0  1.01     169.     179.

@andrjohns @rok_cesnovar we should just use this for gamma_lcdf. I think it’s much more stable.

Math issue is at: change `gamma_lcdf` to just use `gamma_p` and the rev-mode derivatives · Issue #2763 · stan-dev/math · GitHub


I think that’s already the case:


const T_partials_return Pn = gamma_p(alpha_dbl, beta_dbl * y_dbl);

P += log(Pn);

Here: math/gamma_lcdf.hpp at 882cd229970f098ca078cb5b87873967dc94698f · stan-dev/math · GitHub