Gompertz distribution cdf under-/overflow

fredh · September 21, 2024, 2:36pm

Hi,

I am trying to write custom functions for the _cdf/_lcdf for the gompertz distribtion Wikipedia

I have been successful with other standard parametric survival functions which are either implemented already or by custom functions. The gompertz is however challenging for me and I am unable to avoid under-/overflow issues. I have tried algebraic manipulation and using log1m/log1p and logsumexp without any success. I have done this in JAGS previously with the zeroes trick and modeling survival and hazard separately. Translating the JAGS-code may be an option, however it would be neat if this has already been done or if there are obvious ways to solve this.

I have written the gompertz _lpdf and _lccdf which does not seem to under-/overflow. I assume it is due to the double exp in the cdf.

I have not been successful in translating the discussions on BRMS in this thread and have tried several alternative ways, including splitting up the equation into multiple steps. Copy-pasting the simplest version of my _cdf function below. Any help is highly appreciated!

functions {
  real gompertz_cdf (real y, real b, real eta) {   // Not implemented in STAN, custom cdf-function
        return 1 - exp(-eta*(exp(b*y) - 1)); 
        }
}

nhuurre · September 21, 2024, 7:46pm

There are more composed functions. I believe the best you can do is

real gompertz_lcdf(real y, real b, real eta) {
  return log1m_exp(-eta*expm1(b*y));
}

fredh · September 21, 2024, 9:22pm

Thanks for the proposed code and for confirming this is the best I can do. I am only able to avoid the under-/overflow with unreasonable boundaries for b as shown below. I’ll try to translate my JAGS code instead and see whether that’s successful. I really appreciate the help as I can then park this approach as a dead end!

parameters {
      real <lower=0, upper=0.01> b; 
}

spinkney · September 21, 2024, 9:59pm

Can you post a mwe with some simulated data?

fredh · September 25, 2024, 2:59pm

Below is a mwe with some suggested input-data. The under-/overflow isn’t an issue here as the cdf is not included. However it doesn’t work properly for the mwe either. Hopefully something is obvious to you. Thanks!

OS.event.times <- c(1,1,1,3,3,4,6,7,7,8) 
OS.event <- c(0,0,0,0,1,0,1,0,1,0) 
trial.data <- data.frame(OS.months = OS.event.times, OS.event = OS.event)
trial.data

Folder for STAN models

Stanfolder <- "./R/STAN/STAN_models/" # What folder to create for JAGS-model-file
if(!dir.exists(Stanfolder)) {dir.create(Stanfolder)} # checks if dir exists, creates it if not

Setting up values for STAN

trial.N <- sum(trial.data$OS.event) # Number of datapoints
trial.events.N <- sum(trial.data$OS.event==1) # Number of events
trial.censored.N <- sum(trial.data$OS.event==0) # Number of censored
Trial.censored.vect <- rep(0, trial.censored.N) # Vector of censored events 
trial.event.times <- trial.data$OS.months[trial.data$OS.event==1] # Trial event times
trial.censored.times <- trial.data$OS.months[trial.data$OS.event==0] # Trial censored times

# Stan code
cat("functions {

      // shape b, scale eta
      real gompertz_cdf (real y, real b, real eta) {
        return 1 - exp(-eta*(exp(b*y) - 1));
      }
      
      real gompertz_lpdf (real y, real b, real eta) { 
        return log(b*eta) + eta + b*y - eta*exp(b*y);  
      }
      
     real gompertz_lcdf (real y, real b, real eta) {
          return log1m_exp(-eta*expm1(b*y));
     }
      
      real gompertz_lccdf (real y, real b, real eta){
        return -eta*(exp(b*y) - 1);
      }
      
    }
    data { 
        int N; 
        int<lower=0> Nobs; 
        int<lower=0> Ncen; 
        vector[Nobs] yobs; 
        vector[Ncen] ycen; 
    } 
    parameters {
      real <lower=0, upper=0.1> b;     // param 1
      real <lower=0, upper=0.1> eta;   // param 2 
    }
    model {
    
    // priors
    b ~ gamma(1, 20);     
    eta ~ gamma(1, 20); 
    
    for (i in 1:Nobs) {
    target += gompertz_lpdf(yobs[i] | b, eta);
    }
    for (j in 1:Ncen) {
    target += gompertz_lccdf(ycen[j] | b, eta);
    }
    
    }

", file=paste0("./R/STAN/STAN_models/Stan_Model_gompertz.stan")) 
fit <- stan("./R/STAN/STAN_models/Stan_Model_gompertz.stan", data=list(N = trial.N, Nobs = trial.events.N, 
                                                                       Ncen = trial.censored.N, yobs = trial.event.times, ycen = trial.censored.times, 
                                                                       trial_cen_vector = Trial.censored.vect), iter=5000, chains=4)
fit

[edit: escaped code—it requires back-ticks, not apostrophes (```)]

Bob_Carpenter · October 3, 2024, 10:07pm

You may want to jointly constrain b * max(y) to be below whatever would overflow.

P.S. I’d strongly recommend putting your Stan code in standalone files. It then lets you use prints with quotes and transposes with apostrophes and gives you an easy way to find lines by line number in error messages. It also prints more neatly with syntax highlighting.

fredh · October 8, 2024, 7:27am

Thanks a lot for the help and for the tips on using standalone files. I’ll make sure to look into that, rather than creating the file from within R.

I rechecked my algebra, but even though I failed to find errors I was still unsuccessful with the equations I started with above. I ended up using another parametrization of the gompertz similar to the one used by Jackson in the flexsurv package github. I’ll paste the working functions for future readers if anyone is interested in my solution.

The cdf in particular over-/underflows without constraints. Using reasonable constraints on b, eta and potentially also b * max(y) to avoid this is necessary, your proposal above was simple but enlightening.

functions {

      real gompertz_lpdf (real y, real b, real eta) { 
        return log(eta) + (b * y) + (-eta/b * (expm1(b * y)));
      }
      
      real gompertz_cdf (real y, real b, real eta) {  
         return 1 - exp(-eta/b * (expm1(b * y))); 
      }
      
     real gompertz_lcdf (real y, real b, real eta) {  
          return log1m_exp(-eta/b * (expm1(b * y))); 
     }
      
      real gompertz_lccdf (real y, real b, real eta){ 
        return (-eta/b * (expm1(b * y)));
      }
      
    }

Bob_Carpenter · October 11, 2024, 10:12pm

Thanks for following up. I’ll mark your own response as the solution.

Topic		Replies	Views
Underflow? in custom distribution function Modeling	5	707	June 12, 2018
Survival model for readout/interval data Modeling	1	309	September 1, 2020
Fitting Survival Model with Random Effect Modeling fitting-issues , specification , performance	1	601	October 1, 2021
Normal_lcdf status General	8	1343	January 31, 2019
How stan deal with overflow or underflow issue? Modeling fitting-issues	3	934	December 5, 2018

Gompertz distribution cdf under-/overflow

Folder for STAN models

Setting up values for STAN

Related topics