Wiebull time to event model with right censoring and repeated events

belhal · July 6, 2018, 4:54pm

Good morning all,

I am trying to write a model on rstan for my time to event data.
I use a Weibull model for my time of occurrence and a right censoring time.
In R, the calculation of my log pdf is as follows:
If you denote by T the vector of times at which event occured, Tc the censoring time (T has thus its first value 0, a certain number of non null times and as last value Tc , lambda and beta my parameters (on which I put lognormal priors by the way) and , my logpdf is calculated as follows

N <- length(T)
init <- which(T==0)
cens <- which(T==Tc)
ind <- setdiff(1:N, append(init,cens))
hazard <- (beta/lambda)*(T/lambda)^(beta-1) #my hazard function
H <- (T/lambda)^beta #the primitve of the hazard function
logpdf <- rep(0,N)
logpdf[cens] <- -H[cens] + H[cens-1]
logpdf[ind] <- -H[ind] + H[ind-1] + log(hazard[ind])

This is an implementation of the theoretical pdf for such models defined as
p(y |\psi) = \exp\left\{-\int_{0}^{T_c}h(u, \lambda,\beta)\textrm{d}u \right\} \prod_{j=1}^{n-1} h(t_{j}, \lambda,\beta)

How would you proceed in writing such model?

Here is an initial version

model <- 'data {
          int<lower=0> N_obs;// Number of observations
          vector[N_obs] time; //obs
          
          real<lower=0> lambda_pop;
          real<lower=0> beta_pop;
          real<lower=0> omega_lambda;
          real<lower=0> omega_beta;
        }
        parameters {
          vector<lower=0>[2] param;
        }

        model {
          //Priors
          param[1] ~ lognormal( lambda_pop , omega_lambda);
          param[2] ~ lognormal( beta_pop , omega_beta);
          time ~ weibull(param[1], param[2]);
        }'

lcomm · July 6, 2018, 7:53pm

Okay, I don’t work with recurrent events so I am going to just trust that your pdf p(y|\psi) is right. I am going to work in terms of the Weibull parameterization in the Stan manual with shape \alpha and scale \sigma, but you can directly substitute \alpha with your \beta and \sigma with your \lambda. I’m also assuming that you have different observation units (I’ll call them people, but they’re whatever things are experiencing these events). Because people can have different numbers of events , I think it’s helpful to think about the two types of likelihood contributions: (1) \exp\{ - \int_0^{t_j} h(u; \sigma, \alpha)du \} for any t_j = T_c and (2) h(t_j; \sigma, \alpha) for the observed event times t_j \ne T_c. So for every person or observation unit i, you have n_i events and one censoring time, yielding a total of \sum_{i=1}^N(n_i + 1) data points from N people. (The zero times in your T vector contribute nothing at all.) You can split those \sum_{i=1}^N(n_i + 1) times into the N_e = \sum_{i=1}^N n_i event times and N_c = N censoring times.

Likelihood contribution (1) above is exactly the survival function S(t)=1-F(t), evaluated at T_c. We can use the built-in weibull_lccdf to return \log(S(T_c)).

For likelihood contributions of type (2), you could hard-code the hazard function in terms of \alpha and \sigma, or you can use the fact that in survival analysis, h(t) = f(t)/S(t), and thus \log(h(t)) = \log(f(t)) - \log(S(t)). Stan is going to be doing everything in terms of the log-likelihood anyway, so this lets us write the hazard in terms of two built-in functions weibull_lpdf and weibull_lccdf, which return \log(f(t)) and \log(S(t)), respectively.

I’m not sure I 100% understand your existing T object in R, but I think you could run this in R:

event_times <- T[!(T %in% c(0, T_c))]
cens_times <- T[T == T_c]
N_e <- length(event_times)
N_c <- length(cens_times)

Then if you provide values for alpha_pop, omega_alpha, sigma_pop, and omega_sigma, you could use the following Stan code:

data {
  int<lower=1> N_e; // Number of total observed events
  int<lower=1> N_c; // Number of total censoring times ( = # observation units)
  vector<lower=0>[N_e] event_times; // Times of event occurrence
  vector<lower=0>[N_c] cens_times; // Censoring times (maybe just N copies of T_c?)
  real<lower=0> alpha_pop;
  real<lower=0> sigma_pop;
  real<lower=0> omega_alpha;
  real<lower=0> omega_sigma;
}
parameters {
  real<lower=0> alpha; // Weibull shape
  real<lower=0> sigma; // Weibull scale
}
model {
  // prior
  alpha ~ lognormal(alpha_pop, omega_alpha);
  sigma ~ lognormal(sigma_pop, omega_sigma);
  
  // likelihood
  for (n_e in 1:N_e) {
    target += weibull_lpdf(event_times[n_e] | alpha, sigma) - 
              weibull_lccdf(event_times[n_e] | alpha, sigma);
  } 
  for (n_c in 1:N_c) {
    target += weibull_lccdf(cens_times[n_c] | alpha, sigma);
  }
}

Of course, I can’t test this code without your data but that should hopefully be able to get you started. If I’ve understood your model right, I think you should even be able to vectorize by replacing the whole likelihood part with

  // likelihood
  target += weibull_lpdf(event_times | alpha, sigma) - 
            weibull_lccdf(event_times | alpha, sigma) +
            weibull_lccdf(cens_times | alpha, sigma);

Hope that helps.

belhal · July 6, 2018, 8:53pm

It helps. Thank you.

Indeed I have N individual but it does not matter here since I am trying to sample from an individual posterior distribution. (so I only deal with a vector of observations for a particular individual since they are independent in my problem).

Will try your code and let you know

belhal · July 9, 2018, 8:37am

Hi there,

Juste as simple question:
In

alpha ~ lognormal(alpha_pop, omega_alpha);

omega_alpha is the variance or the standard deviation in rstan?

Thanks

lcomm · July 9, 2018, 2:44pm

It’s neither. If you look at the pdf definition for lognormal in the Stan manual, you can see that if Y \sim \mathrm{lognormal}(\mu, \sigma), then \mu and \sigma are the standard deviation of X = \log(Y), which is normally distributed. So \sigma cannot be the same as either the standard deviation or the variance of Y itself. The variance of Y would be [\exp(\sigma ^{2})-1]\exp(2\mu +\sigma ^{2}), according to the all-knowing Wikipedia.

belhal · July 9, 2018, 3:56pm

Absolutely. So rry about that. My question is even simpler

alpha ~ normal(alpha_pop, omega_alpha);

Here omega_alpha is the variance or the std?

Thanks!

lcomm · July 9, 2018, 4:04pm

If you look in the Stan manual for the normal distribution, you can see that the second argument (sigma, or \sigma) shows up squared in the denominator of the \exp(\cdot) part of the pdf. That means it must be the standard deviation, not the variance. It’s just like R in that respect.

Topic		Replies	Views
Data Augmentation for Censored Survival Data Modeling rstan , techniques , fitting-issues	0	260	June 16, 2023
Time-to-event, with only fraction of events observed - All post-warmup steps divergent Modeling	3	365	March 18, 2021
Imputing parameter from survival analysis (gamma distribution) Modeling rstan , hierarchical-model , survival	0	296	January 20, 2023
How to validate a survival model with both right censoring and interval censoring? Modeling ecology	14	1866	February 25, 2022
Zero-length vector for ragged data structures Modeling	4	358	July 9, 2019

Wiebull time to event model with right censoring and repeated events

Related Topics