Wiebull time to event model with right censoring and repeated events


#1

Good morning all,

I am trying to write a model on rstan for my time to event data.
I use a Weibull model for my time of occurrence and a right censoring time.
In R, the calculation of my log pdf is as follows:
If you denote by T the vector of times at which event occured, Tc the censoring time (T has thus its first value 0, a certain number of non null times and as last value Tc , lambda and beta my parameters (on which I put lognormal priors by the way) and , my logpdf is calculated as follows

N <- length(T)
init <- which(T==0)
cens <- which(T==Tc)
ind <- setdiff(1:N, append(init,cens))
hazard <- (beta/lambda)*(T/lambda)^(beta-1) #my hazard function
H <- (T/lambda)^beta #the primitve of the hazard function
logpdf <- rep(0,N)
logpdf[cens] <- -H[cens] + H[cens-1]
logpdf[ind] <- -H[ind] + H[ind-1] + log(hazard[ind])

This is an implementation of the theoretical pdf for such models defined as
p(y |\psi) = \exp\left\{-\int_{0}^{T_c}h(u, \lambda,\beta)\textrm{d}u \right\} \prod_{j=1}^{n-1} h(t_{j}, \lambda,\beta)

How would you proceed in writing such model?

Here is an initial version

model <- 'data {
          int<lower=0> N_obs;// Number of observations
          vector[N_obs] time; //obs
          
          real<lower=0> lambda_pop;
          real<lower=0> beta_pop;
          real<lower=0> omega_lambda;
          real<lower=0> omega_beta;
        }
        parameters {
          vector<lower=0>[2] param;
        }

        model {
          //Priors
          param[1] ~ lognormal( lambda_pop , omega_lambda);
          param[2] ~ lognormal( beta_pop , omega_beta);
          time ~ weibull(param[1], param[2]);
        }'


#2

Okay, I don’t work with recurrent events so I am going to just trust that your pdf p(y|\psi) is right. I am going to work in terms of the Weibull parameterization in the Stan manual with shape \alpha and scale \sigma, but you can directly substitute \alpha with your \beta and \sigma with your \lambda. I’m also assuming that you have different observation units (I’ll call them people, but they’re whatever things are experiencing these events). Because people can have different numbers of events , I think it’s helpful to think about the two types of likelihood contributions: (1) \exp\{ - \int_0^{t_j} h(u; \sigma, \alpha)du \} for any t_j = T_c and (2) h(t_j; \sigma, \alpha) for the observed event times t_j \ne T_c. So for every person or observation unit i, you have n_i events and one censoring time, yielding a total of \sum_{i=1}^N(n_i + 1) data points from N people. (The zero times in your T vector contribute nothing at all.) You can split those \sum_{i=1}^N(n_i + 1) times into the N_e = \sum_{i=1}^N n_i event times and N_c = N censoring times.

Likelihood contribution (1) above is exactly the survival function S(t)=1-F(t), evaluated at T_c. We can use the built-in weibull_lccdf to return \log(S(T_c)).

For likelihood contributions of type (2), you could hard-code the hazard function in terms of \alpha and \sigma, or you can use the fact that in survival analysis, h(t) = f(t)/S(t), and thus \log(h(t)) = \log(f(t)) - \log(S(t)). Stan is going to be doing everything in terms of the log-likelihood anyway, so this lets us write the hazard in terms of two built-in functions weibull_lpdf and weibull_lccdf, which return \log(f(t)) and \log(S(t)), respectively.

I’m not sure I 100% understand your existing T object in R, but I think you could run this in R:

event_times <- T[!(T %in% c(0, T_c))]
cens_times <- T[T == T_c]
N_e <- length(event_times)
N_c <- length(cens_times)

Then if you provide values for alpha_pop, omega_alpha, sigma_pop, and omega_sigma, you could use the following Stan code:

data {
  int<lower=1> N_e; // Number of total observed events
  int<lower=1> N_c; // Number of total censoring times ( = # observation units)
  vector<lower=0>[N_e] event_times; // Times of event occurrence
  vector<lower=0>[N_c] cens_times; // Censoring times (maybe just N copies of T_c?)
  real<lower=0> alpha_pop;
  real<lower=0> sigma_pop;
  real<lower=0> omega_alpha;
  real<lower=0> omega_sigma;
}
parameters {
  real<lower=0> alpha; // Weibull shape
  real<lower=0> sigma; // Weibull scale
}
model {
  // prior
  alpha ~ lognormal(alpha_pop, omega_alpha);
  sigma ~ lognormal(sigma_pop, omega_sigma);
  
  // likelihood
  for (n_e in 1:N_e) {
    target += weibull_lpdf(event_times[n_e] | alpha, sigma) - 
              weibull_lccdf(event_times[n_e] | alpha, sigma);
  } 
  for (n_c in 1:N_c) {
    target += weibull_lccdf(cens_times[n_c] | alpha, sigma);
  }
}

Of course, I can’t test this code without your data but that should hopefully be able to get you started. If I’ve understood your model right, I think you should even be able to vectorize by replacing the whole likelihood part with

  // likelihood
  target += weibull_lpdf(event_times | alpha, sigma) - 
            weibull_lccdf(event_times | alpha, sigma) +
            weibull_lccdf(cens_times | alpha, sigma);

Hope that helps.


#3

It helps. Thank you.

Indeed I have N individual but it does not matter here since I am trying to sample from an individual posterior distribution. (so I only deal with a vector of observations for a particular individual since they are independent in my problem).

Will try your code and let you know


#4

Hi there,

Juste as simple question:
In

alpha ~ lognormal(alpha_pop, omega_alpha);

omega_alpha is the variance or the standard deviation in rstan?

Thanks


#5

It’s neither. If you look at the pdf definition for lognormal in the Stan manual, you can see that if Y \sim \mathrm{lognormal}(\mu, \sigma), then \mu and \sigma are the standard deviation of X = \log(Y), which is normally distributed. So \sigma cannot be the same as either the standard deviation or the variance of Y itself. The variance of Y would be [\exp(\sigma ^{2})-1]\exp(2\mu +\sigma ^{2}), according to the all-knowing Wikipedia.


#6

Absolutely. So rry about that. My question is even simpler

alpha ~ normal(alpha_pop, omega_alpha);

Here omega_alpha is the variance or the std?

Thanks!


#7

If you look in the Stan manual for the normal distribution, you can see that the second argument (sigma, or \sigma) shows up squared in the denominator of the \exp(\cdot) part of the pdf. That means it must be the standard deviation, not the variance. It’s just like R in that respect.