Recurrent Events Survival model

Hello Everyone,

I am trying to estimate a recurring events survival model. The model code is as follows:

data {
  int<lower=0> M;  
  int<lower=0> N;
  int<lower=0> NT;
  int<lower=0> nid;  
  int<lower=0> indx[N];  
  int<lower=0> tstart[N]; 
  int<lower=0> tstop[N];  
  int<lower=0> event[N]; 
  int<lower=0> t[NT + 1]; 

  matrix[N, M] Z; 
  real eps;
}

transformed data {
  int Y[N, NT];
  int dN[N, NT]; 
  
  real c = 0.001;  
  real r = 0.1; 
  
  for(i in 1:N) {
    for(j in 1:NT) {
      Y[i, j] = int_step(tstop[i] - t[j] + eps)*int_step(t[j]-tstart[i] - eps);
      dN[i, j] = Y[i, j] * event[i] * int_step(t[j + 1] - tstop[i] - eps);
    }
  }
}

parameters {
  vector[M] beta;   
  real<lower=0> baseline[NT]; 
  vector[nid] w;   
} 

model {
  beta ~ normal(0, 1); 
  w ~ normal(0,1);  
  
  for(j in 1:NT) {
    baseline[j] ~ gamma(r * (t[j + 1] - t[j]) * c, c);
    for(i in 1:N) {
      if (Y[i, j] != 0) 
		increment_log_prob(poisson_log(dN[i, j], Y[i, j] * exp(Z[i,]*beta + w[indx[i]]) * baseline[j])); 		
      }     
  }
}

where,
M = the number of co-variates
N = total number of observations
NT = length of time id vector
nid = number of subjects
indx = vector of subject ids across N observations (repeated sometimes as the recurrent events happen multiple times for each individual)
tstart = starting time
tstop = stop time
event = indicator of a recurring event happening or not
t = vector of cut points or unique time ids (used to estimate the baseline hazard)
z = co-variates matrix (say x1 and x2)
eps = a fraction used inside int_step function

the input data is in long format . It is generated using two co-variate specific coefficients beta = [-1.5, 1.7] and a random effect term (frailty) . I want to recover these two coefficients. The model runs well. There is no diagnostic problem such as divergence or lack of mixing. However, the coefficients are always grossly underestimated. For example I got these values:

           mean     se_mean        sd        2.5%        25%         50%         75%       97.5%     n_eff        Rhat
beta[1]   -1.07      0.00          0.23     -1.52       -1.22       -1.06        -0.91     -0.62      3189           1
beta[2]    1.12      0.00          0.24      0.66        0.96        1.12         1.28      1.58      3265           1

The same code runs perfectly in case of single event (i.e. death) but fails when there are multiple events. What am I missing here? Any guidance would be very helpful

dat.csv (20.0 KB) datrun.r (1.1 KB)

Are you able to post the R code that simulates the data too?

And what is the error code you get when multiple events are run?

Two minor things unrelated to your question:

  1. A slight speed up is possible by using std_normal(); instead of normal(0, 1); for those priors.
  2. Did you mean to not put a prior on the baseline parameter?

Hi,

Thank you for your reply.

  1. Sure … I am enclosing the simulation code herewith. Its mostly based on coxed package - https://github.com/jkropko/coxed. Please have a look and let me know if I am making any mistake in the simulation part.
    Recurr.r (5.7 KB)

  2. I don’t get any error … it runs smoothly. However, the estimates are not close … I guess I am missing something in my likelihood.

  3. I shall use std_normal() as you suggested … any speed improvement is always welcome.

  4. I kept the baseline parameter as it is in case of single event model as it works well there. Please let me know if you feel any change is required.

Regards,

I have not had the time to look more into this but I think you should check out the survHE package (https://cran.r-project.org/web/packages/survHE/index.html) and see if you can fit the model using it. You can then check out the stan code generated.

Thank you so much !! Will have a look at SurvHE as you suggested …