Survival models in rstanarm


#21

Sorry, I should give details: I/we don’t know the integrals of most exponentials of cubic polynomials… so you have to approximate, or pick some other positive functions for bases whose integrals we do know. For example, currently in Stan it’s easy to pick the normal pdf for the basis function and use the CDF in the integral (where the CDF is itself an approximation).


#22

I think frailtypack, for example, uses M-splines directly on the hazard scale and I-splines directly on the cumulative hazard scale. But they aren’t in the MCMC setting.

In stan_jm things are modelled on the log hazard scale using B-splines, but quadrature is then used at every iteration of the MCMC to numerically obtain the cumulative hazard for use in the log-likelihood. That quadrature is computation you want to avoid. It is unavoidable in stan_jm because the nature of the model means there is no analytical solution for the cumulative hazard, but that needn’t be the case in most settings. So, if possible, you want an analytical form available for the (log) cumulative hazard, both when assuming proportional hazards and when introducing non-proportional hazards.

I think an additional benefit of Royston and Parmar modelling things on the log cumulative hazard scale is that the introduction of time-dependent effects is easy (modelled using natural cubic splines, through an interaction between the hazard ratio and the spline basis, on the log cum haz scale), because they have analytical forms for everything – so even with time-dependent effects the models are super fast to fit (see stpm2 in Stata, or the rstpm2 package in R). I don’t think this would be the case when working on the log hazard scale, as the introduction of time-dependent effects (i.e. non-proportional hazards) would mean you have to start using quadrature within each MCMC iteration. But I might be wrong. A downside of introducing time-dependent effects on the log cum haz scale is that interpretation is more difficult, but that isn’t the end of the world, since you can just plot things (e.g. the time-dependent hazard ratio).
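For concreteness, the Royston & Parmar setup being described can be sketched as follows (notation loosely follows their paper; this is just a summary for readers of the thread):

```latex
% proportional hazards, modelled on the log cumulative hazard scale:
\log H(t \mid \mathbf{x}) = s(\log t;\, \boldsymbol{\gamma}) + \mathbf{x}^{\top}\boldsymbol{\beta},
\qquad S(t \mid \mathbf{x}) = \exp\{-H(t \mid \mathbf{x})\}
% time-dependent effects add spline-by-covariate interactions to the same predictor:
\log H(t \mid \mathbf{x}) = s(\log t;\, \boldsymbol{\gamma}) + \mathbf{x}^{\top}\boldsymbol{\beta}
  + \textstyle\sum_j x_j \, s_j(\log t;\, \boldsymbol{\delta}_j)
```

Since s and the s_j have closed forms, S(t \mid \mathbf{x}) and hence the likelihood stay analytical in both cases, which is why no quadrature is needed.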


#23

Yes, this is a good point. M-splines are easy to integrate, because they’re just piecewise polynomials, so we can integrate them in closed form. In fact, in my code I did symbolic integration using a package called Ryacas. It would be straightforward to have your hazard function be defined by M-splines (which are positive); the cumulative hazard function, which is the integral, would then be an I-spline.

Of course you’d have to let the user define the domain of time over which they want to model and how many splines they want in that domain, but that should be straightforward. Most people have a good idea whether survival for their specific problem is on the scale of days or months or whatever.

Anyway, it sounds like an interesting project, and I’ve always been interested in survival models. Let me know if you all need help with implementation or anything. I’d be happy to chip in.


#24

Thanks for your feedback and the offer to chip in. I said I would come back to it, and actually I am doing so right now ;-):

Out of curiosity, why didn’t you use the form presented in Eq (5) in the Ramsay paper, instead of integrating symbolically? Maybe I am overlooking something.

Not sure if I understand the version in Eq (5) correctly, but it seems that for any x one wants to evaluate (lying within [L,U], in Ramsay’s notation) one has to determine the j s.t. t_j \leq x < t_{j+1}. I see how this works for any x\in [L,U), but what about x=U? There is no j that fulfils the above…
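Regarding the x = U corner case: one common convention (my own sketch, not something from Ramsay’s paper) is to treat the intervals as half-open except the last non-degenerate one, which is taken to be closed:

```python
import bisect

def find_interval(x, knots):
    """Return the 0-based index j with knots[j] <= x < knots[j+1],
    mapping x == knots[-1] (i.e. x == U) to the last non-degenerate interval."""
    if x == knots[-1]:
        j = len(knots) - 2
        while j > 0 and knots[j] == knots[-1]:  # skip repeated boundary knots
            j -= 1
        return j
    return bisect.bisect_right(knots, x) - 1
```

With this convention every x in [L, U] gets a valid j, and at x = U the basis functions are effectively evaluated as the limit from the left.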

As a side remark, here comes a first draft of a Stan based implementation of I-splines that uses the recursion in Eq. (2) in Ramsay’s paper (any feedback is highly welcome):

functions {
    // Evaluates M-spline basis function `ind` of the given order at the
    // points in t, using the recursion in Eq. (2) of Ramsay's paper.
    // ext_knots: the extended knot sequence (length = number of basis
    //            functions + order)
    // ind:       between 1 and the number of basis functions
    // order:     must be >= 1 (an M-spline of order k has degree k-1)
    vector build_m_spline(real[] t, real[] ext_knots, int ind, int order);
    vector build_m_spline(real[] t, real[] ext_knots, int ind, int order) {
        vector[size(t)] m_spline = rep_vector(0, size(t));
        vector[size(t)] w1 = rep_vector(0, size(t));
        vector[size(t)] w2 = rep_vector(0, size(t));
        if (order == 1) {
            // indicator of [t_ind, t_{ind+1}), normalised to integrate to 1
            for (i in 1:size(t))
                if (ext_knots[ind + 1] != ext_knots[ind])
                    m_spline[i] = ((ext_knots[ind] <= t[i]) && (t[i] < ext_knots[ind + 1])) /
                                  (ext_knots[ind + 1] - ext_knots[ind]);
        }
        else {
            if (ext_knots[ind + order] != ext_knots[ind]) {
                w1 = (to_vector(t) - rep_vector(ext_knots[ind], size(t))) /
                     (ext_knots[ind + order] - ext_knots[ind]);
                w2 = (rep_vector(ext_knots[ind + order], size(t)) - to_vector(t)) /
                     (ext_knots[ind + order] - ext_knots[ind]);
            }
            m_spline = w1 .* build_m_spline(t, ext_knots, ind, order - 1) +
                       w2 .* build_m_spline(t, ext_knots, ind + 1, order - 1);
            m_spline *= order / (order - 1.0);
        }
        return m_spline;
    }
}

It is based on the B-splines implementation here.
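For anyone who wants to cross-check the recursion outside of Stan, here is my own NumPy translation of the same function (the 1-based `ind` is kept to mirror the Stan code; treat it as a sketch):

```python
import numpy as np

def m_spline(t, ext_knots, ind, order):
    """Evaluate M-spline basis function `ind` (1-based) of the given order
    at the points in `t`, via the recursion in Eq. (2) of Ramsay's paper."""
    t = np.asarray(t, dtype=float)
    if order == 1:
        denom = ext_knots[ind] - ext_knots[ind - 1]
        if denom == 0:
            return np.zeros_like(t)
        # indicator of [t_ind, t_{ind+1}), normalised to integrate to 1
        return ((ext_knots[ind - 1] <= t) & (t < ext_knots[ind])) / denom
    w1 = np.zeros_like(t)
    w2 = np.zeros_like(t)
    denom = ext_knots[ind + order - 1] - ext_knots[ind - 1]
    if denom != 0:
        w1 = (t - ext_knots[ind - 1]) / denom
        w2 = (ext_knots[ind + order - 1] - t) / denom
    return (w1 * m_spline(t, ext_knots, ind, order - 1)
            + w2 * m_spline(t, ext_knots, ind + 1, order - 1)) * order / (order - 1.0)
```

A quick sanity check: every M-spline integrates to 1 over its support, e.g. on the extended knots (0, 0, 0, 1, 2, 3, 3, 3) each of the five order-3 basis functions does.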

Once clear, I would go on and define the corresponding build_i_spline function, which we then can combine with the Royston & Parmar construction.


#25

Good question. I must’ve not noticed that. It seems like that equation is saying that one can write the I-splines in terms of the M-splines, which seems pretty convenient. I’d have to verify that, because it doesn’t seem immediately obvious. Regardless, the integration is not hard to do.

Wouldn’t it just be 1 in that case?


#26

It is not clear to me, yet. I have to think, too.

Yet another (convenient) alternative might be to do the whole I-spline computation outside of Stan in R using splines2, see in particular this vignette. However I haven’t used it and don’t know how mature it is…
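I haven’t used splines2 either, but for what it’s worth, the same “precompute the basis outside Stan” idea can be sketched in Python with SciPy (everything below is my own illustration, not splines2’s API). It relies on the standard identity M_{i,k}(x) = k\,B_{i,k}(x)/(t_{i+k}-t_i) between M-splines and B-splines, so an I-spline element is just a scaled B-spline antiderivative:

```python
import numpy as np
from scipy.interpolate import BSpline

def i_spline_element(local_knots):
    """I-spline basis element on knots t_i..t_{i+k}, where k = len(local_knots) - 1
    is the spline order; returns a function of x."""
    k = len(local_knots) - 1
    b = BSpline.basis_element(local_knots, extrapolate=False)
    b_int = b.antiderivative()                      # integral of the B-spline element
    scale = k / (local_knots[-1] - local_knots[0])  # M-spline / B-spline scaling
    base = float(b_int(local_knots[0]))             # integral is 0 at the left knot

    def i_spline(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        out = np.clip(scale * (np.nan_to_num(b_int(x)) - base), 0.0, 1.0)
        out[x >= local_knots[-1]] = 1.0             # saturated beyond the support
        out[x <= local_knots[0]] = 0.0
        return out

    return i_spline
```

Note that a single element like this saturates at 1 beyond its own support; how a full basis behaves beyond the boundary knots depends on how those knots are placed.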

Not sure which design decision is typically preferred in rstanarm: a) do such computations in the transformed data block in Stan, or b) do this “externally” in R, potentially using third-party CRAN packages.


#27

You can do either but we do more of b).


#28

I’ve been thinking about this and yes I agree too.

I think in the long term it will be in everyone’s best interest (developers and users and ultimately innocent bystanders affected by decisions made using Stan models in industry and research) if a precompiled package for survival modeling in Stan were part of the core set of packages supported by the Stan dev team. This doesn’t mean that development would be restricted to the Stan dev team (realistically we don’t have time now to do it all); it could include the people currently involved in this discussion and anyone else with something useful to contribute (it’s looking like this could be a great collaborative effort!).

From the Stan team side I think we could do the following:

  • host the package in the stan-dev org on GitHub, give it a website on mc-stan.org (e.g. http://mc-stan.org/bayesplot/)
  • not do the majority of the coding but be involved enough to have input into the Stan implementations as well as the interface (to make sure it integrates nicely with the existing ecosystem).
  • ensure the package’s sustained existence on repositories like CRAN (particularly important in the case that core contributors to the package end their involvement, which does happen).

As to whether this is a new package (maybe rstansurv?) or an addition to rstanarm (since rstanarm is already exactly what I described above except we wrote most of it), I’m not sure.

I would suggest starting to put together a preliminary Stan program (maybe just start with one type of model and then add more later) as well as some R code for preparing data to pass to Stan, and then once there’s something to look at / try out I think it will be easier for us to see whether it makes sense to go with rstanarm or in a new package on stan-dev.


#29

That was precisely my idea, and I’m working on a first model that is essentially an i-spline-based Royston & Parmar. I would also try to reproduce at least one example they mention in the paper.

We can then look where we go from there, if that’s ok for others. I will announce any update here.

However, now I have to wait a couple of days before I can continue to code.


#30

This all sounds smart to me. One general comment: it would be nice if Stan/rstan supported general-purpose routines that could easily be front-ended in both rstanarm and brms.

Frank


#31

I’m happy to see so many people interested in this.
How about making a wiki page in https://github.com/stan-dev/stan/wiki to list the features people would like to have, and then editing it into a road map? That would be easier to show new people than this thread.


#32

@harrelfe I’d be interested in talking more about this. I’m not sure if what I have in mind is the same as what you have in mind. For example, do you mean on the R side or the Stan side of things? I do think the more that higher level interfaces to rstan can draw from the same pool of internal functions the better. We’ve started on this path with the rstantools package but it’s still early. I think at some point Paul and I will add a lot more to rstantools that can be shared across packages interfacing with rstan.

Would you mind clarifying a bit? And feel free to start a new topic for this if you want to get more eyes on it. Things can get buried in these long threads.


#33

A bit earlier than expected, I am happy to share a first version. It’s an I-spline based Royston & Parmar parametric proportional hazards model, so the splines are used to model the log baseline cumulative hazard function. I test some aspects of it on two datasets I found in the literature. Moreover, I decided to do the spline calculations in R.

Any feedback is highly welcome. Does it make sense to you guys? I also added a short list of next todos.

/************************************************************************************************************************/
data {
    int<lower=0> N_uncensored;                                      // number of uncensored data points
    int<lower=0> N_censored;                                        // number of censored data points
    int<lower=1> m;                                                 // number of basis splines
    int<lower=1> NC;                                                // number of covariates
    matrix[N_censored,NC] X_censored;                               // design matrix (censored)
    matrix[N_uncensored,NC] X_uncensored;                           // design matrix (uncensored)
    vector[N_censored] log_times_censored;                          // x=log(t) in the paper (censored)
    vector[N_uncensored] log_times_uncensored;                      // x=log(t) in the paper (uncensored)
    matrix[m,N_censored] basis_evals_censored;                      // ispline basis matrix (censored)
    matrix[m,N_uncensored] basis_evals_uncensored;                  // ispline basis matrix (uncensored)
    matrix[m,N_uncensored] deriv_basis_evals_uncensored;            // derivatives of isplines matrix (uncensored)
}
/************************************************************************************************************************/
parameters {
    row_vector<lower=0>[m] gammas;                                  // regression coefficients for splines
    vector[NC] betas;                                               // regression coefficients for covariates
    real gamma_intercept;                                           // \gamma_0 in the paper
}
/************************************************************************************************************************/
model {
    vector[N_censored] etas_censored;
    vector[N_uncensored] etas_uncensored;
    gammas ~ normal(0, 2);
    betas ~ normal(0,1);
    gamma_intercept   ~ normal(0,1);
    
    etas_censored = X_censored*betas + (gammas*basis_evals_censored)' + gamma_intercept;
    etas_uncensored = X_uncensored*betas + (gammas*basis_evals_uncensored)' + gamma_intercept;
    target += -exp(etas_censored);
    target += etas_uncensored - exp(etas_uncensored) - log_times_uncensored + log(gammas*deriv_basis_evals_uncensored)';
}
/************************************************************************************************************************/
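For anyone checking the two target += statements: they are the standard right-censored log-likelihood derived on the log cumulative hazard scale. A sketch, in the notation of the program (b denotes the vector of I-spline basis evaluations):

```latex
\eta = \gamma_0 + \boldsymbol{\gamma}^{\top} b(\log t) + \mathbf{x}^{\top}\boldsymbol{\beta}
     = \log H(t),
\qquad
h(t) = \frac{dH(t)}{dt} = e^{\eta}\, \frac{\boldsymbol{\gamma}^{\top} b'(\log t)}{t}
```

so an uncensored observation contributes \log h(t) + \log S(t) = \eta - e^{\eta} - \log t + \log(\boldsymbol{\gamma}^{\top} b'(\log t)), and a right-censored observation contributes \log S(t) = -e^{\eta}.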

#34

I made a wiki page for making a plan for survival analysis https://github.com/stan-dev/stan/wiki/Survival-analysis

Anyone can edit that page, so just go there and write what you would like to have, along with references and how you could contribute. This is the brainstorming phase, so don’t be afraid to add things; we can later clean up and streamline. The wiki will anyway make it easier to check all the ideas than going through this thread.


#35

I should have clarified that. I think I was really meaning to say on the R side.


#36

Glad to see so much progress so soon. I’m a big R user and not yet facile in using a solely Stan script, otherwise I’d be running some tests right away.


#37

@arya, @harrelfe or anybody else: I was wondering whether you have thought about the I-spline behaviour outside the boundary knots? Natural cubic splines, as used by Royston & Parmar in their paper, are constructed so that they are guaranteed to be linear (and hence monotone) beyond the boundary knots. The I-splines are guaranteed to be monotone in the interior, but what about outside the knots?

This would become relevant if we intend to evaluate the hazard rate or survival function (for a new instance) at times that are smaller / larger than any time survival time we have in the data.

Two possibilities come to my mind to circumvent this problem:

  • One way around would be to argue that the relevant regions of time should be covered by the already observed instances (times), which were used to determine the knots. I find this to be a rather weak argument.
  • Alternatively, one could intentionally place the knots so that they include practically relevant values of t before inference is done. That would alter the standard “recipe” I’ve seen, where people pick a number of quantile-based interior knots and “surround” them with two boundary knots corresponding to the minimum and maximum survival times.
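The standard “recipe” mentioned above can be sketched like this (function name and arguments are hypothetical, purely for illustration):

```python
import numpy as np

def default_knots(times, n_interior):
    """Interior knots at equally spaced quantiles of the observed survival
    times, boundary knots at the observed minimum and maximum."""
    probs = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]  # drop the 0 and 1 quantiles
    interior = np.quantile(times, probs)
    return np.concatenate(([np.min(times)], interior, [np.max(times)]))
```

The second option above would then amount to widening the boundary knots beyond the observed minimum/maximum to cover the practically relevant values of t.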

#38

I don’t favor modifying the knot location specification to take into account future extrapolation problems. But the solution is specific to the domain. For covariate extrapolation I like the linear tail restriction and would do linear extrapolation. For the hazard function I’m tempted to do a flat extrapolation on the right.


#39

Thanks.

Which would correspond to linear extrapolation of the cumulative hazard function. Royston & Parmar extrapolate the log cumulative hazard function linearly to the right, which is different.

My current understanding of the I-splines is that they saturate, that is, become flat/constant outside the boundary knots.

In general I’d rather favor a (monotonic) I-spline based modelling of the log cumulative hazard function, together with non-negative priors on the weights of the different basis functions, than rely on the argument that the monotonic behaviour would anyway be imposed by the data (and thus not make this property part of the likelihood and prior, as done by R&P).


#40

This is actually not correct in general; see the figure below, taken from the paper by Ramsay. However, in this case I_6 is also not linear beyond the right-most knot but cubic (since the order is 3), and M_6 is quadratic. So asymptotically it increases much faster than linearly.