Improving the computational and statistical efficiency of a survival model


#1

Hi guys,

I came up with the following Stan model and I was wondering whether you have any advice on how to improve its performance (both computationally and statistically). Note that I will apply the model to cases where N_uncensored + N_censored will be around 500k, NC (the number of covariates) around 20, and m at most 10. The matrices basis_evals_censored, basis_evals_uncensored and deriv_basis_evals_uncensored contain the evaluations (and derivatives thereof) of the m i-spline basis functions for each individual, at that individual's survival time.

Are there any optimisations of the matrix/vector multiplications that could lead to performance gains?

Ultimately I would like to reduce the runtime.

data {
    int<lower=0> N_uncensored;                                      // number of uncensored data points
    int<lower=0> N_censored;                                        // number of censored data points
    int<lower=1> m;                                                 // number of basis splines
    int<lower=1> NC;                                                // number of covariates
    matrix[N_censored,NC] Q_censored;                               // Q-transf. of design matrix (censored)
    matrix[N_uncensored,NC] Q_uncensored;                           // Q-transf. of design matrix (uncensored)
    matrix[NC, NC] R;
    vector[N_censored] log_times_censored;                          // x=log(t) in the paper (censored)
    vector[N_uncensored] log_times_uncensored;                      // x=log(t) in the paper (uncensored)
    matrix[m,N_censored] basis_evals_censored;                      // ispline basis matrix (censored)
    matrix[m,N_uncensored] basis_evals_uncensored;                  // ispline basis matrix (uncensored)
    matrix[m,N_uncensored] deriv_basis_evals_uncensored;            // derivatives of isplines matrix (uncensored)
}
transformed data {
    matrix[NC,NC] R_inv = inverse(R);
}
parameters {
    row_vector<lower=0>[m] gammas;                                  // regression coefficients for splines
    vector[NC] betas_tr;                                            // regression coefficients for covariates
    real gamma_intercept;                                           // \gamma_0 in the paper
}
transformed parameters {
    vector[NC] betas = R_inv * betas_tr;
}
model {
    vector[N_censored] etas_censored;
    vector[N_uncensored] etas_uncensored;
    gammas ~ normal(0, 1);
    betas ~ normal(0,1);
    gamma_intercept   ~ normal(0,1);
    
    etas_censored = Q_censored*betas_tr + (gammas*basis_evals_censored)' + gamma_intercept;
    etas_uncensored = Q_uncensored*betas_tr + (gammas*basis_evals_uncensored)' + gamma_intercept;
    target += -exp(etas_censored);
    target += etas_uncensored - exp(etas_uncensored) - log_times_uncensored + log(gammas*deriv_basis_evals_uncensored)';
}
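One possible micro-optimisation in the direction the question asks about (a sketch, not benchmarked): export the basis matrices as [N, m] instead of [m, N], and declare gammas as a positive vector[m] rather than a row_vector, so every term in the linear predictor becomes a plain matrix-vector product and the model block needs no transposes at all. Under those assumptions the model block could read:

```stan
model {
    vector[N_censored] etas_censored;
    vector[N_uncensored] etas_uncensored;
    gammas ~ normal(0, 1);
    betas ~ normal(0, 1);
    gamma_intercept ~ normal(0, 1);
    // assumes basis_evals_* are declared [N, m] and gammas is vector<lower=0>[m]:
    // each term is one matrix-vector multiply, no ' in the model block
    etas_censored = Q_censored * betas_tr
                    + basis_evals_censored * gammas + gamma_intercept;
    etas_uncensored = Q_uncensored * betas_tr
                      + basis_evals_uncensored * gammas + gamma_intercept;
    target += -exp(etas_censored);
    target += etas_uncensored - exp(etas_uncensored) - log_times_uncensored
              + log(deriv_basis_evals_uncensored * gammas);
}
```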

#2

I removed a couple of transpose operations, which were dispensable; see https://github.com/ermeel86/paramaetricsurvivalmodelsinstan/commit/ce95e88f5aa5657d490f82311e46cd0762062090.

Any optimisation beyond that?


#3

Make sure the columns of X are centered before you do the QR decomposition.
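If the centering (and the QR decomposition itself) were to be done inside Stan rather than in the data-preparation step, it could be sketched like this, assuming the raw design matrix X and its row count N are passed in data instead of the precomputed Q and R, and a Stan version that provides qr_thin_Q / qr_thin_R:

```stan
transformed data {
    matrix[N, NC] X_centered;            // X, N assumed passed in data
    matrix[N, NC] Q;
    matrix[NC, NC] R;
    for (k in 1:NC) {
        real mu = mean(col(X, k));       // column mean
        for (n in 1:N)
            X_centered[n, k] = X[n, k] - mu;
    }
    Q = qr_thin_Q(X_centered);           // thin QR of the centered design
    R = qr_thin_R(X_centered);
}
```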


#4

Thanks! What about scaling the columns of X? Is that not as important as centering?


#5

The scaling of the columns of X is not relevant because the columns of Q are scaled so that each has a sum of squares of 1. That is actually not ideal because betas_tr can get too far from zero. Better to multiply Q by some constant and divide R by that same constant, where the constant could be 1.0 / N or 1 / sqrt(N - 1.0).
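As a sketch of the constant-rescaling idea, using sqrt(N - 1.0) as the constant (the convention also used in the Stan user's guide's QR reparameterization example; X and N are again assumed to be passed in data):

```stan
transformed data {
    // scale Q up and R down by the same constant so Q_ast * R_ast
    // still reproduces X; sqrt(N - 1.0) gives Q_ast columns unit variance
    real c = sqrt(N - 1.0);                 // N assumed passed in data
    matrix[N, NC] Q_ast = qr_thin_Q(X) * c; // X assumed passed in data
    matrix[NC, NC] R_ast = qr_thin_R(X) / c;
}
```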


#6

Currently I multiply Q by sqrt(N-1) and divide R by sqrt(N-1). I wonder how beta_tr can become too large in this case? Q*beta_tr should be on the scale of the dependent variable, right? By multiplying Q by sqrt(N-1), doesn't this imply that beta_tr could actually become very close to zero rather than large? I would expect the risk of beta_tr becoming too large to apply rather in the scenario where you multiply Q by 1/sqrt(N-1) or the like…

Am I missing something?


#7

If you are already scaling Q and applying the reverse scaling to R, then you should be fine. Since originally Q'Q = I, if you construct Q* = Q * sqrt(N - 1), then the covariance of Q* is the identity matrix. So that is like putting each column of Q* in the units of its standard deviation.
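Spelling the covariance claim out: with centered columns the sample covariance of Q* is Q*'Q*/(N-1), so

```latex
\operatorname{cov}(Q^\ast)
  = \frac{1}{N-1}\, Q^{\ast\prime} Q^\ast
  = \frac{\bigl(\sqrt{N-1}\bigr)^{2}}{N-1}\, Q' Q
  = Q' Q
  = I .
```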


#8

Not an efficiency comment, but a stability one. Remove the inverse from transformed data and replace the transformed parameters line with:

vector[NC] betas = R \ betas_tr;

Efficiency-wise, it replaces a quadratic operation (a matrix-vector multiply) with a cubic one (a matrix solve with a built-in multiply), but it should be more stable.
If you're not having arithmetic issues, then the way you're doing it should be fine.
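For concreteness, the change in context would look like this (a sketch; `\` is Stan's left-division operator, which solves the linear system rather than forming the inverse):

```stan
transformed data {
    // (R_inv no longer needed)
}
transformed parameters {
    // solves R * betas = betas_tr on each evaluation instead of
    // premultiplying by an explicitly inverted R
    vector[NC] betas = R \ betas_tr;
}
```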