I’ve been working on a project and have written some Stan code. It’s working fine, but I’ve been wondering what I could do to make it more efficient. I’ve read through the documentation and tried a few changes. I’m posting the code here, together with my considerations, to see whether there’s anything else I could do to optimize it.
functions {
  // competing risks cox proportional hazards model, log likelihood
  real crcox_lpdf(vector y, matrix Xbeta, matrix delta, matrix lambda_cal, matrix H) {
    return sum(delta .* (log(lambda_cal) + Xbeta) - exp(Xbeta) .* H);
  }
}
data {
  int<lower=1> n;        // number of data points
  int<lower=1> p;        // number of covariates
  int<lower=1> k;        // number of bins for piecewise linear baseline hazard rates
  int<lower=1> m;        // number of risk types
  vector[n] y;           // time to event data
  vector[k+1] s;         // knots for the piecewise linear baseline hazard rates
  matrix[n,m] delta;     // indicator of risk type
  matrix[n,p] X;         // design matrix
  real<lower=0> kappa0;  // for multiplicative gamma process
  real<lower=0> kappa1;  // for multiplicative gamma process
}
transformed data {
  array[n] int<lower=1,upper=k> ki;  // bin for each event
  matrix[k,m] s_diff;                // bin widths, matrix form for calculation
  matrix[n,m] T;                     // time since last knot, matrix form for calculation
  for (i in 1:n) {
    for (l in 1:k) {
      if (y[i] > s[l] && y[i] <= s[l+1]) {
        ki[i] = l;
        T[i] = rep_row_vector(y[i] - s[l], m);
      }
    }
  }
  for (l in 1:k) {
    s_diff[l] = rep_row_vector(s[l+1] - s[l], m);
  }
}
parameters {
  matrix[p,m] beta;
  matrix<lower=0>[k,m] psi;
  vector[p] mu;
  real<lower=0> s2;
}
transformed parameters {
  matrix[k,m] lambda;  // baseline hazard rates
  lambda[1] = psi[1];
  for (l in 2:k) {
    lambda[l] = psi[l] .* lambda[l-1];
  }
}
model {
  matrix[n,m] H;           // baseline cumulative hazard rates for each ti
  matrix[n,m] lambda_cal;  // lambda for each ti
  matrix[k,m] cum_lambda1;
  matrix[k,m] cum_lambda2;
  matrix[p,m] mu_rep;
  matrix[n,m] Xbeta;
  vector[(k-1)*m] psi_1d;
  vector[p*m] mu_1d;
  vector[p*m] beta_1d;
  // get lambda_cal
  lambda_cal = lambda[ki];
  // get H
  cum_lambda1 = lambda .* s_diff;
  cum_lambda2[1] = zeros_row_vector(m);
  for (l in 2:k) {
    cum_lambda2[l] = cum_lambda2[l-1] + cum_lambda1[l-1];
  }
  H = cum_lambda2[ki];
  H = H + lambda_cal .* T;
  mu_rep = rep_matrix(mu, m);
  mu_1d = to_vector(mu_rep);
  beta_1d = to_vector(beta);
  psi_1d = to_vector(psi[2:]);
  Xbeta = X * beta;
  target += crcox_lpdf(y | Xbeta, delta, lambda_cal, H);
  target += cauchy_lpdf(s2 | 0, 1);
  target += normal_lpdf(beta_1d | mu_1d, sqrt(s2));
  target += normal_lpdf(mu | 0, 10);
  target += gamma_lpdf(psi[1] | kappa0, kappa0);
  target += gamma_lpdf(psi_1d | kappa1, kappa1);
}
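To make the custom likelihood function easier to follow, this is the log likelihood that crcox_lpdf is computing (written out in my own notation):

$$\log L = \sum_{i=1}^{n} \sum_{j=1}^{m} \Big[ \delta_{ij}\big(\log \lambda_{l_i j} + x_i^\top \beta_j\big) - \exp(x_i^\top \beta_j)\, H_{ij} \Big], \qquad H_{ij} = \sum_{l=1}^{l_i - 1} \lambda_{lj}\,(s_{l+1}-s_l) + \lambda_{l_i j}\,(y_i - s_{l_i}),$$

where $l_i$ is the bin containing $y_i$, $\lambda_{lj}$ is the baseline hazard of risk type $j$ in bin $l$ (lambda in the code), $\lambda_{l_i j}$ corresponds to lambda_cal, and $H_{ij}$ is the baseline cumulative hazard at $y_i$ (H in the code).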
Apologies for posting such long code and asking people to read … Here is the main question I have in mind: I’m mostly using matrices and vectors in my code, although several variables (e.g., lambda and psi) are mainly accessed row by row in loops. I understand from the documentation that in such cases it would be more efficient to define them as arrays of row vectors. I did try arrays of row vectors, but decided to switch back to the matrix/vector setup because:
- ultimately, my log-likelihood calculation (see the custom function) relies on element-wise multiplication, and I believe keeping everything as matrices makes that more efficient;
- I wanted to use the vectorized forms of normal_lpdf and gamma_lpdf, so I need to be able to flatten everything into a one-dimensional array or vector. There is a built-in function to convert a matrix into a vector, but there isn’t a straightforward way to turn an array of row vectors into a one-dimensional array (see the short sketch below this list).
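To illustrate the second point, here is a minimal sketch of the flattening I rely on; the array-of-row-vectors part is hypothetical (names like beta_arr are made up) and just shows the extra bookkeeping I would otherwise need:

// with a matrix, flattening for the vectorized lpdfs is a single call
matrix[p, m] beta;
vector[p * m] beta_1d = to_vector(beta);  // column-major order

// with an array of row vectors I would have to flatten by hand, e.g.
array[p] row_vector[m] beta_arr;
vector[p * m] beta_arr_1d;
for (j in 1:p) {
  // note: this fills row by row, unlike to_vector's column-major order
  beta_arr_1d[((j - 1) * m + 1):(j * m)] = to_vector(beta_arr[j]);
}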
Some other questions I had:
- Does it make a difference whether I subset a row from a matrix using A[j,] or A[j]?
- Any suggestions on how to benchmark different versions of the code? I’m looking for something like the benchmark package in R, where I can time two code chunks and tell which one is more efficient. For now I’ve been repeatedly running different versions of my code and comparing the run times, but for subtle changes it’s hard to tell the difference.
If you spot anything else that’s inefficient, I’d also be super interested to know!!!