I have Weibull model for left-truncated and right-censored data with time-varying predictors. The code below runs about 2.5-3 hours on 15k records. I need to run this on about 300k records. Can anyone spot any obvious improvements that I can make to run this faster? I added some comments and simplified where I can, I hope it helps.
I’m also wondering if having three different parameter vectors (all parameters distributed normal(0,1)) are less efficient than having a big one, and just accessing the correct indices where needed? Or is there perhaps a way to avoid the for-loop where I use the index variables and vectorize appropriately? Not sure where I can win i.t.o. efficiency, any advice will be appreciated.
data {
int<lower=0> N; //Number of observations
int<lower=0> P; // Number of numeric predictors
int<lower=0> P_ind; // Number of factor variables (binary)
vector[N] d; // Survival event variable 1:Death, 0:Censored
vector<lower=0>[N] stop; // age at stop time
vector<lower=0>[N] start; // age at start time
int<lower = 1> X_ind[N, P_ind]; // Index matrix for categorical variables. Values are either 1 or 2
matrix[N, P] X; //Design matrix for numeric variables
}
parameters {
real<lower=0> alpha; //Shape parameter of Weibull
vector<lower=0>[P] beta_sigma; // Coefficient for numeric predictors
vector<lower=0>[P_ind] beta_ind_sigma1; // Stdev for index variable =1
vector<lower=0>[P_ind] beta_ind_sigma2; //Stdev for index variable = 2
vector[P] z_bs; // Normal(0, 1) value for non-centered parameterisation for numeric variables.
vector[P_ind] z_ibs1; // Normal(0, 1) value for non-centered parameterisation for index variables.
vector[P_ind] z_ibs2;
}
transformed parameters {
vector[P] beta;
matrix[P_ind, 2] beta_ind; //Two columns representing parameters for index variables
beta = z_bs .* beta_sigma;
beta_ind[,1] = z_ibs1 .* beta_ind_sigma1;
beta_ind[,2] = z_ibs2 .* beta_ind_sigma2;
}
model {
vector[N] sigma;
vector[N] sigma_ind_linear;
vector[P_ind] tempsum;
// Priors
beta_sigma ~ exponential(3);
beta_ind_sigma1 ~ exponential(3);
beta_ind_sigma2 ~ exponential(3);
z_bs ~ normal(0, 1);
z_ibs1 ~ normal(0, 1);
z_ibs2 ~ normal(0, 1);
alpha ~ gamma(2, 2);
// Model
// Adding up index variables contribution to target
for(i in 1:N){
for(j in 1:P_ind){
tempsum[j] = beta_ind[j, X_ind[i, j]];
}
sigma_ind_linear[i] = sum(tempsum);
}
// Linear model
sigma = exp((X*beta) + sigma_ind_linear)+0.001; // adding delta for numerical stability.
for(i in 1:N){
if(d[i]==0){
target += weibull_lccdf(stop[i]|alpha, sigma[i]) - weibull_lccdf(start[i]| alpha, sigma[i]);
}
else {
target += weibull_lpdf(stop[i]|alpha, sigma[i]) - weibull_lccdf(start[i]| alpha, sigma[i]);
}
}
}