 # Issues with defining the custom likelihoods of spatial survival model in Stan

Hello! I am writing the code for my spatial survival model in Stan. The likelihood function of my model is attached (where Xj is white noise and Xj ∼ N(0, sigma)).
I defined the likelihood of the attached model for the exponential distribution as follows in Stan, but I am having trouble coding and running the model. In particular, I do not know how to incorporate the censoring variable (in the code, the `censor_status` variable indicates right censoring) into the definition of the likelihood and the model block. Can anyone help me?

```stan
functions {

real newweibul_log(vector time, vector censor_status, vector age, matrix kernel,
                   vector x, vector p, int n_g, int province,
                   real beta0, real beta, real sigma) {
  matrix[province, n_g] lambda;
  vector[province] z;
  vector[province] a;
  int N_t = num_elements(time);
  matrix[province, N_t] L;
  vector[N_t] lam;  // per-observation rate (a local, since function arguments cannot be assigned)

  for (k in 1:province) {
    for (j in 1:n_g) {
      lambda[k, j] = kernel[k, j] * exp(x[j]);
    }
    z[k] = sum(lambda[k]) * p[k];
  }
  a = z / sum(z);  // normalize only after every z[k] is filled

  for (i in 1:N_t) {
    lam[i] = exp(beta0 + beta * age[i]);  // `log(lam[i]) = ...` is not a valid assignment target
  }
  for (k in 1:province) {
    for (i in 1:N_t) {
      L[k, i] = (a[k] * lam[i] * exp(-lam[i] * time[i]))^censor_status[i]
                * (a[k] * exp(-lam[i] * time[i]))^(1 - censor_status[i]);
    }
  }
  return sum(log(L));
}
}
data {
int <lower=0> province;
int <lower=0> n_grid;
int <lower=0> N;
vector <lower=0> [province] p;
vector [N] time;
vector [N] age;
vector [n_grid] x;
matrix [province,n_grid] kernel;
vector<lower=0, upper=1>[N] censor_status; // right-censoring indicator
}
parameters {
real <lower=0> sigma;
real beta0;
real beta;
}
transformed parameters {
real time_ratio;
time_ratio = exp(beta);
}
model {
x ~ normal(0, 5); // vectorized; the original j loop applied this whole-vector prior n_grid times
beta0 ~ normal(0, 100);
beta ~ normal(0, 100);
sigma ~ gamma(1, 1);
target += newweibul_log(time, censor_status, age, kernel, x, p,
                        n_grid, province, beta0, beta, sigma);
}
```

Regards,
Eisa


Hi,
sorry for not getting back to you earlier. This looks like quite a complex model. We generally recommend implementing a simpler model first and adding components/features one at a time, only after validating that the previous model is able to recover parameters from simulated data. Is the code you shared already tested and working, just without the censoring?

I am also not sure what exactly you mean by “trouble coding and executing the model”. Could you be more specific, so we can solve one problem at a time?

Did you read the user’s guide chapter on censored data? (4.3 Censored data | Stan User’s Guide) If yes, could you be more specific about where you get stuck, or which part of the instructions you are unable to follow or understand? (It is not easy stuff, so feel free to ask for clarifications.)
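For orientation, the pattern from that chapter applied to an exponential survival model looks roughly like this (a minimal sketch using your variable names; the exponent convention in your likelihood, `f^censor_status * S^(1 - censor_status)`, treats `censor_status[i] == 1` as an observed event, so the sketch follows that coding — flip the branches if 1 actually means censored in your data):

```stan
model {
  for (i in 1:N) {
    real lam_i = exp(beta0 + beta * age[i]);  // per-observation exponential rate
    if (censor_status[i] == 1) {
      // event observed at time[i]: contribute the log density, log f(t)
      target += exponential_lpdf(time[i] | lam_i);
    } else {
      // right-censored at time[i]: we only know T > time[i], contribute log S(t)
      target += exponential_lccdf(time[i] | lam_i);
    }
  }
}
```

The key point is that censored observations contribute the log complementary CDF (log survival probability) rather than the log density.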

Best of luck with your model!

Actually, I ran my hierarchical survival model using RStan. But I have a large dataset (N = 35000), and sampling is very slow in Stan (it takes ~7 days even for 2000 iterations / 500 warmup).

Could you help me tackle this issue, please?
Can I divide the code into two parts: first compute the values of `a` (the probability values) separately, and then use them as data when estimating the model parameters?

My Stan code is below (N = 35000, n_grid = 350, province = 31, number = 17, num = 12).
In my code, the x[j] (j = 1, 2, …, n_grid) are white noise; they are used only in the calculation of `a`, and there is no need to estimate them.

```r
fit <- stan("final model.stan", data = da, iter = iter, warmup = warmup,
            cores = 4, thin = 1, init = "random")
```
Stan code:

```stan
data {
  int<lower=0> province;
  int<lower=0> n_grid;
  int<lower=0> N;
  int<lower=0> num;
  int<lower=0> number;
  vector<lower=0>[province] p;
  row_vector[n_grid] kernel[province];
  matrix[N, number] respon;
}
parameters {
  vector[num] beta;
  real<lower=0> sigma;
  vector[n_grid] x;
}
transformed parameters {
  real alpha = 1 / sigma;
  vector[num] time_ratio;
  vector[N] lambdaa;
  vector[n_grid] exp_x;
  vector[province] a; // `a` saved to output
  time_ratio = exp(beta);
  exp_x = exp(x);
  lambdaa = exp((-1 * respon[, 6:17] * beta) / sigma);
  { // local block: `landa` is not part of the output
    vector[province] landa;
    for (k in 1:province) {
      landa[k] = kernel[k] * exp_x * p[k];
    }
    a = landa / sum(landa); // assign `a`
  }
}
model {
  real log_alpha = log(alpha);
  target += normal_lpdf(x | 0, 2);
  target += normal_lpdf(beta | 0, 1000);
  target += normal_lpdf(log_alpha | 0, 5);
  for (i in 1:N) {
    vector[province] f_i;
    vector[province] s_i;
    for (k in 1:province) {
      f_i[k] = a[k] * ((lambdaa[i] * alpha * respon[i, 4]^(alpha - 1))
                       / ((1 + (lambdaa[i] * respon[i, 4]^alpha))^2));
      s_i[k] = a[k] * (1 - ((lambdaa[i] * respon[i, 4]^alpha)
                            / (1 + (lambdaa[i] * respon[i, 4]^alpha))));
    }
    target += respon[i, 5] * (log(f_i) - log(s_i)) + log(s_i) - log_alpha;
  }
}
```

In such cases, it is often best to start with a subset of the data, which lets you iterate quickly, and only move to the full dataset once you are confident your model works for the smaller data. Often one can simplify the model and reduce the dataset at the same time. In your case, it could make sense to start with data from a single province (potentially further subsampled to, say, 100 observations), a low-dimensional kernel (small `n_grid`), and no predictors (i.e. `num = 0`). Once this is working, one can move to more complex stuff.
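One more observation that may speed things up, assuming the summed-over-provinces contribution in your `target +=` line is what you intend: since `f_i[k]` and `s_i[k]` both contain the factor `a[k]`, the difference `log(f_i) - log(s_i)` does not depend on `k` at all, and `a[k]` only enters the sum as the constant `sum(log(a))`. The inner province loop can therefore be eliminated entirely. A sketch of an equivalent model block (`h`, `log_S`, `log_f`, and `sum_log_a` are local names I introduce here):

```stan
model {
  real log_alpha = log(alpha);
  real sum_log_a = sum(log(a));  // the a[k] factors only enter as this constant
  target += normal_lpdf(x | 0, 2);
  target += normal_lpdf(beta | 0, 1000);
  target += normal_lpdf(log_alpha | 0, 5);
  for (i in 1:N) {
    real h = lambdaa[i] * respon[i, 4]^alpha;  // shared term, computed once per observation
    real log_S = -log1p(h);                    // log survival of the log-logistic
    real log_f = log(lambdaa[i]) + log_alpha
                 + (alpha - 1) * log(respon[i, 4]) + 2 * log_S;
    // summing the original per-province terms over k = 1:province:
    target += province * (respon[i, 5] * (log_f - log_S) + log_S - log_alpha) + sum_log_a;
  }
}
```

This removes an O(N × province) inner loop and the repeated evaluation of `lambdaa[i] * respon[i, 4]^alpha`, which should help noticeably at N = 35000. (Do double-check the algebra against your intended likelihood first; if you actually meant a mixture over provinces, you would need `log_sum_exp` rather than a plain sum.)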

Also note that you can use triple backticks (```) to neatly format your Stan/R code. I edited your previous post for you.

Thank you so much for explaining in detail.