LOO-CV for Survival model

Hi all,

I want to use the loo package to obtain LOO for model comparison.

I am running a survival model written in Stan as follows. I got the looic value=233359 and the PSIS diagnostic plot is attached (iter=10000, warmup=5000, thin=10, N=34744, n_grid=1001, num=12 ,province=31).


But for the other competing model, the value of the LOO was 25105 and its KHAT values were also bigger, and I really confused.

My question is whether I calculated the loo values correctly or not? Can anyone help me with this problem?

Thanks.

data {
  int <lower=0> province;
  int <lower=0> n_grid;
  int <lower=0> N;
  int <lower=0> num;
  int <lower=0> number;
  int <lower =0 , upper = province> county[N];
  vector <lower=0> [province] p;
  row_vector[n_grid] kernel[province];
  matrix [N,number] respon;
     }
parameters{
  vector [num] beta;
  vector <lower=0> [province] sigma1;
  real <lower=0> sigma;
  vector [n_grid] x;
  vector <lower=0> [province] r_county; 
     }
transformed parameters{
  vector [num] time_ratio;
  vector [N] lambdaa;
  real <lower=0> alpha = 1 / sigma;
  vector [n_grid] exp_x;
  vector[province] a; // `a` saved to output because it's a toplevel transformed parameter
  time_ratio  = exp(beta);
  exp_x = exp(x);
  lambdaa = exp (((-1 * respon[,7:18] * beta) / sigma)+log(r_county [county]));
  // }
     { // `z` not part of output
  vector[province] landa;
  for (k in 1 : province) {
        landa[k] = kernel[k] * exp_x * p[k];
     }
  a = landa / sum(landa); // assign `a`
     }
     }
model{
   vector [N] s_1;
   vector [N] s_2;
   vector [N] s_new1;
   vector [N] s_new2;
   target += normal_lpdf(x|0,5);
   target += normal_lpdf(beta | 0, 1000);
   target += gamma_lpdf(alpha | 0.001, 0.001);
   target += normal_lpdf(r_county| 0, sigma1);
   target += student_t_lpdf(sigma1| 3,0,1);
for (i in 1:N) {
   s_1[i]= 1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha)));
   s_2[i]= 1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha)));
   s_new1[i] = a[county[i]] * s_1[i];
   s_new2[i] = a[county[i]] * s_2[i];
   target += log(s_1[i] - s_2[i]) + log(a[county[i]]);
   }
   }
generated quantities{
   vector [N] log_lik;
   vector [N] s_1;
   vector [N] s_2;
   vector [N] s_new1;
   vector [N] s_new2;
for (i in 1:N){
   s_1[i]= 1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha)));
   s_2[i]= 1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha)));
   s_new1[i] = a[county[i]] * s_1[i];
   s_new2[i] = a[county[i]] * s_2[i];
   log_lik [i]= log(s_1[i] - s_2[i]) + log(a[county[i]]);
  }
  }
.```

Can you also provide the full output from the loo function for both models? Also I seethe code only for one model, so it seems you forgot to include he code for the other model.


Thank you so much for your quick reply. Sorry, I forgot the competing model codes. The following results are obtained from fitting two models under the same conditions (iter=4000, warmup=2000, thin=10, N=34744, n_grid=1001, num=12 ,province=31). It should be noted that with increasing iterations, there was no change in the results, including the value of LOOIC. My purpose is here the model assessment and choosing the best fitting model on my real data. I appreciated again for your reply.

  1. My survival model:
    Computed from 200 by 34744 log-likelihood matrix

      Estimate    SE
    

elpd_loo -116668.3 216.2
p_loo 63.9 0.8
looic 233358.6 432.4

Monte Carlo SE of elpd_loo is 0.6.

Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 34728 100.0% 157
(0.5, 0.7] (ok)

  1. Competing model: Spatial survival model with BYM (CAR prior + Random effect)
    Computed from 200 by 34744 log-likelihood matrix

      Estimate    SE
    

elpd_loo -12552.8 147.8
p_loo 38.8 0.7
looic 25105.5 295.6

Monte Carlo SE of elpd_loo is 0.4.

Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 34738 100.0% 180
(0.5, 0.7] (ok) 6 0.0% 195
(0.7, 1] (bad) 0 0.0%
(1, Inf) (very bad) 0 0.0%

All Pareto k estimates are ok (k < 0.7).
See help(‘pareto-k-diagnostic’) for details.

Thank you in advance!

My stan codes used to fit the spatial survival model withBYM are as follows:


data {
  int <lower=0> province;
  int <lower=0> N;
  int <lower=0> num;
  int <lower=0> number;
  int <lower =0 , upper = province> county[N];
  matrix [N,number] respon;
  matrix[province,province] H;  // area adjacency matrix row standardized (ea row sums to 1)
  matrix[province,province] Delta;  // diagonal matrix in BYM 1991, sfstd model B1
  matrix[province,province] Ones; // diag matrix of ones for identity
  real phi; // phi = 1/maxeigval(Delta*H*Delta)
     }
transformed data {
  vector[province] zeros;  // zeros for mean of MVN specification
for (i in 1:province)
  zeros[i] = 0.0;
     }
parameters{
  vector [num] beta;
  real <lower=0> sigma;
  vector[province] w1;  // random effect (CAR)
  real<lower=1e-5> prec_sre;  // precision of CAR
  vector[province] theta;       // heterogeneous effects
     }
transformed parameters{
  vector [num] time_ratio;
  vector [N] lambdaa;
  real <lower=0> alpha = 1 / sigma;
  real <lower=0> tau_sq;
  time_ratio  = exp(beta);
  tau_sq = 1.0 / (prec_sre * prec_sre);
  lambdaa = exp ((-respon[,7:18] * beta - w1[county] - theta[county]) / sigma);
     }
model{
   vector [N] s_1;
   vector [N] s_2;
   matrix[province,province] Sigma_sre;  // covariance matrix for CAR
   target += normal_lpdf(beta | 0, 1000);
   target +=gamma_lpdf(alpha |0.001,0.001);
   prec_sre ~ gamma(0.001,0.001);    // precision of spatial re
// spatial re
   Sigma_sre = (tau_sq * inverse(Ones - phi * H)) * Delta;
   w1 ~ multi_normal(zeros, Sigma_sre);
   target += normal_lpdf(theta| 0, 1);
for (i in 1:N) {
   s_1[i]= (1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha))));
   s_2[i]= (1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha))));
   target += log(s_1[i] - s_2[i]);
    }
    }
generated quantities{
   vector [N] log_lik;
   vector [N] s_1;
   vector [N] s_2;
  for (i in 1:N){
    s_1[i]= (1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha))));
    s_2[i]= (1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha))));
    log_lik [i]= log(s_1[i] - s_2[i]);
    }
    }
.```

thanks for the awesome information.