LOO-CV for Survival model

Isa_Nazar · September 22, 2021, 1:51pm

Hi all,

I want to use the loo package to obtain LOO for model comparison.

I am running a survival model written in Stan as follows. I got the looic value=233359 and the PSIS diagnostic plot is attached (iter=10000, warmup=5000, thin=10, N=34744, n_grid=1001, num=12 ,province=31).

But for the other competing model, the value of the LOO was 25105 and its KHAT values were also bigger, and I really confused.

My question is whether I calculated the loo values correctly or not? Can anyone help me with this problem?

Thanks.

data {
  int <lower=0> province;
  int <lower=0> n_grid;
  int <lower=0> N;
  int <lower=0> num;
  int <lower=0> number;
  int <lower =0 , upper = province> county[N];
  vector <lower=0> [province] p;
  row_vector[n_grid] kernel[province];
  matrix [N,number] respon;
     }
parameters{
  vector [num] beta;
  vector <lower=0> [province] sigma1;
  real <lower=0> sigma;
  vector [n_grid] x;
  vector <lower=0> [province] r_county; 
     }
transformed parameters{
  vector [num] time_ratio;
  vector [N] lambdaa;
  real <lower=0> alpha = 1 / sigma;
  vector [n_grid] exp_x;
  vector[province] a; // `a` saved to output because it's a toplevel transformed parameter
  time_ratio  = exp(beta);
  exp_x = exp(x);
  lambdaa = exp (((-1 * respon[,7:18] * beta) / sigma)+log(r_county [county]));
  // }
     { // `z` not part of output
  vector[province] landa;
  for (k in 1 : province) {
        landa[k] = kernel[k] * exp_x * p[k];
     }
  a = landa / sum(landa); // assign `a`
     }
     }
model{
   vector [N] s_1;
   vector [N] s_2;
   vector [N] s_new1;
   vector [N] s_new2;
   target += normal_lpdf(x|0,5);
   target += normal_lpdf(beta | 0, 1000);
   target += gamma_lpdf(alpha | 0.001, 0.001);
   target += normal_lpdf(r_county| 0, sigma1);
   target += student_t_lpdf(sigma1| 3,0,1);
for (i in 1:N) {
   s_1[i]= 1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha)));
   s_2[i]= 1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha)));
   s_new1[i] = a[county[i]] * s_1[i];
   s_new2[i] = a[county[i]] * s_2[i];
   target += log(s_1[i] - s_2[i]) + log(a[county[i]]);
   }
   }
generated quantities{
   vector [N] log_lik;
   vector [N] s_1;
   vector [N] s_2;
   vector [N] s_new1;
   vector [N] s_new2;
for (i in 1:N){
   s_1[i]= 1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha)));
   s_2[i]= 1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha)));
   s_new1[i] = a[county[i]] * s_1[i];
   s_new2[i] = a[county[i]] * s_2[i];
   log_lik [i]= log(s_1[i] - s_2[i]) + log(a[county[i]]);
  }
  }
.```

avehtari · September 22, 2021, 7:49pm

Can you also provide the full output from the loo function for both models? Also I seethe code only for one model, so it seems you forgot to include he code for the other model.

Isa_Nazar · September 23, 2021, 10:20am

Thank you so much for your quick reply. Sorry, I forgot the competing model codes. The following results are obtained from fitting two models under the same conditions (iter=4000, warmup=2000, thin=10, N=34744, n_grid=1001, num=12 ,province=31). It should be noted that with increasing iterations, there was no change in the results, including the value of LOOIC. My purpose is here the model assessment and choosing the best fitting model on my real data. I appreciated again for your reply.

My survival model:
Computed from 200 by 34744 log-likelihood matrix
```
  Estimate    SE
```

elpd_loo -116668.3 216.2
p_loo 63.9 0.8
looic 233358.6 432.4

Monte Carlo SE of elpd_loo is 0.6.

Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 34728 100.0% 157
(0.5, 0.7] (ok)

Competing model: Spatial survival model with BYM (CAR prior + Random effect)
Computed from 200 by 34744 log-likelihood matrix
```
  Estimate    SE
```

elpd_loo -12552.8 147.8
p_loo 38.8 0.7
looic 25105.5 295.6

Monte Carlo SE of elpd_loo is 0.4.

Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 34738 100.0% 180
(0.5, 0.7] (ok) 6 0.0% 195
(0.7, 1] (bad) 0 0.0%
(1, Inf) (very bad) 0 0.0%

All Pareto k estimates are ok (k < 0.7).
See help(‘pareto-k-diagnostic’) for details.

Thank you in advance!

My stan codes used to fit the spatial survival model withBYM are as follows:


data {
  int <lower=0> province;
  int <lower=0> N;
  int <lower=0> num;
  int <lower=0> number;
  int <lower =0 , upper = province> county[N];
  matrix [N,number] respon;
  matrix[province,province] H;  // area adjacency matrix row standardized (ea row sums to 1)
  matrix[province,province] Delta;  // diagonal matrix in BYM 1991, sfstd model B1
  matrix[province,province] Ones; // diag matrix of ones for identity
  real phi; // phi = 1/maxeigval(Delta*H*Delta)
     }
transformed data {
  vector[province] zeros;  // zeros for mean of MVN specification
for (i in 1:province)
  zeros[i] = 0.0;
     }
parameters{
  vector [num] beta;
  real <lower=0> sigma;
  vector[province] w1;  // random effect (CAR)
  real<lower=1e-5> prec_sre;  // precision of CAR
  vector[province] theta;       // heterogeneous effects
     }
transformed parameters{
  vector [num] time_ratio;
  vector [N] lambdaa;
  real <lower=0> alpha = 1 / sigma;
  real <lower=0> tau_sq;
  time_ratio  = exp(beta);
  tau_sq = 1.0 / (prec_sre * prec_sre);
  lambdaa = exp ((-respon[,7:18] * beta - w1[county] - theta[county]) / sigma);
     }
model{
   vector [N] s_1;
   vector [N] s_2;
   matrix[province,province] Sigma_sre;  // covariance matrix for CAR
   target += normal_lpdf(beta | 0, 1000);
   target +=gamma_lpdf(alpha |0.001,0.001);
   prec_sre ~ gamma(0.001,0.001);    // precision of spatial re
// spatial re
   Sigma_sre = (tau_sq * inverse(Ones - phi * H)) * Delta;
   w1 ~ multi_normal(zeros, Sigma_sre);
   target += normal_lpdf(theta| 0, 1);
for (i in 1:N) {
   s_1[i]= (1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha))));
   s_2[i]= (1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha))));
   target += log(s_1[i] - s_2[i]);
    }
    }
generated quantities{
   vector [N] log_lik;
   vector [N] s_1;
   vector [N] s_2;
  for (i in 1:N){
    s_1[i]= (1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha))));
    s_2[i]= (1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha))));
    log_lik [i]= log(s_1[i] - s_2[i]);
    }
    }
.```

jackyjoy123 · September 23, 2021, 1:10pm

thanks for the awesome information.

Topic		Replies	Views
Problems with LOO-CV in Survival Modeling rstan , fitting-issues , loo	8	556	October 7, 2021
Errors when calculating PSIS-LOO CV in loo version 2.2.0 Modeling fitting-issues , loo	7	1247	January 2, 2020
LOO Model Comparison Alternative Modeling rstan , techniques , loo , cmdstanr	3	90	March 27, 2025
LOOIC diagnostics with Stanfit object Modeling loo	5	650	October 30, 2020
Interpret pareto k diagnostic Modeling rstan , fitting-issues , loo	3	1623	August 3, 2023

LOO-CV for Survival model

elpd_loo -116668.3 216.2 p_loo 63.9 0.8 looic 233358.6 432.4

elpd_loo -12552.8 147.8 p_loo 38.8 0.7 looic 25105.5 295.6

Related topics

elpd_loo -116668.3 216.2
p_loo 63.9 0.8
looic 233358.6 432.4

elpd_loo -12552.8 147.8
p_loo 38.8 0.7
looic 25105.5 295.6