Problems with LOO-CV in Survival

Dear all,
I want to compare the following two models using the loo package in the RSTAN. This post is a summary of my previous post (LOO-CV for Survival model - #3 by Isa_Nazar).

The following results are obtained from fitting two models under the same conditions (iter=4000, warmup=2000, thin=10, N=34744, n_grid=1001, num=12 ,province=31). It should be noted that with increasing iterations, there was no change in the results, including the value of LOOIC. My purpose is here the model assessment and choosing the best fitting model on my real data.

  1. My survival model:

Computed from 200 by 34744 log-likelihood matrix

  Estimate    SE
elpd_loo -116668.3 216.2
p_loo 63.9 0.8
looic 233358.6 432.4
Monte Carlo SE of elpd_loo is 0.6.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 34728 100.0% 157
(0.5, 0.7] (ok)
  1. Competing model: Spatial survival model with BYM (CAR prior + Random effect)

Computed from 200 by 34744 log-likelihood matrix

  Estimate    SE
elpd_loo -12552.8 147.8
p_loo 38.8 0.7
looic 25105.5 295.6
Monte Carlo SE of elpd_loo is 0.4.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 34738 100.0% 180
(0.5, 0.7] (ok) 6 0.0% 195
(0.7, 1] (bad) 0 0.0%
(1, Inf) (very bad) 0 0.0%
All Pareto k estimates are ok (k < 0.7).
See help(‘pareto-k-diagnostic’) for details.

The only difference between the two models is log(a[county[i]]), which has negative values, but there is a big difference between the loo values of the two models, and I was really confused. My question is whether I calculated the loo values correctly or not? Can anyone help me with this problem?
Any suggestions would be greatly appreciated!
Thank you

My stan codes used to fit the two spatial survival models are as follows:

1:My survival model:

data {
  int <lower=0> province;
  int <lower=0> n_grid;
  int <lower=0> N;
  int <lower=0> num;
  int <lower=0> number;
  int <lower =0 , upper = province> county[N];
  vector <lower=0> [province] p;
  row_vector[n_grid] kernel[province];
  matrix [N,number] respon;
}
parameters{
  vector [num] beta;
  vector <lower=0> [province] sigma1;
  real <lower=0> sigma;
  vector [n_grid] x;
  vector <lower=0> [province] r_county; 
     }
transformed parameters{
  vector [num] time_ratio;
  vector [N] lambdaa;
  real <lower=0> alpha = 1 / sigma;
  vector [n_grid] exp_x;
  vector[province] a; // `a` values between 0 and 1
  time_ratio  = exp(beta);
  exp_x = exp(x);
  lambdaa = exp (((-1 * respon[,7:18] * beta) / sigma)+log(r_county [county]));
  // }
     { // `z` not part of output
  vector[province] landa;
  for (k in 1 : province) {
        landa[k] = kernel[k] * exp_x * p[k];
     }
  a = landa / sum(landa); // assign `a` between 0 and 1
     }
     }
model{
   vector [N] s_1;
   vector [N] s_2;
   vector [N] s_new1;
   vector [N] s_new2;
   target += normal_lpdf(x|0,5);
   target += normal_lpdf(beta | 0, 1000);
   target += gamma_lpdf(alpha | 0.001, 0.001);
   target += normal_lpdf(r_county| 0, sigma1);
   target += student_t_lpdf(sigma1| 3,0,1);
for (i in 1:N) {
   s_1[i]= 1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha)));
   s_2[i]= 1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha)));
   s_new1[i] = a[county[i]] * s_1[i];
   s_new2[i] = a[county[i]] * s_2[i];
   target += log(s_1[i] - s_2[i]) + log(a[county[i]]);
   }
   }
generated quantities{
   vector [N] log_lik;
   vector [N] s_1;
   vector [N] s_2;
   vector [N] s_new1;
   vector [N] s_new2;
for (i in 1:N){
   s_1[i]= 1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha)));
   s_2[i]= 1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha)));
   s_new1[i] = a[county[i]] * s_1[i];
   s_new2[i] = a[county[i]] * s_2[i];
   log_lik [i]= log(s_1[i] - s_2[i]) + log(a[county[i]]);
  }
  }
  1. Competing model: Spatial survival model with BYM (CAR prior + Random effect)
data {
  int <lower=0> province;
  int <lower=0> N;
  int <lower=0> num;
  int <lower=0> number;
  int <lower =0 , upper = province> county[N];
  matrix [N,number] respon;
  matrix[province,province] H;  // area adjacency matrix row standardized (ea row sums to 1)
  matrix[province,province] Delta;  // diagonal matrix in BYM 1991, sfstd model B1
  matrix[province,province] Ones; // diag matrix of ones for identity
  real phi; // phi = 1/maxeigval(Delta*H*Delta)
     }
transformed data {
  vector[province] zeros;  // zeros for mean of MVN specification
for (i in 1:province)
  zeros[i] = 0.0;
     }
parameters{
  vector [num] beta;
  real <lower=0> sigma;
  vector[province] w1;  // random effect (CAR)
  real<lower=1e-5> prec_sre;  // precision of CAR
  vector[province] theta;       // heterogeneous effects
     }
transformed parameters{
  vector [num] time_ratio;
  vector [N] lambdaa;
  real <lower=0> alpha = 1 / sigma;
  real <lower=0> tau_sq;
  time_ratio  = exp(beta);
  tau_sq = 1.0 / (prec_sre * prec_sre);
  lambdaa = exp ((-respon[,7:18] * beta - w1[county] - theta[county]) / sigma);
     }
model{
   vector [N] s_1;
   vector [N] s_2;
   matrix[province,province] Sigma_sre;  // covariance matrix for CAR
   target += normal_lpdf(beta | 0, 1000);
   target +=gamma_lpdf(alpha |0.001,0.001);
   prec_sre ~ gamma(0.001,0.001);    // precision of spatial re
// spatial re
   Sigma_sre = (tau_sq * inverse(Ones - phi * H)) * Delta;
   w1 ~ multi_normal(zeros, Sigma_sre);
   target += normal_lpdf(theta| 0, 1);
for (i in 1:N) {
   s_1[i]= (1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha))));
   s_2[i]= (1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha))));
   target += log(s_1[i] - s_2[i]);
    }
    }
generated quantities{
   vector [N] log_lik;
   vector [N] s_1;
   vector [N] s_2;
  for (i in 1:N){
    s_1[i]= (1- ((lambdaa [i]*(respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,2]+0.01)^alpha))));
    s_2[i]= (1- ((lambdaa [i]*(respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i]*(respon[i,3]+0.01)^alpha))));
    log_lik [i]= log(s_1[i] - s_2[i]);
    }
    }
.```

Hi,
the models are a bit too complex for me to understand quickly, so I’ll presume there are no bugs (you should definitely check, e.g. by testing if you can recover parameters from simulated data). I’ll also note that my understanding of LOO is relatively surface-level, but since nobody else answered, I will give it a try.

One way to try to understand why LOO values are very different is to note that LOO works with the posterior predictive distribution. Large differences in LOO-CV should thus usually correspond to different posterior predictions for new data. Do the predicitons of the models (for some new unseen data point) differ substantially? If not, but you get large differences in LOO-CV, I would suspect a bug somewhere.

I really don’t understand what log(a[county[i]]) is doing in the likelihood, but maybe the problem really lies there - maybe the models differ in what parameters you are conditioning on when computing the log likelihood? But that’s just a very wild guess…

Best of luck with your model!

1 Like

Thanks a ton for your reply and help. Here, the only difference between the likelihood of two models is log(a[county[i]]) that it is added in the log-likelihood function to account for the spatial dependency.
Is it possible to get different results by comparing models using two different GOF criteria, LOO & DIC? I mean, a model is selected in terms of LOO values, but its DIC value is more.

Thank you in advance!

Are you sure this is achieving what you intend? In your transformed parameters block you note in a comment that a is constrained to take values between 0 and 1, which means that its logarithm is guaranteed to be negative. If this is truly the only difference between the likelihoods, then it doesn’t surprise me that LOO would dramatically prefer the model that doesn’t have a giant negative term tacked onto the likelihood.

It’s certainly possible (otherwise we wouldn’t need all the awesome work that’s gone into LOO and we could all just use DIC!). But if it’s true that the only difference in the likelihoods here is a big negative term, then I don’t think there’s any way that DIC will give you different results here.

Thank you so much for your quick reply and assistance. Yes, these are probability values that have been estimated and their logarithm is negative and the only difference between the two models is the same. Do you have any suggestions to compare the two models in RSTAN?

Thank you

Apart from the log(a[county[i]]) values, the covariates were linked differently to the scale parameter in the two models as below:

1:My survival model:
lambdaa = exp (((-1 * respon[,7:18] * beta) / sigma)+log(r_county [county]));

  1. Competing model: Spatial survival model with BYM (CAR prior + Random effect)
    lambdaa = exp ((-respon[,7:18] * beta - w1[county] - theta[county]) / sigma);

Here, w1 is a spatial random effect with Car prior, and theta is an ordinary random effect.

Thank you

1 Like

By a quick look I didn’t see anything wrong. As p_loo for both models is quite small compared to the number of observations, you don’t need LOO to see the difference between the models. It would be good to look at the actual predictions and some expected loss in scale that is easier for you to interpret.

Thanks a ton, Prof. Vehtari for your reply and clarification of my confusion. I think I need to change my pointwise log-likelihood in the competing model as follows because it’s just the likelihood of the data (conditional on the parameters, beta, sigma,sigma1, and r_county). any advice and suggestions will be greatly appreciated.
‘’’
generated quantities{
vector [N] log_lik;
vector [N] s_1;
vector [N] s_2;
// vector [N] s_new1;
//vector [N] s_new2;
for (i in 1:N){
s_1[i]= 1- ((lambdaa [i] (respon[i,2]+0.01)^alpha)/ (1+(lambdaa [i] (respon[i,2]+0.01)^alpha)));
s_2[i]= 1- ((lambdaa [i] (respon[i,3]+0.01)^alpha)/ (1+(lambdaa [i] (respon[i,3]+0.01)^alpha)));
//s_new1[i] = a[county[i]] * s_1[i];
// s_new2[i] = a[county[i]] * s_2[i];
log_lik [i]= log(s_1[i] - s_2[i]) ;
}
}
‘’’.

Thank you in advance!

Warm Regards,
Eisa