LOOIC diagnostics with a stanfit object

I saw that loo v2.0.0 has built-in ways to proceed when loo warns about Pareto k values above k_threshold; however, that functionality only applies to stanreg objects. I wrote my own model with the rstan package, which produces a stanfit object. Currently, I do the following to obtain loo:

log_lik <- rstan::extract(fit, "log_lik_marg")[[1]] # fit: a stanfit object; log_lik_marg: the marginal log likelihood
looic <- loo::loo(log_lik)
loo::loo(log_lik, save_psis = TRUE, k_threshold = 0.7)

The results look like this:

Computed from 1500 by 48 log-likelihood matrix

         Estimate   SE
elpd_loo   -534.1 44.1
p_loo        32.4  7.4
looic      1068.1 88.2
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     40    83.3%   251       
 (0.5, 0.7]   (ok)        6    12.5%   61        
   (0.7, 1]   (bad)       1     2.1%   67        
   (1, Inf)   (very bad)  1     2.1%   6         
See help('pareto-k-diagnostic') for details.

I could also use loo v2.0.0 via loo(fit), which, however, gives completely different (worse) results:

Computed from 1500 by 48 log-likelihood matrix

         Estimate   SE
elpd_loo   -434.1 37.5
p_loo       123.7  9.9
looic       868.2 75.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)      0     0.0%   <NA>      
 (0.5, 0.7]   (ok)        3     6.2%   43        
   (0.7, 1]   (bad)      25    52.1%   10        
   (1, Inf)   (very bad) 20    41.7%   2         
See help('pareto-k-diagnostic') for details.

Interestingly, I found that loo(fit) produces results similar to what I get if I extract the conditional log likelihood instead:

log_lik_con <- rstan::extract(fit,"log_lik")[[1]] # similar to loo(fit) results
loo(log_lik_con)

Computed from 1500 by 48 log-likelihood matrix

         Estimate   SE
elpd_loo   -433.9 37.5
p_loo       123.6  9.9
looic       867.9 75.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)      0     0.0%   <NA>      
 (0.5, 0.7]   (ok)        3     6.2%   80        
   (0.7, 1]   (bad)      25    52.1%   18        
   (1, Inf)   (very bad) 20    41.7%   2         
See help('pareto-k-diagnostic') for details.

My first question is whether anyone has insight into why the results look like this, and which result I should trust. My second question is how I can get loo to refit the model J times when there are J observations with high k estimates. I tried loo(log_lik, save_psis = TRUE, k_threshold = 0.7) and loo(fit, save_psis = TRUE, k_threshold = 0.7), but nothing happened.

Below are the details of how the marginal and conditional log likelihoods are computed:

log_lik_marg[n] = multi_normal_lpdf(Y[subject_starts[n]: subject_stops[n]] | zero_v[subject_starts[n]: subject_stops[n]], 
                                        to_matrix(Sigma[cov_starts[n]: cov_stops[n]], V[n], V[n]));
log_lik[n] = multi_normal_lpdf(Y[subject_starts[n]: subject_stops[n]] | B[subject_starts[n]: subject_stops[n]]*theta_mu + 
                          B[subject_starts[n]: subject_stops[n]] * Theta * alpha[n], diag_matrix(rep_vector(sigma_eps^2,V[n])));
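In matrix notation, with $B_n$ denoting the rows of B for subject $n$, $V_n$ its number of visits, and the covariance block Sigma built as $B_n \Theta \Theta^\top B_n^\top + \sigma_\varepsilon^2 I$ (see the full code further down), these two lines compute

$$\texttt{log\_lik\_marg}[n] = \log \mathcal{N}\!\left(Y_n \mid 0,\; B_n \Theta \Theta^\top B_n^\top + \sigma_\varepsilon^2 I_{V_n}\right),$$
$$\texttt{log\_lik}[n] = \log \mathcal{N}\!\left(Y_n \mid B_n \theta_\mu + B_n \Theta \alpha_n,\; \sigma_\varepsilon^2 I_{V_n}\right),$$

so the first uses the covariance implied by $\alpha_n \sim \mathcal{N}(0, I_K)$, while the second conditions on the sampled $\alpha_n$.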

Thanks!

Your first call uses log_lik_marg (and, because you pass a plain matrix, it also throws away the Markov chain dependency information), but the second uses log_lik, and at the end you say yourself that these are different quantities, so that's the likely reason for the discrepancy.
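As a side note, you can keep the chain information when computing loo from a stanfit yourself; a minimal sketch with loo 2.0.0, reusing your parameter name log_lik_marg:

library(loo)
# keep the chains separate so relative_eff() can account for autocorrelation
ll_marg <- extract_log_lik(fit, parameter_name = "log_lik_marg",
                           merge_chains = FALSE)
r_eff <- relative_eff(exp(ll_marg))  # one relative efficiency per observation
loo_marg <- loo(ll_marg, r_eff = r_eff, save_psis = TRUE)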

The k_threshold refitting works only for rstanarm and brms, as they know enough about the structure of the model; rstan doesn't have enough information to make the needed additional computations.
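So with a plain stanfit you have to code the refits yourself, roughly along these lines (a rough sketch: drop_subject() is a hypothetical helper you would have to write, and stan_mod / stan_data stand for your compiled model and data list):

# observations flagged by the PSIS diagnostics (loo_marg from the sketch above)
high_k <- which(pareto_k_values(loo_marg) > 0.7)
for (n in high_k) {
  # drop_subject() is hypothetical: it must rebuild Y, B, V and all
  # the start/stop indices without subject n
  fit_n <- rstan::sampling(stan_mod, data = drop_subject(stan_data, n))
  # then evaluate log p(y_n | y_{-n}) from the draws of fit_n
}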

Thank you for your reply! Regarding your first answer, should I trust the LOOIC results from the marginal log likelihood or the conditional log likelihood (the latter gave the same results as loo v2.0.0 with loo(fit))? I mostly see people use the marginal log likelihood for LOOIC computation.

This is the first time I've seen log_lik_marg used. In general, whether the marginal or the conditional is appropriate depends on the context, and I would need to investigate what marginal this is here. If you post the full code and explain what Y, subject_starts, etc. are, I may have time to answer next week.


Thank you so much! Y records the longitudinal measurements for all subjects, and since each subject has a different number of time points, I need indices such as subject_starts to find the measurements belonging to each subject. Below is the Stan code (with a small toy example of the index construction after it):

data {
  int <lower=1> N; // number of total unique subjects
  int <lower=1> K; // number of principal components
  int <lower=1> Q; // number of basis functions for the cubic spline basis 
  int <lower=1> cov_size; // number of elements for the covariance matrix

  int <lower=1> V[N]; // number of visits for each subject
  int <lower=1> subject_starts[N]; // starting index of each subject's visits
  int <lower=1> subject_stops[N]; // ending index of each subject's visits
  int <lower=1> cov_starts[N]; // starting index of each subject's covariance matrix
  int <lower=1> cov_stops[N]; // ending index of each subject's covariance matrix

  vector[sum(V)] Y; // record response at each available time point for each subject 
  matrix[sum(V), Q] B; // transpose of sparse spline basis 
}
transformed data{
  vector[K] zero_k;
  vector[Q] zero_q;
  vector[sum(V)] zero_v;
  zero_k = rep_vector(0,K);
  zero_q = rep_vector(0,Q);
  zero_v = rep_vector(0, sum(V)); 
}
parameters { 
  vector[K] alpha[N];
  matrix[Q,K] Theta;
  real<lower=0> sigma_eps;
  vector[Q] theta_mu; 
}
model {
  sigma_eps ~ cauchy(0, 1); 
  theta_mu ~ normal(0, 1);

  for(k in 1:K){
    Theta[, k] ~ normal(0, 1);
  }

  // prior on the subject-specific scores and the likelihood
  for(n in 1:N){
    alpha[n] ~ multi_normal(zero_k, diag_matrix(rep_vector(1,K)));
    Y[subject_starts[n]: subject_stops[n]] ~ multi_normal(B[subject_starts[n]: subject_stops[n]]*theta_mu + 
                      B[subject_starts[n]: subject_stops[n]] * Theta * alpha[n], diag_matrix(rep_vector(sigma_eps^2, V[n])));
  }
} 
generated quantities{
  vector[cov_size] Sigma;
  matrix[Q, Q] W;
  vector[N] log_lik_marg;
  vector[N] log_lik;
  vector[sum(V)] Ynew;

  W = Theta * Theta';
  for (n in 1:N){
    Sigma[cov_starts[n]: cov_stops[n]] = to_vector(B[subject_starts[n]: subject_stops[n]] * Theta * Theta' * 
                                         B[subject_starts[n]: subject_stops[n]]' + diag_matrix(rep_vector(sigma_eps^2,V[n])));
    log_lik_marg[n] = multi_normal_lpdf(Y[subject_starts[n]: subject_stops[n]] | zero_v[subject_starts[n]: subject_stops[n]], 
                                        to_matrix(Sigma[cov_starts[n]: cov_stops[n]], V[n], V[n]));
    log_lik[n] = multi_normal_lpdf(Y[subject_starts[n]: subject_stops[n]] | B[subject_starts[n]: subject_stops[n]]*theta_mu + 
                          B[subject_starts[n]: subject_stops[n]] * Theta * alpha[n], diag_matrix(rep_vector(sigma_eps^2,V[n])));
    Ynew[subject_starts[n]: subject_stops[n]] = multi_normal_rng(B[subject_starts[n]: subject_stops[n]]*theta_mu + 
                      B[subject_starts[n]: subject_stops[n]] * Theta * alpha[n], diag_matrix(rep_vector(sigma_eps^2, V[n])));
  }
}
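For concreteness, here is a toy sketch of how these indices line up on the R side (two hypothetical subjects with 3 and 2 visits; each flattened covariance block has V[n]^2 elements):

V <- c(3, 2)                                         # visits per subject
subject_stops  <- cumsum(V)                          # 3, 5
subject_starts <- c(1, head(subject_stops, -1) + 1)  # 1, 4
cov_stops  <- cumsum(V^2)                            # 9, 13
cov_starts <- c(1, head(cov_stops, -1) + 1)          # 1, 10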

Can you also clarify what you are modeling and what the difference between log_lik and log_lik_marg is? I can see the code, but it would be good if you could add a bit of explanation.