Hello,
I recently read the arXiv paper https://arxiv.org/abs/2002.09633 about Bayesian survival models in rstanarm.
I am interested in applying approximate leave-one-out cross-validation, based on the loo_approximate_posterior function of the loo package, to some of the survival models available in rstanarm.
Based on the papers I read, the approximate LOO method behind the loo_approximate_posterior function speeds up the computation of LOO in two ways. First, the posterior is approximated (e.g. by a Laplace, meanfield or fullrank approximation), and the importance weights are corrected for the fact that the draws come from this approximation. The weights are adapted so that the full posterior needs to be computed only once (e.g. by MCMC) to obtain the leave-one-out posteriors required for LOO. Second, probability-proportional-to-size subsampling evaluates only a subset of the LOO posteriors instead of all of them, which speeds up the computation further, particularly for datasets with a large sample size n.
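To make the weight correction concrete, this is how I understand it (a sketch in my own notation, not copied from the package source). With draws taken from the approximation q instead of the true posterior, the log importance ratio for observation i picks up an extra term, which is where log_p and log_g come in:

```latex
\[
\log r_i\!\left(\theta^{(s)}\right)
  = -\log p\!\left(y_i \mid \theta^{(s)}\right)
    + \underbrace{\log p\!\left(\theta^{(s)} \mid y\right)}_{\texttt{log\_p}}
    - \underbrace{\log q\!\left(\theta^{(s)}\right)}_{\texttt{log\_g}}
    + \text{const},
\qquad \theta^{(s)} \sim q .
\]
```

The first term is the usual PSIS-LOO ratio; the difference log_p - log_g corrects for sampling from q. The posterior density only needs to be known up to a constant, since the weights are normalized afterwards.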
So far, I followed the LOO vignette about large data at https://cran.r-project.org/web/packages/loo/vignettes/loo2-large-data.html. In the vignette, an example is given where a Laplace posterior approximation is used:
# Approximate LOO-CV using PSIS-LOO with posterior approximations
fit_laplace <- optimizing(stan_mod, data = standata, draws = 2000,
                          importance_resampling = TRUE)
parameter_draws_laplace <- fit_laplace$theta_tilde # draws from approximate posterior
log_p <- fit_laplace$log_p # log density of the posterior
log_g <- fit_laplace$log_g # log density of the approximation
set.seed(4711)
loo_ap_ss_1 <-
  loo_subsample(
    x = llfun_logistic,
    draws = parameter_draws_laplace,
    data = stan_df_1,
    log_p = log_p,
    log_g = log_g,
    observations = 100
  )
print(loo_ap_ss_1)
Now, to benefit from the approximation of the LOO posteriors by the full posterior, the parameters log_p and log_g need to be specified. Also, the log-likelihood function llfun_logistic needs to be specified.
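For reference, this is the kind of function I mean. It is only a minimal sketch for a hypothetical exponential proportional-hazards model with right censoring, not the likelihood rstanarm actually uses. The name llfun_exp_surv, the column names time, status and X*, and the layout of the draws matrix are my own assumptions. It follows the llfun(data_i, draws) form that loo_subsample expects, returning one log-likelihood value per posterior draw for a single observation:

```r
# Hypothetical log-likelihood for an exponential survival model with right
# censoring. This is NOT the likelihood used internally by rstanarm.
# Assumed data layout: columns `time`, `status` (1 = event, 0 = censored)
# and covariates whose names start with "X".
# Assumed draws layout: S x (1 + K) matrix; column 1 is the intercept
# (log baseline hazard), the remaining K columns are the coefficients.
llfun_exp_surv <- function(data_i, draws) {
  x_i <- as.matrix(data_i[, grepl("^X", colnames(data_i)), drop = FALSE])
  # Log hazard rate of observation i under each draw (vector of length S)
  log_rate <- as.vector(draws[, 1] + draws[, -1, drop = FALSE] %*% t(x_i))
  # Event:    log density  = log_rate - rate * time
  # Censored: log survival =          - rate * time
  data_i$status * log_rate - exp(log_rate) * data_i$time
}

# Toy check with two draws (intercept and one coefficient) and one observation
draws  <- matrix(c(-1, -1.2, 0.5, 0.4), nrow = 2)
data_i <- data.frame(time = 2, status = 1, X1 = 1)
ll_i   <- llfun_exp_surv(data_i, draws)
```

Writing this by hand is manageable for a toy model like the one above, but not for the more flexible baseline hazards in the rstanarm survival models.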
My questions are the following:
- The vignette uses rstan. The survival models above, on the other hand, are fit via rstanarm. While it would theoretically be possible to extract the raw Stan code from the rstanarm models and write the log-likelihood functions by hand, this is quite tedious (if possible at all). Is it possible to somehow extract the log-likelihood function (not the matrix) from the rstanarm models and then pass it to the loo_subsample function?
- The vignette uses a Laplace approximation via the optimizing function to approximate the posterior distribution. The vb function provides fullrank and meanfield algorithms, too. To use the meanfield or fullrank approximate posterior in the loo_subsample function, I would need to extract the log_p and log_g parameters. Is there a way to do this? I could not find anything in the documentation, whereas the optimizing function readily provides both.
Thanks in advance,
Riko