Hi, these questions were originally directed to @avehtari, and he asked me to post them here for the community at large so that the answers could benefit more people.
I am currently in the review process for a manuscript summarizing results from a project I worked on that used Stan (preprint here: https://www.biogeosciences-discuss.net/bg-2020-23/bg-2020-23.pdf). I have some questions related to the reviewer queries that I would like to clarify:
- In the manuscript, I mentioned and cited the demonstrated stability and reliability issues with AIC and DIC. However, a reviewer asked to see AIC and DIC compared alongside the LOO and WAIC results. I plan to respond that I would be hard-pressed to calculate AIC from Stan output, as it requires a pointwise maximum likelihood estimate that does not work with non-uniform priors in a Bayesian setting. With respect to DIC, I was going to say that including DIC for comparison is redundant in the presence of WAIC, because DIC is calculated from pointwise posterior means and could be thought of as an approximation of WAIC, with WAIC in turn being an approximation of LOO. Regarding DIC, is that something I can justifiably assert?
- I am trying to develop a better intuition for the PSIS-LOO methodology to be in a position to answer more detailed questions about its computation. From the 2017 Vehtari et al. paper, based on my understanding of importance sampling, IS is necessary to obtain p(\theta | y_{-i}). For each \theta, we can estimate p(y_i | \theta) from the posterior. IS then allows us to obtain p(\theta | y_{-i}) from the posterior (do correct me if I'm wrong in anything I've said prior to this). I just wanted a hint on the math steps that lead from Equation (7) in that 2017 paper to Equation (8) to get the LOO lpd corresponding to held out point y_i (equations on page 3 here). And then, for a layman's summary of the function of IS here, IS helps us estimate the distribution of unobserved data points conditional on the holdout data point?
- Is the log pseudomarginal likelihood (LPML) that appears in Christensen et al. 2011, "Bayesian Ideas and Data Analysis," the same as LOO without overfitting/effective parameter count penalty? (A formula for LPML is also available here: https://www4.stat.ncsu.edu/~reich/st740/ModelComparison2.pdf)
- Here is a question that is unrelated to the ones above. I set my adapt delta and step size to 0.9995 and 0.001 respectively in order to mitigate the divergent transitions I had. One ODE model I used has some parameterization issues that will be refined in future projects, and this definitely contributed to the divergent transitions. Increasing adapt delta and reducing the step size did not entirely get rid of the divergent transitions, but it did reduce their overall number per run. I chose these settings based on some older user reports in the deprecated Stan Google Group. One reviewer critiqued my adapt delta and step size settings as being extreme (they could potentially limit the amount of parameter space being explored) and asked for a literature citation that supports modifying adapt delta and step size to reduce divergent transitions. If I have enough posterior samples, would my adapt delta and step size values still be unforgivably problematic? Also, I did not find anything after some days of Googling, but perhaps I was not using the right keywords: are there any papers that discuss adapt delta and step size with respect to divergent transitions? I thought some of @betanalpha's writings might, but I did not initially find anything that would more formally allow me to justify that a high adapt delta and low step size would not inhibit parameter space exploration too much.
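To make the importance sampling question concrete, here is a minimal sketch of the raw (unsmoothed) IS-LOO computation from a matrix of pointwise log-likelihood draws. It assumes the step in question is the usual self-normalized IS estimate with ratios proportional to 1 / p(y_i | \theta^s), which may not match the paper's equation numbering exactly. The function names, the toy normal model, and the simulated draws are all hypothetical, and no Pareto smoothing is applied, so this is not what PSIS-LOO software actually runs.

```python
import numpy as np

def is_loo_pointwise(log_lik):
    """Raw importance-sampling estimate of log p(y_i | y_{-i}).

    log_lik: array of shape (S, N) holding log p(y_i | theta^s) for S
    posterior draws and N observations (hypothetical input layout).

    With ratios r_i^s proportional to 1 / p(y_i | theta^s), the
    self-normalized estimate is
        sum_s r_i^s p(y_i | theta^s) / sum_s r_i^s
          = S / sum_s 1 / p(y_i | theta^s),
    because each product r_i^s * p(y_i | theta^s) equals 1.
    """
    S = log_lik.shape[0]
    # log(S) - log(sum_s exp(-log_lik)), computed stably on the log scale
    return np.log(S) - np.logaddexp.reduce(-log_lik, axis=0)

def lpd_pointwise(log_lik):
    """Within-sample log pointwise predictive density (no holdout)."""
    S = log_lik.shape[0]
    return np.logaddexp.reduce(log_lik, axis=0) - np.log(S)

# Toy example (an assumption for illustration only): y ~ Normal(mu, 1)
# with a flat prior on mu, so that mu | y ~ Normal(mean(y), 1 / n).
rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=20)
mu = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=4000)

# log p(y_i | mu^s) for every draw s and observation i, shape (4000, 20)
log_lik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2

elpd_is_loo = is_loo_pointwise(log_lik).sum()  # sum of log p(y_i | y_{-i})
lpd = lpd_pointwise(log_lik).sum()             # no held-out point
p_loo = lpd - elpd_is_loo                      # effective parameter estimate
```

The per-observation quantity here is a harmonic mean of likelihood values over the posterior draws, which is also how CPO is commonly estimated, so it may help frame the LPML question above. Raw ratios can have very high variance, which is the problem the Pareto smoothing in PSIS is meant to address.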
Thank you all for your help.