rstanarm pp_validate bug?

I have estimated a stan_lm model.

I use RStudio.

Model Info:

function: stan_lm
family: gaussian [identity]
formula: log(StandardizedOME) ~ Age + AnesthesiaDuration + AnesthesiaTechniqueBlock + AnesthesiaTechniqueGeneral + AnesthesiaTechniqueNeuraxial + o.ASAClass + EmergencyStatusYN + t.Race + Sex + REMI + o.NonOpioidAnalgesicsCount + o.AIM1Year + CPTBucket + MPOGInstitutionID
algorithm: sampling
priors: see help(‘prior_summary’)
sample: 16000 (posterior sample size)
observations: 1104358

Priors for model ‘StandardizedOME.stan_lm.3’

Intercept (after predictors centered)
~ flat

~ R2(location = 0.3, what = ‘mode’)

I ran the sampler with iter = 8000 so that n_eff for the log-posterior (lp__) exceeds 1000.
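
The 16000 in the model summary follows from that setting. Here is a minimal Python sketch of the arithmetic. It assumes rstanarm's defaults of 4 chains and warmup = iter / 2, because the post does not state chains or warmup explicitly. The function name is hypothetical.

```python
# Post-warmup draws implied by the sampler settings.
# Assumption: rstanarm defaults of chains = 4 and warmup = iter // 2, no thinning.

def posterior_sample_size(iter_, chains=4, warmup=None):
    """Number of post-warmup draws kept across all chains."""
    if warmup is None:
        warmup = iter_ // 2  # default warmup is half of iter
    return chains * (iter_ - warmup)

print(posterior_sample_size(8000))  # 16000, matching "sample: 16000" in the summary
print(posterior_sample_size(2000))  # 4000 with the default iter = 2000
```

So iter = 8000 quadruples the number of saved draws compared with the default.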

The trace plots suggest the chains mix well.

The pp_check plot looks reasonable.

I called pp_validate on this model, using the default arguments for nreps and seed.

Within a minute, an error message appears:

“RStudio RSession has stopped working”

I then have to close and restart the RStudio session.

This is reproducible: I have seen it on all 5 attempts.

I am still new to rstanarm.

Any suggestions?



  • Operating System: Windows
  • rstanarm Version: not given (rstan version: 2.18.2)

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.3 tools_3.5.3 yaml_2.2.0

I think you ran out of RAM. With over a million observations and 16,000 posterior draws, that can easily happen. I don't think you need that many draws. In any case, pp_validate does not implement Simulation-Based Calibration correctly, so I would not worry about it.
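
A back-of-the-envelope check of the RAM explanation, written as a small Python sketch. It assumes that some step materializes a dense draws-by-observations matrix of 8-byte doubles, which is the shape posterior_predict() returns. Whether pp_validate allocates exactly this at this size is not confirmed here, and the function name is hypothetical. At 16,000 draws and 1,104,358 observations, one such matrix is about 141 GB (roughly 132 GiB), before any copies.

```python
# Rough memory needed for one dense draws-by-observations matrix of doubles.
# Assumptions: 8 bytes per value (R numeric), no copies, no object overhead.

def dense_matrix_gib(n_draws, n_obs, bytes_per_value=8):
    """Size in GiB of an n_draws x n_obs matrix of fixed-size values."""
    return n_draws * n_obs * bytes_per_value / 2**30

N_OBS = 1_104_358  # observations reported in the model summary

for draws in (16_000, 4_000, 1_000):
    print(f"{draws:>6} draws: {dense_matrix_gib(draws, N_OBS):7.1f} GiB")
```

Memory scales linearly with the number of draws, so cutting draws is the most direct lever.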

Thanks for the comment about pp_validate.

Concerning the number of draws, I was following the recommendation to keep n_eff > 1000.

Is that rule of thumb flexible?


If you are asking about the effective sample size of lp__, I don't think you should be that concerned with it. The current recommendation is to have at least 100 effective samples per chain, as measured by bulk-ESS and tail-ESS (the functions for these are not yet available in the rstan version you are using).
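
To make the criterion concrete, here is a rough Python sketch of what bulk-ESS and tail-ESS measure. It uses the split-chain, rank-normalized idea behind bulk-ESS, but it is simplified: it uses Geyer's initial positive sequence without the monotone correction, takes crude sample quantiles for the tail, and ignores ties. All function names are hypothetical, and synthetic AR(1) chains stand in for real draws. In R, rstan 2.19 and later export ess_bulk() and ess_tail() for this purpose.

```python
import random
from statistics import NormalDist, fmean


def split_chains(chains):
    """Split each chain into a first and second half of equal length."""
    out = []
    for c in chains:
        h = len(c) // 2
        out += [c[:h], c[len(c) - h:]]
    return out


def rank_normalize(chains):
    """Map pooled ranks r to normal scores Phi^-1((r - 3/8) / (S + 1/4))."""
    flat = [x for c in chains for x in c]
    s, n = len(flat), len(chains[0])
    ranks = [0] * s
    for r, idx in enumerate(sorted(range(s), key=flat.__getitem__), start=1):
        ranks[idx] = r  # ties are ignored in this sketch
    nd = NormalDist()
    z = [nd.inv_cdf((r - 0.375) / (s + 0.25)) for r in ranks]
    return [z[i * n:(i + 1) * n] for i in range(len(chains))]


def autocov(x, lag):
    """Autocovariance of one chain at `lag`, divided by n (biased estimator)."""
    n, m = len(x), fmean(x)
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / n


def multichain_ess(chains):
    """ESS from combined autocorrelations, truncated at the first negative pair."""
    m, n = len(chains), len(chains[0])
    means = [fmean(c) for c in chains]
    grand = fmean(means)
    w = fmean([autocov(c, 0) * n / (n - 1) for c in chains])  # within-chain variance
    b_over_n = sum((mu - grand) ** 2 for mu in means) / (m - 1)  # between-chain part
    var_plus = (n - 1) / n * w + b_over_n

    def rho(t):
        return 1 - (w - fmean([autocov(c, t) for c in chains])) / var_plus

    tau, t = -1.0, 0
    while t + 1 < n:
        pair = (1.0 if t == 0 else rho(t)) + rho(t + 1)
        if pair < 0:  # Geyer's initial positive sequence stops here
            break
        tau += 2 * pair
        t += 2
    return m * n / tau


def bulk_ess(chains):
    return multichain_ess(rank_normalize(split_chains(chains)))


def tail_ess(chains):
    """Minimum ESS of the indicators I(x <= q05) and I(x <= q95)."""
    split = split_chains(chains)
    pooled = sorted(x for c in split for x in c)
    quantiles = [pooled[int(p * (len(pooled) - 1))] for p in (0.05, 0.95)]
    return min(
        multichain_ess([[1.0 if x <= q else 0.0 for x in c] for c in split])
        for q in quantiles
    )


def ar1_chain(phi, n, rng):
    """Synthetic autocorrelated chain with stationary N(0, 1) marginals."""
    x, out = rng.gauss(0, 1), []
    for _ in range(n):
        x = phi * x + rng.gauss(0, (1 - phi**2) ** 0.5)
        out.append(x)
    return out


rng = random.Random(1)
chains = [ar1_chain(0.9, 1000, rng) for _ in range(4)]
bulk, tail = bulk_ess(chains), tail_ess(chains)
print(f"bulk-ESS {bulk:.0f}, tail-ESS {tail:.0f}")
print("meets 100 per chain:", min(bulk, tail) >= 100 * len(chains))
```

With 4 chains, the threshold is 400 for the smaller of the two values. That is well below the n_eff > 1000 target used for lp__ above.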