Out of Sample Prediction Questions on RStanarm


Operating System: Windows
Interface Version: 2.15.3
Compiler/Toolkit: RTools

Hi, I have two questions about using rstanarm for cross-validation with out-of-sample data. I’m trying to run a hierarchical model.

  1. How does rstanarm deal with NAs in my data set? Does it simply skip over those rows? I would like to apply the same NA handling to the newdata for cross-validation, as it currently just rejects my data.

  2. I have a column of factors that holds the group labels for the hierarchies to be partially pooled. Let’s call that column “Names”. I’m not sure how to match the labels in the data with the matrix returned by posterior_predict. The vignette simply used

colnames(ppd_pool) <- as.character(bball$player)

But that doesn’t work, because my column “Names” is much longer than the number of columns. Furthermore, the output I get from posterior_predict(fit_partialpool, newdata) has more columns than I have levels in “Names”. I’m generally not sure what posterior_predict returns, especially since the default column names are just numbers that don’t make sense to me.

In my case I have 39 levels and 16 independent variables, and for some reason I have 51 columns in the matrix. I’m not sure how this happened.


NAs in the original dataset are handled according to the na.action argument, but anything other than na.omit will probably not work. NAs in the dataset you predict with will yield an error message, so the newdata argument shouldn’t have NAs in the variables you are predicting with. If NAs in unused variables are causing this problem, just trim the dataset to include only the necessary variables.
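As a rough sketch of that trimming step, something like the following base R code could be used before calling posterior_predict. The data frame, the column names (Names, x1, notes) and the vector model_vars are made up for illustration; substitute the variables that actually appear in your model formula.

```r
# Hypothetical new data: 'Names' and 'x1' are assumed to be used by the model,
# 'notes' is an unused column that happens to contain NAs.
newdata <- data.frame(
  Names = factor(c("a", "b", "a", "c")),
  x1    = c(0.5, NA, 1.2, 0.3),
  notes = c(NA, "ok", NA, "ok"),
  stringsAsFactors = FALSE
)

# Keep only the variables the model needs (assumed list, adjust to your formula).
model_vars <- c("Names", "x1")
trimmed <- newdata[, model_vars, drop = FALSE]

# Drop rows that still have NAs in those variables, mirroring na.omit at fit time.
keep <- complete.cases(trimmed)
newdata_clean <- trimmed[keep, , drop = FALSE]

print(newdata_clean)
```

The cleaned data frame can then be passed as the newdata argument. Keeping the logical vector keep around makes it possible to map predictions back to the rows of the original data.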

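You have 51 rows in the data.frame you passed to newdata. The posterior_predict function returns a draws x observations matrix, where draws is 4000 by default (4 chains with 1000 post-warmup draws each) and observations is the number of rows in newdata. Each cell is a prediction for the corresponding observation, so the columns line up with the rows of newdata, not with the levels of “Names”.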
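To make the correspondence concrete, here is a small sketch of labeling the columns by the rows of newdata and then summarizing per group. The matrix below is a random stand-in with the same shape as a posterior_predict result, not real predictions, and the data frame and its columns are hypothetical; with a fitted model you would use the commented-out call instead.

```r
set.seed(1)

# Hypothetical newdata with 5 rows and 3 levels of 'Names'.
newdata <- data.frame(
  Names = factor(c("a", "b", "a", "c", "b")),
  x1    = rnorm(5)
)

# With a fitted model this would be:
#   ppd <- posterior_predict(fit_partialpool, newdata = newdata)
# Stand-in matrix of random numbers with the same shape: draws x rows of newdata.
n_draws <- 4000
ppd <- matrix(rnorm(n_draws * nrow(newdata)),
              nrow = n_draws, ncol = nrow(newdata))

# One column per row of newdata, in the same order.
stopifnot(ncol(ppd) == nrow(newdata))
colnames(ppd) <- as.character(newdata$Names)

# Posterior predictive mean for each row of newdata.
row_means <- colMeans(ppd)

# Average those row-level means within each level of 'Names'.
group_means <- tapply(row_means, newdata$Names, mean)
print(group_means)
```

Because a level can appear in several rows, the column names repeat. That is expected: there is one column per observation, not one per level.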