In the documentation of stanreg functions, prior_PD is defined as:
A logical scalar (defaulting to FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.
However, the documentation of posterior_vs_prior describes it as:
Plot medians and central intervals comparing parameter draws from the prior and posterior distributions. …
Internally, it seems that posterior_vs_prior simply calls update(object, prior_PD = TRUE), so are the prior distribution and the prior predictive distribution one and the same?
They seem to be:
library(rstanarm)
#> Loading required package: Rcpp
#> Warning: package 'Rcpp' was built under R version 3.6.1
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> rstanarm (Version 2.18.2, packaged: 2018-11-08 22:19:38 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - Plotting theme set to bayesplot::theme_default().
library(bayestestR)
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.6.1
fit <- stan_lmer(extra ~ group + (1 | ID),
                 data = sleep,
                 prior_PD = TRUE,
                 refresh = 0)
prior_summary(fit)
#> Priors for model 'fit'
#> ------
#> Intercept (after predictors centered)
#> ~ normal(location = 0, scale = 10)
#> **adjusted scale = 20.18
#>
#> Coefficients
#> ~ normal(location = 0, scale = 2.5)
#> **adjusted scale = 5.04
#>
#> Auxiliary (sigma)
#> ~ exponential(rate = 1)
#> **adjusted scale = 2.02 (adjusted rate = 1/adjusted scale)
#>
#> Covariance
#> ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
#> ------
#> See help('prior_summary.stanreg') for more details
x <- insight::get_parameters(fit)
ggplot(x, aes(group2)) +
  geom_density() +
  stat_function(fun = function(x) dnorm(x, 0, 5.044799), color = "red")
If prior_PD is TRUE, then rstanarm draws from the prior distribution of the parameters, primarily so that those realizations can be used to draw from the prior predictive distribution and verify that the priors are reasonable. The Stan programs in rstanarm actually do draw from the predictive distribution in the generated quantities block, but prior_PD wasn’t a great choice of argument name.
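To make the distinction concrete, here is a minimal base R sketch (not what rstanarm does internally). It assumes a simplified fixed-effects-only model y = alpha + beta * x + error, ignores the group-level term and the predictor centering, and reuses the adjusted scales printed by prior_summary() above. The names alpha, beta, sigma, x_new and y_rep are only for illustration.

```r
set.seed(123)
n_draws <- 10000

# Step 1: draw parameters from their priors (the prior distribution).
# Scales are copied from the adjusted scales in the prior_summary() output.
alpha <- rnorm(n_draws, mean = 0, sd = 20.18)
beta  <- rnorm(n_draws, mean = 0, sd = 5.04)
sigma <- rexp(n_draws, rate = 1 / 2.02)

# Step 2: push each parameter draw through the likelihood to simulate an
# outcome at a chosen predictor value (the prior predictive distribution).
x_new <- 1
y_rep <- rnorm(n_draws, mean = alpha + beta * x_new, sd = sigma)

# The prior draws of beta follow the normal prior; the simulated outcomes
# are on the outcome scale and are far more spread out.
sd(beta)
sd(y_rep)
```

So the density plot above is checking draws of a parameter (step 1), whereas a prior predictive check looks at simulated outcomes like y_rep (step 2).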
I agree the name prior_PD wasn’t the best choice. We can always deprecate it and change the name without breaking backwards compatibility. Any ideas for a better one? I forget what brms uses but we could just change to that name for consistency.
Hi all who were on this thread. My question is related, but if it is worthy of a new thread I’ll go that route.
I am modeling penetration (yes/no) ~ velocity (fps). My response variable is binary, and the velocities range from roughly 1000 fps to 3000 fps. So I’m using a logistic regression, with priors for alpha and beta that are normal(mu, sigma) informed by general knowledge.
I set prior_PD=TRUE to see a plot of the prior predictions. I expected the plot to be completely unaffected by my data set, yet I notice that the range on the x-axis matches that of my data set. FWIW, since I specified my own priors, autoscaling is turned off. So I’m confused in general about why the plot of the predictions based on priors would reference my data at all.
Your priors indicate how X is related to Y; specifically, how Y is conditionally distributed with respect to X.
Accordingly, in order to generate a prior predictive distribution, which is conditional on X, we need to plug in values of X. Those are taken from the data. You can plug in any values of X you wish, but you need to provide something.
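As a rough base R illustration of why some X has to be supplied: the priors only give you intercepts and slopes, and a predicted probability exists only once a velocity is plugged in. The prior locations and scales below are made up for the sketch (they are not the poster's actual priors), and velocity_grid is a hypothetical grid that deliberately extends beyond the observed range.

```r
set.seed(42)
n_draws <- 200

# Hypothetical priors on the logit scale (assumed values for illustration)
alpha <- rnorm(n_draws, mean = -6, sd = 2)     # intercept
beta  <- rnorm(n_draws, mean = 0.3, sd = 0.1)  # slope per 100 fps

# Any grid of X values can be used; nothing here comes from observed data
velocity_grid <- seq(0, 40, by = 1)

# Linear predictor: one row per prior draw, one column per velocity.
# alpha has length n_draws, so it is added row-wise to the matrix.
eta <- alpha + outer(beta, velocity_grid)

# Inverse logit turns the linear predictor into penetration probabilities
prob <- 1 / (1 + exp(-eta))

# Summarise the prior-implied curve at each velocity
prior_curve <- data.frame(
  velocity100 = velocity_grid,
  median_prob = apply(prob, 2, median)
)
head(prior_curve)
```

Change velocity_grid and the curve is evaluated somewhere else; the priors themselves never change.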
Thanks Mattan! That makes sense. How do I provide a different set of X? I called stan_glm to produce the model, where my scaled glass_data X values (velocity100) range from approx 10 to 27 (in hundreds of feet per sec):
That’s when I end up with plots that are truncated to the range of X in my observed data.
Later, I call update() with prior_PD=FALSE to see the posterior.
I thought the value of referencing my dataset in the initial stan_glm() call was so that I could later reference that model in a call to update(). Perhaps I’m going about it wrong and I need to go back to some tutorials.
add_epred_draws takes a newdata argument as its first argument. Instead of providing glass_data, provide some other dataset for the locations where you want the predictions.
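Something along these lines should work. This is only a sketch: glass_data here is a simulated stand-in (I don't have the real data), the prior values are placeholders rather than your actual priors, and new_velocities is a hypothetical grid that goes beyond the observed 10 to 27 range.

```r
library(rstanarm)
library(tidybayes)

set.seed(1)
# Simulated stand-in for glass_data (assumption, not the real data set)
glass_data <- data.frame(velocity100 = runif(50, min = 10, max = 27))
glass_data$penetration <- rbinom(
  50, size = 1, prob = plogis(-6 + 0.35 * glass_data$velocity100)
)

# Prior-only fit: the data argument is still required to build the model,
# but prior_PD = TRUE means the outcome is not conditioned on
prior_fit <- stan_glm(
  penetration ~ velocity100,
  family = binomial(link = "logit"),
  data = glass_data,
  prior_intercept = normal(0, 2.5),  # placeholder prior
  prior = normal(0.3, 0.2),          # placeholder prior
  prior_PD = TRUE,
  chains = 2, iter = 1000, seed = 1, refresh = 0
)

# Predict at whatever velocities you like instead of the observed ones
new_velocities <- data.frame(velocity100 = seq(0, 40, by = 1))
prior_draws <- add_epred_draws(new_velocities, prior_fit)

range(prior_draws$velocity100)
```

Keeping the data in the original stan_glm() call is still fine, since update(prior_fit, prior_PD = FALSE) will reuse it for the posterior; newdata only controls where the predictions are evaluated.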