What does `prior_PD` sample?

mattansb · October 14, 2019, 1:56pm

In the documentation of stanreg functions, prior_PD is defined as:

A logical scalar (defaulting to FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.

However, the documentation of posterior_vs_prior describes it as:

Plot medians and central intervals comparing parameter draws from the prior and posterior distributions. …

Internally, it seems that posterior_vs_prior simply uses update(object, prior_PD = TRUE), so I am wondering whether the prior distribution and the prior predictive distribution are one and the same.

They seem to be:

library(rstanarm)
#> Loading required package: Rcpp
#> Warning: package 'Rcpp' was built under R version 3.6.1
#> Registered S3 method overwritten by 'xts':
#>   method     from
#>   as.zoo.xts zoo
#> rstanarm (Version 2.18.2, packaged: 2018-11-08 22:19:38 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - Plotting theme set to bayesplot::theme_default().
library(bayestestR)
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.6.1
fit <- stan_lmer(extra ~ group + (1 | ID),
                 data = sleep,
                 prior_PD = TRUE,
                 refresh = 0)
prior_summary(fit)
#> Priors for model 'fit' 
#> ------
#> Intercept (after predictors centered)
#>  ~ normal(location = 0, scale = 10)
#>      **adjusted scale = 20.18
#> 
#> Coefficients
#>  ~ normal(location = 0, scale = 2.5)
#>      **adjusted scale = 5.04
#> 
#> Auxiliary (sigma)
#>  ~ exponential(rate = 1)
#>      **adjusted scale = 2.02 (adjusted rate = 1/adjusted scale)
#> 
#> Covariance
#>  ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
#> ------
#> See help('prior_summary.stanreg') for more details
x <- insight::get_parameters(fit)
ggplot(x, aes(group2)) + 
  geom_density() +
  stat_function(fun = function(x) dnorm(x, 0, 5.044799), color = "red")

Created on 2019-10-14 by the reprex package (v0.3.0)

Running Win10
rstanarm 2.18.2

bgoodri · October 14, 2019, 3:19pm

If prior_PD is TRUE, then it draws from the prior distribution, primarily in order to use those realizations to draw from the prior predictive distribution and verify that the priors are reasonable. The Stan programs in rstanarm actually do draw from the predictive distribution in the generated quantities block, but it wasn’t a great choice of argument name.
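To make the distinction concrete, here is a minimal base R sketch of the two steps described above: first draw parameters from their priors, then push each draw through the likelihood to get a prior predictive draw. It is a hand-rolled simulation, not rstanarm's internal code. It reuses the adjusted scales printed by prior_summary() earlier in the thread, assumes the intercept prior applies to the centered-predictor intercept (as that output states), and ignores the (1 | ID) term for simplicity.

```r
# Hand-rolled sketch; NOT rstanarm's internal code.
set.seed(1)
n_draws <- 4000

# Step 1: draws from the PRIOR distribution of the parameters
# (adjusted scales copied from the prior_summary() output in this thread)
alpha_c <- rnorm(n_draws, mean = 0, sd = 20.18)  # intercept, predictors centered
beta    <- rnorm(n_draws, mean = 0, sd = 5.04)   # coefficient for group2
sigma   <- rexp(n_draws, rate = 1 / 2.02)        # residual SD

# Predictor values: group indicator from the built-in sleep data, centered
x  <- as.numeric(sleep$group == "2")
xc <- x - mean(x)

# Step 2: draws from the PRIOR PREDICTIVE distribution of the outcome.
# Row i is one simulated data set generated from parameter draw i.
mu    <- alpha_c + outer(beta, xc)  # n_draws x 20 linear predictors
noise <- matrix(rnorm(n_draws * length(xc)), nrow = n_draws)
y_rep <- mu + sigma * noise

sd(beta)        # spread of the prior on the coefficient
sd(y_rep[, 1])  # spread of the simulated outcome for the first observation
```

The columns alpha_c, beta and sigma play the role of the parameter draws; y_rep plays the role of the predictive draws. The second spread is much wider because it combines uncertainty in all three parameters.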

mattansb · October 14, 2019, 3:51pm

My take home is: when prior_PD = TRUE, the produced draws are from the prior distribution (and not from the prior predictive distribution).

Thanks Ben!

bgoodri · October 14, 2019, 5:22pm

Yeah, except for the column in the output labeled PPD.

jonah · October 16, 2019, 2:49pm

I agree the name prior_PD wasn’t the best choice. We can always deprecate it and change the name without breaking backwards compatibility. Any ideas for a better one? I forget what brms uses but we could just change to that name for consistency.

mattansb · October 16, 2019, 2:54pm

brms uses sample_prior = c("no", "yes", "only").

gchesterton · May 30, 2024, 10:42pm

Hi all who were on this thread. My question is related, but if it is worthy of a new thread I’ll go that route.
I am modeling penetration (yes/no) ~ velocity (fps). My response variable is binary, and the velocities range from roughly 1000 fps to 3000 fps. So I’m using a logistic regression, with priors for alpha and beta that are normal(mu, sigma) informed by general knowledge.
I set prior_PD=TRUE to see a plot of the prior predictions. I expected the plot to be completely unaffected by my data set, yet I notice that the range on the x-axis matches that of my data set. FWIW, since I specified my priors, autoscaling is turned off. So I’m confused about why the plot of predictions based on the priors would reference my data at all.

mattansb · May 31, 2024, 6:40am

Your priors indicate how X is related to Y - specifically, how Y is conditionally distributed wrt X.
Accordingly, in order to generate a prior predictive distribution, which is conditional on X, we need to plug in values of X. By default those are taken from the data. You can plug in any values of X you wish, but you need to provide something.
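As a rough illustration of this point, the base R sketch below simulates prior draws for a logistic regression and evaluates them on an arbitrary grid of X values. The prior means and SDs, the grid range, and the variable names are illustrative assumptions, and the intercept here is the ordinary intercept at X = 0, which is not necessarily how a given package parameterizes it.

```r
# Hand-rolled sketch with assumed, illustrative priors.
set.seed(84735)
n_draws <- 100

# Prior draws for intercept and slope (placeholder values)
alpha <- rnorm(n_draws, mean = -5.7, sd = 1.5)
beta  <- rnorm(n_draws, mean = 0.27, sd = 0.06)

# The X values are our choice; nothing forces them to match observed data.
# Here: velocity in hundreds of fps (assumed units), from 0 to 40.
velocity_grid <- seq(0, 40, by = 0.5)

# One prior curve per draw: P(penetration | velocity)
# Result is an n_draws x length(velocity_grid) matrix.
prior_curves <- sapply(velocity_grid, function(v) plogis(alpha + beta * v))

matplot(velocity_grid, t(prior_curves), type = "l", lty = 1,
        col = rgb(0, 0, 0, 0.3),
        xlab = "velocity (hundreds of fps)", ylab = "P(penetration)")
```

Changing velocity_grid changes only where the curves are evaluated; the prior draws themselves stay the same.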

gchesterton · May 31, 2024, 12:29pm

Thanks Mattan! That makes sense. How do I provide a different set of X? I called stan_glm to produce the model, where my scaled glass_data X values (velocity100) range from approx 10 to 27 (in hundreds of feet per sec):

penetration_model_prior <- stan_glm(Result ~ velocity100, data = glass_data, family = binomial, prior_intercept = normal(-5.7, 1.5), prior = normal(0.27, 0.06), chains = 4, iter = 5000*2, seed = 84735, prior_PD = TRUE)

Then I plotted the prior predicteds with epred draws:

glass_data %>%
  add_epred_draws(penetration_model_prior, ndraws = 100) %>%
  ggplot(aes(x = velocity100, y = Result)) +
  geom_line(aes(y = .epred, group = .draw), linewidth = 0.5, alpha = 0.3)

That’s when I end up with plots that are truncated to the range of X in my observed data.
Later, I call update() with prior_PD=FALSE to see the posterior.

I thought the value of referencing my dataset in the initial stan_glm() call was so that I could later reference that model in a call to update(). Perhaps I’m going about it wrong and I need to go back to some tutorials.

jsocolar · May 31, 2024, 2:34pm

add_epred_draws takes a newdata argument as its first argument. Instead of providing glass_data, provide some other dataset for the locations where you want the predictions.
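A sketch of what that could look like is below. Since glass_data is not available here, the example simulates a small stand-in data set (purely made up, only so the code runs), and it uses fewer chains and iterations than the original call to keep it quick. The names new_velocities and prior_curves and the 0 to 40 grid are assumptions for illustration.

```r
library(rstanarm)
library(tidybayes)
library(ggplot2)

set.seed(84735)

# Simulated stand-in for glass_data (NOT the real data)
glass_data <- data.frame(velocity100 = runif(50, min = 10, max = 27))
glass_data$Result <- rbinom(50, size = 1,
                            prob = plogis(-5.7 + 0.27 * glass_data$velocity100))

# Same model as in the thread, sampling from the priors only
penetration_model_prior <- stan_glm(
  Result ~ velocity100, data = glass_data, family = binomial,
  prior_intercept = normal(-5.7, 1.5), prior = normal(0.27, 0.06),
  chains = 2, iter = 1000, seed = 84735, refresh = 0,
  prior_PD = TRUE
)

# New predictor values, deliberately wider than the observed range
new_velocities <- data.frame(velocity100 = seq(0, 40, by = 0.5))

# newdata is the first argument, the fitted model the second
prior_curves <- add_epred_draws(new_velocities, penetration_model_prior,
                                ndraws = 100)

ggplot(prior_curves, aes(x = velocity100, y = .epred, group = .draw)) +
  geom_line(linewidth = 0.5, alpha = 0.3)
```

The x-axis now spans the grid in new_velocities rather than the range of the data used to set up the model.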

gchesterton · May 31, 2024, 4:00pm

Thanks, that worked!
