Prior predictive checks for multiple regression

Apologies in advance if this is the wrong place to be posting this! I’m trying to run some prior predictive checks for the linear predictor portion of a basic multiple regression. With a simple regression (one predictor) of the following form:

Y \sim N(\mu, \sigma) \\ \mu = \alpha + \beta X

I understand how to sample values of alpha and beta from the priors and compute the implied values of mu over a sensible range of X values.

I’m unsure how this would work with a multiple regression of the form:

Y \sim N(\mu, \sigma) \\ \mu = \alpha + \beta_1 X_1 + \beta_2 X_2

Would it make sense to pair up draws of (alpha, beta_1) and (alpha, beta_2) to create two plots of possible lines against possible X values, one for each predictor? Or would this no longer make sense given the implied interdependencies among all three coefficients, in which case I should instead be examining the distribution of the computed mu values or some other quantity?

Thank you!

Thanks for posting. This is the right forum, though I moved the topic to the Modeling category. Sorry nobody’s responded yet, especially since this one’s easy to answer.

Prior predictive checks are the same in the single regression and multiple regression case. For a regression, we have

Y_n \sim \textrm{normal}(\alpha + x_n \cdot \beta, \sigma),

where the outcome Y_n \in \mathbb{R} is treated as a random variable, x_n \in \mathbb{R}^K is a row vector of predictors treated as a constant, \beta \in \mathbb{R}^K is a vector of coefficients, \alpha \in \mathbb{R} is the intercept, and \sigma \in (0, \infty) is the error scale. We also need a prior

\alpha, \beta, \sigma \sim p(\alpha, \beta, \sigma).

To do a prior predictive check, we generate M data sets, one for each m \in 1:M. To generate data set m, simulate the parameters from the prior

\alpha^{(m)}, \beta^{(m)} , \sigma^{(m)} \sim p(\alpha, \beta, \sigma)

and then simulate data from the parameters

y_n^{(m)} \sim \textrm{normal}(\alpha^{(m)} + x_n \cdot \beta^{(m)}, \sigma^{(m)}) for n \in 1:N

Now you have a simulated data set y^{(m)} for each m \in 1:M. Typically we look at statistics of the y^{(m)} and compare to our actual data y.

When you do prior predictive checks, the x_n do not change, so it doesn’t matter what their dimensionality is: the procedure is the same with one predictor or many.
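To make this concrete, here is a minimal standalone Stan sketch of the forward simulation (the normal(0, 2), normal(0, 1), and exponential(1) priors below are placeholders, not a recommendation; substitute whatever priors your analysis calls for):

```stan
data {
  int<lower=0> N;            // number of observations
  int<lower=0> K;            // number of predictors
  matrix[N, K] x;            // predictors, held fixed at their observed values
}
generated quantities {
  real alpha_sim;
  vector[K] beta_sim;
  real sigma_sim;
  vector[N] y_sim;
  // (1) draw the parameters from the prior (placeholder priors)
  alpha_sim = normal_rng(0, 2);
  for (k in 1:K)
    beta_sim[k] = normal_rng(0, 1);
  sigma_sim = exponential_rng(1);
  // (2) simulate one prior predictive data set from those parameters
  for (n in 1:N)
    y_sim[n] = normal_rng(alpha_sim + x[n] * beta_sim, sigma_sim);
}
```

Because the program declares no parameters, run it with the fixed_param algorithm; each iteration then produces one draw of (alpha, beta, sigma) from the prior and one simulated data set y_sim, i.e., one y^{(m)}.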

There will be correlation among the parameters in the posterior, but not necessarily in the prior. You haven’t said what your prior is, but priors are often specified independently for each parameter.


Sorry, I actually read this several days ago and then forgot to reply! @Bob_Carpenter’s answer is excellent, and the practical implementation is easy in Stan or brms. In Stan, you can write your full Stan program (with predictions generated in the generated quantities block) and then wrap the likelihood statement in something like if (prior_only == 0) { ....... }, with prior_only declared in the data block and passed in as data. When you pass prior_only = 1, the likelihood is toggled off and you sample from the priors alone. The same thing can be done in brms by setting sample_prior = "only" in the brm call.
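For reference, here is a minimal Stan sketch of that pattern (again with placeholder priors, not a recommendation):

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  vector[N] y;
  int<lower=0, upper=1> prior_only;   // 1 = skip the likelihood, sample the prior
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // placeholder priors; substitute whatever your analysis calls for
  alpha ~ normal(0, 2);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  // likelihood, toggled off for prior-only runs
  if (prior_only == 0) {
    y ~ normal(alpha + x * beta, sigma);
  }
}
generated quantities {
  // predictive draws for checking against the observed y
  vector[N] y_sim;
  for (n in 1:N)
    y_sim[n] = normal_rng(alpha + x[n] * beta, sigma);
}
```

With prior_only = 1 in the data, the y_sim draws in generated quantities are prior predictive; with prior_only = 0 they are posterior predictive.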


This is a big help, thanks so much Bob!

Neat approach, thanks!