Prior predictive checks for multiple regression

Apologies in advance if this is the wrong place to be posting this! I’m trying to run some prior predictive checks for the linear predictor portion of a basic multiple regression. With a simple regression (one predictor) of the following form:

Y \sim N(\mu, \sigma) \\ \mu = \alpha + \beta X

I understand how to sample values of alpha and beta from the priors and compute the implied values of mu over a sensible range of X values.

I’m unsure how this would work with a multiple regression of the form:

Y \sim N(\mu, \sigma) \\ \mu = \alpha + \beta_1 X_1 + \beta_2 X_2

Would it make sense to pair up draws of (alpha, beta_1) and (alpha, beta_2) to create two plots of possible lines against possible X values, one for each predictor? Or would this no longer make sense given the implied interdependencies among all three coefficients, in which case I should instead be examining the distribution of the computed mu values or some other quantity?

Thank you!

Thanks for posting. This is the right forum, though I moved the topic to the Modeling category. Sorry nobody’s responded yet, especially since this one’s easy to answer.

Prior predictive checks are the same in the single regression and multiple regression case. For a regression, we have

Y_n \sim \textrm{normal}(\alpha + x_n \cdot \beta, \sigma),

where the outcome Y_n \in \mathbb{R} is treated as a random variable, x_n \in \mathbb{R}^K is a row vector of predictors treated as a constant, \beta \in \mathbb{R}^K is a vector of coefficients, \alpha \in \mathbb{R} is the intercept, and \sigma \in (0, \infty) is the error scale. We also need a prior

\alpha, \beta, \sigma \sim p(\alpha, \beta, \sigma).

To do a prior predictive check, we generate M data sets, one for each m \in 1:M. To generate data set m, simulate the parameters from the prior

\alpha^{(m)}, \beta^{(m)} , \sigma^{(m)} \sim p(\alpha, \beta, \sigma)

and then simulate data from the parameters

y_n^{(m)} \sim \textrm{normal}(\alpha^{(m)} + x_n \cdot \beta^{(m)}, \sigma^{(m)}) for n \in 1:N

Now you have a simulated data set y^{(m)} for each m \in 1:M. Typically we look at statistics of the y^{(m)} and compare to our actual data y.

When you do prior predictive checks, the x_n do not change, so it doesn’t matter what their dimensionality is: the procedure is the same with one predictor or many.
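To make this concrete, here is a minimal standalone Stan sketch of the forward simulation (the normal(0, 2), normal(0, 1), and exponential(1) priors below are placeholders, not a recommendation; substitute whatever priors your analysis calls for):

```stan
data {
  int<lower=0> N;            // number of observations
  int<lower=0> K;            // number of predictors
  matrix[N, K] x;            // predictors, held fixed at their observed values
}
generated quantities {
  real alpha_sim;
  vector[K] beta_sim;
  real sigma_sim;
  vector[N] y_sim;
  // (1) draw the parameters from the prior (placeholder priors)
  alpha_sim = normal_rng(0, 2);
  for (k in 1:K)
    beta_sim[k] = normal_rng(0, 1);
  sigma_sim = exponential_rng(1);
  // (2) simulate one prior predictive data set from those parameters
  for (n in 1:N)
    y_sim[n] = normal_rng(alpha_sim + x[n] * beta_sim, sigma_sim);
}
```

Because the program declares no parameters, run it with the fixed_param algorithm; each iteration then produces one draw of (alpha, beta, sigma) from the prior and one simulated data set y_sim, i.e., one y^{(m)}.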

There will be correlation among the parameters in the posterior, but not necessarily in the prior. You haven’t said what your prior is, but priors are often specified independently for each parameter.


Sorry, I actually read this several days ago and then forgot to reply! @Bob_Carpenter’s answer is excellent, and the practical implementation is easy in Stan or brms. In Stan, you can write your full Stan program (with predictions generated in the generated quantities block) and then wrap the likelihood statement in something like if (prior_only == 0) { ....... }, with prior_only declared in the data block and passed in as data. When you pass prior_only = 1, the likelihood is toggled off and you sample from the priors alone. The same thing can be done in brms by setting sample_prior = "only" in the brm call.
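For reference, here is a minimal Stan sketch of that pattern (again with placeholder priors, not a recommendation):

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  vector[N] y;
  int<lower=0, upper=1> prior_only;   // 1 = skip the likelihood, sample the prior
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // placeholder priors; substitute whatever your analysis calls for
  alpha ~ normal(0, 2);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  // likelihood, toggled off for prior-only runs
  if (prior_only == 0) {
    y ~ normal(alpha + x * beta, sigma);
  }
}
generated quantities {
  // predictive draws for checking against the observed y
  vector[N] y_sim;
  for (n in 1:N)
    y_sim[n] = normal_rng(alpha + x[n] * beta, sigma);
}
```

With prior_only = 1 in the data, the y_sim draws in generated quantities are prior predictive; with prior_only = 0 they are posterior predictive.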


This is a big help, thanks so much Bob!

Neat approach, thanks!