Hi, that’s a good question!
I’ll start with:
that’s usually a bad idea. Priors should be derived from domain expertise or from other considerations that do not take the actual data into account, e.g. the experiment design, or data from a pilot/previous experiment that you don’t use in the new model. Deriving priors from the data runs the risk of being overconfident in your inferences. Although we often make some decisions about priors only after we have collected the data (e.g. because the model does not fit), one should always check whether the prior would be defensible without having seen the data.
However, you almost always have at least some prior information. E.g. in the shifted lognormal model, all the coefficients correspond to the logarithm of multiplicative changes to the mean. In many cases it is reasonable to assume that groups (or the min/max values of a continuous predictor) differ less than, say, 20-fold, so having max_predictor_difference * coefficient < log(20) is a good rule of thumb, which can be well represented by something like normal(0, 1.5 / max_predictor_difference), as log(20) / 2 ~= 1.5. That’s already quite a narrow prior and we only needed to know the predictors (i.e. the design), not the data!
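As a rough sketch of that rule of thumb, the prior scale can be computed from the predictor alone. The 20-fold bound and the use of mtcars$wt as the predictor are just illustrative choices here, and the variable names are made up for this example:

```r
# Rule of thumb: the two extremes of the predictor should rarely differ
# by more than max_ratio-fold in the outcome.
max_ratio <- 20                                      # assumed bound, adjust to your domain
max_predictor_difference <- diff(range(mtcars$wt))   # uses the design only, not the response

# Put log(max_ratio) at two prior standard deviations
prior_sd <- (log(max_ratio) / 2) / max_predictor_difference

# Prior mass on |coefficient * max_predictor_difference| < log(max_ratio):
# two standard deviations of a normal, i.e. about 95%
effect_sd <- prior_sd * max_predictor_difference
prior_mass <- pnorm(log(max_ratio), 0, effect_sd) - pnorm(-log(max_ratio), 0, effect_sd)

prior_sd
prior_mass
```

The resulting prior_sd is what you would plug in as the scale of the normal prior on the coefficient.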
Some guidance (although some of it is a bit outdated) can be found at Prior Choice Recommendations · stan-dev/stan Wiki · GitHub, and the most general way to set priors is to use prior predictive checks (discussed e.g. in the Bayesian workflow preprint or the visualisation preprint). You can often run prior predictive checks in brms by setting sample_prior = "only", but it is often instructive to write your own simulation code in R to verify you understand what brms is actually doing :-).
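A hand-written prior predictive simulation for a shifted lognormal model with one linear predictor could look roughly like the following. This is only a sketch: the priors on the intercept, sigma and the shift are placeholders I made up for illustration, the predictor is simply centered wt rather than the spline basis brms would use, and the coefficient prior follows the 20-fold rule of thumb from above.

```r
set.seed(1)
n_sims <- 1000

# Design only: centered predictor, the response is never touched
x <- mtcars$wt - mean(mtcars$wt)
b_sd <- 1.5 / diff(range(mtcars$wt))

# Draws from the priors (all placeholder choices except b)
intercept <- rnorm(n_sims, log(20), 1)   # assumed prior on the log-scale intercept
b <- rnorm(n_sims, 0, b_sd)              # rule-of-thumb prior on the coefficient
sigma <- abs(rnorm(n_sims, 0, 0.5))      # assumed half-normal prior on sigma
shift <- runif(n_sims, 0, 5)             # assumed prior on the shift

# One simulated dataset per prior draw: shift + lognormal(mu, sigma)
y_sim <- sapply(seq_len(n_sims), function(i) {
  shift[i] + rlnorm(length(x), meanlog = intercept[i] + b[i] * x, sdlog = sigma[i])
})

# Does the range of simulated outcomes look plausible for the domain?
quantile(y_sim, c(0.025, 0.5, 0.975))
```

If the simulated quantiles cover values that are clearly impossible in your domain, the priors are probably too wide; if they exclude values you consider plausible, they are too narrow.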
To get to your specific case - brms borrows the parametrization of smooth terms from mgcv. There is an IMHO good explanation of how that works at Random effects and penalized splines are the same thing - Higher Order Functions, but notably, the b_swt_1 parameter is basically the linear trend of wt, although the wt values are somehow centered and rescaled - I don’t understand exactly how, but you can look at the result of
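library(brms)
# Build the data brms would pass to Stan; Xs is the design matrix
# for the linear (unpenalized) part of the smooth
dd <- make_standata(brms::brmsformula(mpg ~ s(wt), family = shifted_lognormal()), data = mtcars)
dd$Xs
plot(dd$Xs, mtcars$wt)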
to see that it is just a linear transformation: the points in the resulting plot fall on a straight line.
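The linearity can also be checked numerically rather than by eye. The sketch below uses mgcv directly, since that is where the parametrization comes from; that brms builds its Xs via exactly these two calls with these arguments is an assumption on my part, so treat it as an illustration of the idea and not as a reproduction of the brms internals.

```r
library(mgcv)

# Set up the smooth with the centering constraint absorbed into the basis
sm <- smoothCon(s(wt), data = mtcars, knots = NULL, absorb.cons = TRUE)[[1]]

# Split the smooth into an unpenalized ("fixed") part Xf
# and a penalized ("random") part
re <- smooth2random(sm, names(mtcars), type = 2)

# Regress the unpenalized column on the raw predictor
xf <- as.vector(re$Xf)
fit <- lm(xf ~ mtcars$wt)

ncol(re$Xf)             # number of unpenalized columns
summary(fit)$r.squared  # a value of (numerically) 1 means an exact linear transformation
coef(fit)               # intercept and slope give the centering and rescaling
```

The coefficients of that regression tell you how wt was shifted and scaled, which is what you need to translate a prior on the slope of wt into a prior on the corresponding brms parameter.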
Best of luck in setting your priors!