Better priors (non-flat) for GAMs (brms)

TLDR; What are some possible heuristics to set non-flat priors for general additive models (GAMs)?

I am trying to run a somewhat complex shifted log-normal GAM, but I have problem understanding, and thus setting, appropriate priors. (This is motivated by the fact that the model doesn’t converge well, which seems to be related to the default, flat priors).

library(brms)

formula <- brms::brmsformula(mpg ~ s(wt), family = shifted_lognormal())

brms::get_prior(formula, data = mtcars)
#>                 prior     class  coef group resp dpar nlpar bound       source
#>                (flat)         b                                        default
#>                (flat)         b swt_1                             (vectorized)
#>  student_t(3, 3, 2.5) Intercept                                        default
#>     uniform(0, min_Y)       ndt                                        default
#>  student_t(3, 0, 2.5)       sds                                        default
#>  student_t(3, 0, 2.5)       sds s(wt)                             (vectorized)
#>  student_t(3, 0, 2.5)     sigma                                        default

Created on 2021-06-15 by the reprex package (v2.0.0)

In the toy example above, there is one fixed parameter (swt_1) which I assume is related to the smooth term (s(wt)). However, I am not sure what does it mean (does it control the wigliness of the smooth term? or some other features of the splines?)

In particular, how can set somewhat better default priors without running the model with flat priors first, i.e., based for instance on the range / SD of the response or predictors?

For instance, running the model with flat priors gives me a negative coefficient for the swt_1 parameter, which I’m not sure how to interpret…

summary(brms::brm(formula, data = mtcars, refresh = 0, iter = 500))
Warning: 1 of 1000 (0.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include: 
  * Increasing adapt_delta closer to 1 (default is 0.8) 
  * Reparameterizing the model (e.g. using a non-centered parameterization)
  * Using informative or weakly informative prior distributions 

 Family: shifted_lognormal 
  Links: mu = identity; sigma = identity; ndt = identity 
Formula: mpg ~ s(wt) 
   Data: mtcars (Number of observations: 32) 
Samples: 4 chains, each with iter = 500; warmup = 250; thin = 1;
         total post-warmup samples = 1000

Smooth Terms: 
           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(swt_1)     0.31      0.33     0.01     1.23 1.01      333      428

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     2.66      0.17     2.35     2.94 1.01      527      496
swt_1        -1.87      0.78    -3.33    -0.21 1.00      426      299

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.20      0.05     0.13     0.30 1.01      544      676
ndt       4.44      2.14     0.39     7.99 1.01      529      547

Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

I can set a prior based on this model:

priors <- c(set_prior('normal(-1.87, 0.78)', class = 'b', coef = "swt_1")) %>%
  brms::validate_prior(formula, data = mtcars)

but I would like a way to find reasonable priors “from the data”.

Any thoughts and suggestions are welcome!

1 Like

Hi, that’s a good question!

I’ll start with:

that’s usually a bad idea. Priors should be derived from domain expertise or other considerations that you can make without taking the actual data into account e.g. experiment design, data from a pilot/prior experiment that you don’t use in this new model, etc. Deriving priors from data runs the risk of being onverconfident in your inferences. Although we often make some decisions about priors only after we collected that data (e.g. because the model does not fit), one should always try to check if the prior would be defensible without seeing the data.

However, you almost always have at least some prior information. E.g. in the shifted lognormal model, all the coefficents correspond to logarithm of multiplicative changes to the mean. In many cases, assuming that groups or (min/max values of a continuous predictors) differe less than say 20-fold are reasonable, so having max_predictor_difference * coefficent < log(20) is a good rule of thumb, which can be well represented by something like normal(0, 1.5 / max_predictor_difference) as log(20) / 2 ~= 1.5. That’s already quite narrow prior and we only needed to know the predictors (i.e. design), not the data!

Some guidance (although some of it a bit outdated) can be found at Prior Choice Recommendations · stan-dev/stan Wiki · GitHub and the most general way to set priors is to use prior predictive checks (discussed e.g. in the Bayesian workflow preprint or visualisation preprint). You can often run prior predictive checks in brms by setting prior_only = TRUE, but it is often instructive to try to write your own simulation code in R to verify you understand what brms is actually doing :-).

To get to your specific case - brms borrows the parametrization of smooth terms from mgcv. There is an IMHO good explanation on how that wors at Random effects and penalized splines are the same thing - Higher Order Functions but notably, the b_swt_1 parameter is basically the linear trend of wt, although wt values are somehow centered and rescaled - I don’t understand exactly how, but you can look at the result of

dd <- make_standata(brms::brmsformula(mpg ~ s(wt), family = shifted_lognormal()), data = mtcars)
dd$Xs
plot(dd$Xs, mtcars$wt)

to see it is just a linear transformation, the resulting plot is:

Best of luck in setting your priors!

3 Likes

Hi Martin,

Thanks a lot for your detailed answer - it’s very helpful.

These practical rules of thumb to avoid flat priors are imo very useful (and somewhat hard to find advice on).

That’s indeed a very important piece of information!

It should be now easier for me to understand and interpret the parameters of my models. Thanks again :)

1 Like