TL;DR: What are some possible heuristics for setting non-flat priors for generalized additive models (GAMs)?
I am trying to run a somewhat complex shifted log-normal GAM, but I have trouble understanding, and thus setting, appropriate priors. (This is motivated by the fact that the model doesn't converge well, which seems to be related to the default flat priors.)
```r
library(brms)

formula <- brms::brmsformula(mpg ~ s(wt), family = shifted_lognormal())
brms::get_prior(formula, data = mtcars)
#>                 prior     class  coef group resp dpar nlpar bound       source
#>                (flat)         b                                        default
#>                (flat)         b swt_1                             (vectorized)
#>  student_t(3, 3, 2.5) Intercept                                        default
#>     uniform(0, min_Y)       ndt                                        default
#>  student_t(3, 0, 2.5)       sds                                        default
#>  student_t(3, 0, 2.5)       sds s(wt)                             (vectorized)
#>  student_t(3, 0, 2.5)     sigma                                        default
```
Created on 2021-06-15 by the reprex package (v2.0.0)
In the toy example above, there is one fixed parameter (`swt_1`), which I assume is related to the smooth term (`s(wt)`). However, I am not sure what it means: does it control the wiggliness of the smooth term, or some other feature of the splines?
In particular, how can I set somewhat better default priors without running the model with flat priors first, e.g., based on the range or SD of the response or predictors?
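For example, one heuristic I have been wondering about (the 2.5 multiplier is borrowed from rstanarm's autoscaling default, and I am assuming it makes sense to work on the log scale, since that is where the shifted log-normal's `mu` lives; this is just a rough idea, not an established rule):

```r
# Rough heuristic (an assumption on my part, not an established rule):
# centre slope priors at 0 and scale them by the response-to-predictor
# SD ratio, computed on the scale of the linear predictor (log scale here).
y_sd <- sd(log(mtcars$mpg))       # mu of a shifted log-normal is on the log scale
x_sd <- sd(mtcars$wt)
prior_scale <- 2.5 * y_sd / x_sd  # 2.5 multiplier borrowed from rstanarm's default
sprintf("normal(0, %.2f)", prior_scale)
```

But I don't know whether this kind of autoscaling transfers sensibly to the coefficients of a smooth term.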
For instance, running the model with flat priors gives me a negative coefficient for the `swt_1` parameter, which I'm not sure how to interpret…
```r
summary(brms::brm(formula, data = mtcars, refresh = 0, iter = 500))
#> Warning: 1 of 1000 (0.0%) transitions ended with a divergence. This may
#> indicate insufficient exploration of the posterior distribution. Possible
#> remedies include:
#>   * Increasing adapt_delta closer to 1 (default is 0.8)
#>   * Reparameterizing the model (e.g. using a non-centered parameterization)
#>   * Using informative or weakly informative prior distributions
#>  Family: shifted_lognormal
#>   Links: mu = identity; sigma = identity; ndt = identity
#> Formula: mpg ~ s(wt)
#>    Data: mtcars (Number of observations: 32)
#> Samples: 4 chains, each with iter = 500; warmup = 250; thin = 1;
#>          total post-warmup samples = 1000
#>
#> Smooth Terms:
#>            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sds(swt_1)     0.31      0.33     0.01     1.23 1.01      333      428
#>
#> Population-Level Effects:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     2.66      0.17     2.35     2.94 1.01      527      496
#> swt_1        -1.87      0.78    -3.33    -0.21 1.00      426      299
#>
#> Family Specific Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.20      0.05     0.13     0.30 1.01      544      676
#> ndt       4.44      2.14     0.39     7.99 1.01      529      547
#>
#> Samples were drawn using sample(hmc). For each parameter, Bulk_ESS and
#> Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
```
I can set a prior based on this model:
```r
priors <- c(set_prior('normal(-1.87, 0.78)', class = 'b', coef = "swt_1")) %>%
  brms::validate_prior(formula, data = mtcars)
```
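One thing I have been experimenting with to sanity-check candidate priors is a prior predictive check, i.e. refitting with `sample_prior = "only"` so the "posterior" is drawn from the priors alone, and then comparing the implied predictions to the data (a sketch; it requires compiling and sampling the model):

```r
# Sketch: sample from the priors only (no likelihood) and compare the
# implied predictive distribution to the observed mpg values.
fit_prior <- brms::brm(
  formula,
  data = mtcars,
  prior = priors,
  sample_prior = "only",
  refresh = 0
)
brms::pp_check(fit_prior)  # prior predictive draws vs. observed data
```

But this still feels like trial and error rather than a principled way of deriving the priors from the scale of the data in the first place.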
but I would like a way to find reasonable priors “from the data”.
Any thoughts and suggestions are welcome!