TLDR; What are some possible heuristics to set non-flat priors for general additive models (GAMs)?

I am trying to run a somewhat complex shifted log-normal GAM, but I have problem understanding, and thus setting, appropriate priors. (This is motivated by the fact that the model doesn’t converge well, which seems to be related to the default, flat priors).

```
library(brms)
formula <- brms::brmsformula(mpg ~ s(wt), family = shifted_lognormal())
brms::get_prior(formula, data = mtcars)
#> prior class coef group resp dpar nlpar bound source
#> (flat) b default
#> (flat) b swt_1 (vectorized)
#> student_t(3, 3, 2.5) Intercept default
#> uniform(0, min_Y) ndt default
#> student_t(3, 0, 2.5) sds default
#> student_t(3, 0, 2.5) sds s(wt) (vectorized)
#> student_t(3, 0, 2.5) sigma default
```

^{Created on 2021-06-15 by the reprex package (v2.0.0)}

In the toy example above, there is one fixed parameter (`swt_1`

) which I assume is related to the smooth term (`s(wt)`

). However, I am not sure what does it mean (does it control the wigliness of the smooth term? or some other features of the splines?)

In particular, how can set somewhat better default priors without running the model with flat priors first, i.e., based for instance on the range / SD of the response or predictors?

For instance, running the model with flat priors gives me a negative coefficient for the `swt_1`

parameter, which I’m not sure how to interpret…

```
summary(brms::brm(formula, data = mtcars, refresh = 0, iter = 500))
```

```
Warning: 1 of 1000 (0.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include:
* Increasing adapt_delta closer to 1 (default is 0.8)
* Reparameterizing the model (e.g. using a non-centered parameterization)
* Using informative or weakly informative prior distributions
Family: shifted_lognormal
Links: mu = identity; sigma = identity; ndt = identity
Formula: mpg ~ s(wt)
Data: mtcars (Number of observations: 32)
Samples: 4 chains, each with iter = 500; warmup = 250; thin = 1;
total post-warmup samples = 1000
Smooth Terms:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(swt_1) 0.31 0.33 0.01 1.23 1.01 333 428
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.66 0.17 2.35 2.94 1.01 527 496
swt_1 -1.87 0.78 -3.33 -0.21 1.00 426 299
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.20 0.05 0.13 0.30 1.01 544 676
ndt 4.44 2.14 0.39 7.99 1.01 529 547
Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

I can set a prior based on this model:

```
priors <- c(set_prior('normal(-1.87, 0.78)', class = 'b', coef = "swt_1")) %>%
brms::validate_prior(formula, data = mtcars)
```

but I would like a way to find reasonable priors “from the data”.

Any thoughts and suggestions are welcome!