There are some misunderstandings here.

*mgcv* doesn’t do knot selection or counting, whether you use GCV smoothness selection or REML/ML. If you don’t specify `k`, the default of `-1` means the basis mgcv creates will take the defaults for whatever smoother you request. For `s(x)`, this will be a thin plate regression spline (TPRS) with `k = 10`, but by default you end up with 9 (`k - 1`) basis functions because of the identifiability constraints that are applied. You’ll get a different number of basis functions depending on the options passed to `s()` or `te()`, such as `bs` for the smoother type, `m`, which controls things like the order of the penalty, etc.
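You can see the effect of the identifiability constraint directly by building the basis yourself with `mgcv::smoothCon()`. A quick sketch (the data here are made up for illustration):

```r
library(mgcv)

set.seed(1)
df <- data.frame(x = runif(100))

## Build the default TPRS basis for s(x) (k = 10), with the
## sum-to-zero identifiability constraint absorbed into the basis
sm <- smoothCon(s(x), data = df, absorb.cons = TRUE)[[1]]

ncol(sm$X)  ## k - 1 = 9 columns once the constraint is absorbed
```

Changing `k`, `bs`, or `m` in the `s()` call and re-running this shows how the basis dimension responds to those options.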

For the default TPRS, Simon Wood developed a low-rank version that preserves many of the good properties of thin plate splines without needing one basis function per unique data/covariate value. What he does is create the full basis, subject it to an eigendecomposition, and then retain the first `k` (or `k - 1`, after identifiability constraints) eigenvectors, such that the new basis is of the requested dimension whilst preserving as much of the information in the full-rank basis as possible.

With other bases, like the cubic regression spline (`bs = "cr"`), `k` controls the number of knots, whose locations are given by the boundaries of the covariate data and spread through the data in between, unless specified by the user.
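For the CRS you can inspect, or override, the knot locations that *mgcv* chose. A small sketch, again with made-up data:

```r
library(mgcv)

set.seed(1)
df <- data.frame(x = runif(100))

## A rank-5 cubic regression spline: k knots, placed at the boundaries
## of x and through the data in between
sm <- smoothCon(s(x, bs = "cr", k = 5), data = df, absorb.cons = TRUE)[[1]]

sm$xp  ## the 5 knot locations chosen from the data

## User-specified knots instead, via the `knots` argument
sm2 <- smoothCon(s(x, bs = "cr", k = 5), data = df,
                 knots = list(x = c(0, 0.25, 0.5, 0.75, 1)),
                 absorb.cons = TRUE)[[1]]
```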

What *mgcv* does do is penalize the coefficients of the basis functions via a wiggliness penalty. So you might start with 9 TPRS basis functions (given the defaults for `s()` with a 2nd-order penalty), but the smoothness parameter controls how much of the wiggliness in the basis expansion you end up using, and the EDF of the resulting smoother will typically be lower than the maximum possible (9 by default). The smoothness parameter (which controls how strongly the penalty applies) is what is estimated by GCV or REML/ML, but note that you should not use GCV by default, for various reasons including the undersmoothing it is prone to.
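A minimal illustration of the penalty in action, with simulated data (the true function and noise level here are arbitrary):

```r
library(mgcv)

set.seed(1)
n  <- 200
df <- data.frame(x = runif(n))
df$y <- sin(2 * pi * df$x) + rnorm(n, sd = 0.3)

## 9 basis functions are available, but the wiggliness penalty, with its
## smoothness parameter selected by REML, shrinks the effective complexity
m <- gam(y ~ s(x), data = df, method = "REML")

summary(m)  ## the EDF reported for s(x) is typically well below 9
```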

In summary, you do have to choose the dimensionality of the basis expansion you want to use. *mgcv* will apply identifiability constraints, so you may end up with fewer than `k = 10` basis functions by default. A smoothness penalty penalizes the coefficients of the basis functions, so the resulting smoother may use fewer EDFs than implied by the number of basis functions.

My point above was that setting `k = 20` would result in the model using 19 basis functions for that smoother, whereas the code the OP showed would have resulted in only 9 basis functions being used.

I didn’t mention this earlier, as the OP was asking about speeding up fitting and I assumed they actually meant sampling. However, setting up the TPRS basis can take a significant amount of time with large n, because creating all the basis functions for the full TPRS basis and then eigendecomposing it can be costly. In such situations you lose little by setting `bs = "cr"`, for example, in the `s()` call.
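If you want to check whether basis setup is the bottleneck in your case, a rough timing comparison is easy to sketch (sizes here are arbitrary, and the gap will depend on n and on the smoother; mgcv also caps the number of unique values used for the full TPRS basis, which softens the cost for very large n):

```r
library(mgcv)

set.seed(1)
n  <- 50000
df <- data.frame(x = runif(n))
df$y <- sin(2 * pi * df$x) + rnorm(n)

## TPRS setup builds the full basis and eigendecomposes it...
system.time(m_tp <- gam(y ~ s(x, bs = "tp"), data = df, method = "REML"))

## ...whereas the cubic regression spline basis is cheap to construct
system.time(m_cr <- gam(y ~ s(x, bs = "cr"), data = df, method = "REML"))
```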

So far I haven’t even mentioned *brms*. As *brms* uses `mgcv::smoothCon()` to set up the basis for each `s()`, everything I say above applies to GAMs fitted using *brms*. Even the bit about smoothness parameters, which are proportional to the inverse of the variance parameter of the random-effect representation of the smoother, which is how *brms* fits the models. As such, *brms* fits exactly the same model as `mgcv::gam()`, except for the priors on parameters (implied in the case of *mgcv*). *brms* does do smoothness selection because it fits a penalized spline model; the variance parameter(s) associated with a smooth are related to the smoothness parameter(s) that *mgcv* selects. *mgcv* just happens to be ridiculously fast because of the work that Simon and others have put into developing the bespoke algorithms implemented in *mgcv*. GAMs are complex models and they are relatively slow to fit using MCMC.
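The *brms* side uses the same formula syntax, so the equivalent model is just (data simulated for illustration; `chains` and other sampler settings are up to you):

```r
library(brms)

set.seed(1)
df <- data.frame(x = runif(200))
df$y <- sin(2 * pi * df$x) + rnorm(200, sd = 0.3)

## brms hands s(x, ...) to mgcv::smoothCon() and fits the random-effect
## representation of the penalized smooth by MCMC
m <- brm(y ~ s(x, k = 20), data = df, family = gaussian(), chains = 2)

summary(m)  ## the sds(...) term is the smooth's standard deviation,
            ## related to the inverse of mgcv's smoothness parameter
```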

One thing you can do to help with model fitting, if *mgcv* can fit the model you want, is to prototype the model using *mgcv* and work out how big you need `k` to be using the tools *mgcv* provides. Then switch to *brms* and fit the models with the values of `k` for each smooth that you identified as being required from the *mgcv* fits. Or you could just think about how much wiggliness (in terms of degrees of freedom) you expect each smooth relationship to possess, and set `k` to that number of degrees of freedom plus a bit; you want to make sure the basis expansion either contains the true function or a close approximation to it. So if you expect a ~20-df smooth relationship, you might want to set `k = 30` so that the basis expansion hopefully contains a function in the span of the basis that either is the true function or closely approximates it.
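A sketch of that prototyping workflow (data simulated for illustration):

```r
library(mgcv)

set.seed(1)
df <- data.frame(x = runif(500))
df$y <- sin(4 * pi * df$x) + rnorm(500, sd = 0.3)

## Prototype in mgcv first (fast), and check whether k is large enough
m <- gam(y ~ s(x, k = 10), data = df, method = "REML")
gam.check(m)  ## a k-index well below 1 with a small p-value, plus an EDF
              ## close to k - 1, suggests k should be increased
k.check(m)    ## the k-checking part of gam.check() on its own

## If k looked too small, increase it, recheck, and then refit the final
## formula in brms, e.g. brm(y ~ s(x, k = 20), data = df, ...)
```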