Splines and Model Complexity

I have been playing around with splines in brms and I have a little problem I can’t seem to figure out.
As I increase the number of knots in the spline, I would expect the model complexity to go up along with a better fit. The marginal likelihood is supposed to summarize both fit and model complexity, so the Bayes factor or the posterior model probability should point to the best model once both complexity and fit are taken into account.
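
For reference, the quantities mentioned above can be written out as follows. This is the standard textbook form, not something specific to brms; the symbols $M_k$ (model $k$), $\theta_k$ (its parameters) and $y$ (the data) are notation assumed here for illustration.

```latex
% Marginal likelihood of model M_k: the likelihood averaged over the prior
p(y \mid M_k) = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k

% Bayes factor of model j against model k
\mathrm{BF}_{jk} = \frac{p(y \mid M_j)}{p(y \mid M_k)}

% Posterior model probability
p(M_k \mid y) = \frac{p(y \mid M_k)\, p(M_k)}{\sum_j p(y \mid M_j)\, p(M_j)}
```

Because the likelihood is averaged over the prior, any complexity penalty comes from prior mass placed on parameter values that fit poorly, so the result depends on the priors as well as on the number of coefficients.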

The problem is that the marginal likelihood always goes up as I add knots, no matter what. The minimal example below simulates a simple linear relationship and then produces the summary I am talking about.

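library(brms)

set.seed(1)  # added for reproducibility; the original post did not set a seed
x <- seq(0, 10, length.out = 100)
y <- 2 * x + rnorm(100)
df <- data.frame(x, y)

# The original brm() calls were cut off after the prior argument. They are closed
# here with save_pars = save_pars(all = TRUE), which brms needs for bridge sampling.
# This is an assumption about what the truncated text contained.
b1 <- brm(y ~ x, data = df, prior = prior(normal(2, 1), class = b),
          save_pars = save_pars(all = TRUE))
bs <- brm(y ~ s(x), data = df, prior = prior(normal(2, 1), class = b),
          save_pars = save_pars(all = TRUE))
bs20 <- brm(y ~ s(x, k = 20), data = df, prior = prior(normal(2, 1), class = b),
            save_pars = save_pars(all = TRUE))
bs30 <- brm(y ~ s(x, k = 30), data = df, prior = prior(normal(2, 1), class = b),
            save_pars = save_pars(all = TRUE))

# Log marginal likelihood of each model, reported relative to the best one
fits <- list(b1, bs, bs20, bs30)
ml <- sapply(fits, function(fit) bridge_sampler(fit, silent = TRUE)$logml)
df2 <- data.frame(Log_Marginal_Like = ml - max(ml))
row.names(df2) <- c('Linear', 'Spline 8 Knots', 'Spline 20 Knots', 'Spline 30 Knots')
df2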


With the result being

Model              Log_Marginal_Like
Linear                      0.000000
Spline 8 Knots            -12.323401
Spline 20 Knots           -10.341746
Spline 30 Knots            -9.987082

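Here I subtracted the highest log marginal likelihood from each value, so the best model sits at zero. The linear model comes out on top, but among the splines the log marginal likelihood rises with every increase in knots. Wouldn’t you expect the order to be linear, then 8, 20, and 30 knots, since the added knots shouldn’t help the fit?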
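
To put the table on a more interpretable scale, the relative log marginal likelihoods can be turned into Bayes factors and posterior model probabilities. The snippet below is a minimal sketch that reuses the four numbers reported above and assumes equal prior probabilities for the models; the variable names are made up for illustration. Subtracting a common constant from the log marginal likelihoods does not change either quantity, so the relative values are enough.

```r
# Relative log marginal likelihoods copied from the table in the post
log_ml <- c(
  "Linear"          =   0.000000,
  "Spline 8 Knots"  = -12.323401,
  "Spline 20 Knots" = -10.341746,
  "Spline 30 Knots" =  -9.987082
)

# Posterior model probabilities, assuming equal prior model probabilities.
# Subtracting the maximum first keeps exp() numerically stable.
weights <- exp(log_ml - max(log_ml))
post_prob <- weights / sum(weights)

# Bayes factor of the linear model against each model (1 for itself)
bf_linear_vs <- exp(log_ml[["Linear"]] - log_ml)

print(round(post_prob, 6))
print(signif(bf_linear_vs, 3))
```

On these numbers the linear model takes nearly all of the posterior probability, while the differences among the three splines are much smaller: the Bayes factor of the 20-knot spline against the 8-knot spline is exp(12.323401 - 10.341746), or about 7.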

Thank you for your help.

All the splines in brms are implemented through mgcv. You should definitely read up on the different types of smooth available in that package and their associated parameters. For example, I don’t believe k refers to knots when you’re using the default thin plate variety, which is what you get when you don’t specify the bs argument explicitly; in mgcv, k sets the dimension of the basis.
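
One way to see what k does is to build the smooth directly in mgcv, without fitting anything. The sketch below is only an illustration under the assumption that the x values match those in the question; it uses mgcv's smoothCon() to construct the basis and then inspects its dimension. The object names are made up for this example.

```r
library(mgcv)

# Same covariate grid as in the question (assumed here for illustration)
dat <- data.frame(x = seq(0, 10, length.out = 100))

# Construct the smooths without fitting a model
sm_default <- smoothCon(s(x), data = dat)[[1]]                  # default bs = "tp"
sm_k20     <- smoothCon(s(x, k = 20), data = dat)[[1]]          # thin plate, k = 20
sm_cr20    <- smoothCon(s(x, bs = "cr", k = 20), data = dat)[[1]]  # cubic regression spline

# k is stored as the basis dimension and matches the number of basis columns
basis_dims <- c(default = sm_default$bs.dim, k20 = sm_k20$bs.dim)
n_columns  <- c(default = ncol(sm_default$X), k20 = ncol(sm_k20$X))
print(basis_dims)
print(n_columns)

# For a cubic regression spline, by contrast, k does correspond to knots
print(length(sm_cr20$xp))
```

For the thin plate basis, k controls how many basis functions are kept rather than where knots are placed, so labels such as "20 knots" are best read as "basis dimension 20" for these models.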

You’ll also want to make liberal use of conditional_smooths and posterior predictive checks (PPCs) to see what makes sense for your data.
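
As a rough sketch of those checks, the helper below wraps the two brms calls for a model that has already been fitted. The function name check_smooth_fit and its ndraws default are hypothetical choices for this example, not part of brms; conditional_smooths() and pp_check() are the brms functions being used.

```r
library(brms)

# Hypothetical helper (not part of brms): draws the estimated smooth terms and
# a posterior predictive check for an already fitted brmsfit object.
check_smooth_fit <- function(fit, ndraws = 100) {
  stopifnot(inherits(fit, "brmsfit"))

  # Estimated smooth terms with their uncertainty bands
  smooth_plots <- plot(conditional_smooths(fit), ask = FALSE)

  # Overlay of observed data and draws from the posterior predictive distribution
  ppc_plot <- pp_check(fit, ndraws = ndraws)
  print(ppc_plot)

  invisible(list(smooths = smooth_plots, ppc = ppc_plot))
}
```

With the models from the question in the workspace, check_smooth_fit(bs20) would show whether the k = 20 smooth is doing anything beyond a straight line and how well the model reproduces the observed y.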
