Case study on splines

We just published a case study on splines.

Splines in Stan

Milad Kharratzadeh

In this document, we discuss the implementation of splines in Stan. We start by providing a brief introduction to splines and then explain how they can be implemented in Stan. We also discuss a novel prior that alleviates some of the practical challenges of spline models.


B <- t(bs(X, knots=seq(-5,5,1), degree=3, intercept = TRUE)) # creating the B-splines
num_data <- length(X); num_basis <- nrow(B)
a0 <- 0.2 # intercept
a <- rnorm(num_basis, 0, 1) # coefficients of B-splines
Y_true <- as.vector(a0X + a%%B) # generating the output

I’m confused, why we need two intercepts: a0 and the one in calling bs?


You can also specify whether you want an intercept term. Normally this isn’t specified since ns is most often used in conjunction with lm, which includes an intercept implicitly (unless forced not to). If you use intercept=TRUE in your call to ns, make sure you know why you’re doing so, since if you do this and then call lm naively, the design matrix will end up being rank deficient.

I don’t know why, but I don’t see a call to lm anywhere. Is the design matrix being produced indeed rank deficient? You can deal with that with priors, but usually you don’t want it.

It doesn’t even look like well-formed R with a0 X.

I’ll email Milad, who didn’t include his email in the case study and maybe he’ll know.

The call to lm is the Stan program. The bs call is just a way to create a design-matrix which can be used by lm or in a Stan program, etc. The design-matrix produced by bs is not deficient. If one adds additional rows/columns to it, it may become deficient. Another subtle problem is, that the prior of the intercept in the design-matrix, may not be normal(0, tau).
It could be normal(0, 10) or normal(theta, tau) with theta ~ normal(0, 10).

Thanks for the reply,

Nice worked example to follow @Bob_Carpenter . I was wondering however can you (or Milad) share the code to plot the fitted graphs? Its not clear to me how generate the fitted curve from the posteriors. Thanks.

Can’t you just extract the Y_hat values from the stanfit model, calculate the mean across samples, and plot these as the predicted values?

1 Like

Ehh yes, turns out thats it 🤦‍♂️
Thanks … clearly I need to take a break!

Lots of thanks to Milad and the community for this very helpful post. I am working on implementing P-splines in RStan, P-splines as proposed by Eilers and Marx (1996) that penalize second-order differences — same as mgcv::gam(log(Y) ~ s(X, bs = “ps”). This seems to be a very common application but this post is one of the most related materials I found. I am new to the Bayesian framework and would appreciate any advice on how to do such p-spline in RStan. (Using rstanarm::stan_gamm4? It might work, but I need to figure out how to set up in Stan.) Lots of thanks in advance.