 Case study on splines

We just published a case study on splines.

Splines in Stan

http://mc-stan.org/users/documentation/case-studies/splines_in_stan.html

Abstract
In this document, we discuss the implementation of splines in Stan. We start by providing a brief introduction to splines and then explain how they can be implemented in Stan. We also discuss a novel prior that alleviates some of the practical challenges of spline models.

6 Likes

B <- t(bs(X, knots=seq(-5,5,1), degree=3, intercept = TRUE)) # creating the B-splines
num_data <- length(X); num_basis <- nrow(B)
a0 <- 0.2 # intercept
a <- rnorm(num_basis, 0, 1) # coefficients of B-splines
Y_true <- as.vector(a0X + a%%B) # generating the output

I’m confused, why we need two intercepts: a0 and the one in calling bs?

You can also specify whether you want an intercept term. Normally this isn’t specified since ns is most often used in conjunction with lm, which includes an intercept implicitly (unless forced not to). If you use intercept=TRUE in your call to ns, make sure you know why you’re doing so, since if you do this and then call lm naively, the design matrix will end up being rank deficient.

I don’t know why, but I don’t see a call to `lm` anywhere. Is the design matrix being produced indeed rank deficient? You can deal with that with priors, but usually you don’t want it.

It doesn’t even look like well-formed R with `a0 X`.

I’ll email Milad, who didn’t include his email in the case study and maybe he’ll know.

The call to lm is the Stan program. The bs call is just a way to create a design-matrix which can be used by lm or in a Stan program, etc. The design-matrix produced by bs is not deficient. If one adds additional rows/columns to it, it may become deficient. Another subtle problem is, that the prior of the intercept in the design-matrix, may not be normal(0, tau).
It could be normal(0, 10) or normal(theta, tau) with theta ~ normal(0, 10).

Lots of thanks to Milad and the community for this very helpful post. I am working on implementing P-splines in RStan, P-splines as proposed by Eilers and Marx (1996) that penalize second-order differences — same as `mgcv::gam(log(Y) ~ s(X, bs = “ps”)`. This seems to be a very common application but this post is one of the most related materials I found. I am new to the Bayesian framework and would appreciate any advice on how to do such p-spline in RStan. (Using `rstanarm::stan_gamm4`? It might work, but I need to figure out how to set up in Stan.) Lots of thanks in advance.