# Case study on splines

We just published a case study on splines.

Splines in Stan

http://mc-stan.org/users/documentation/case-studies/splines_in_stan.html

Abstract
In this document, we discuss the implementation of splines in Stan. We start by providing a brief introduction to splines and then explain how they can be implemented in Stan. We also discuss a novel prior that alleviates some of the practical challenges of spline models.

6 Likes

B <- t(bs(X, knots=seq(-5,5,1), degree=3, intercept = TRUE)) # creating the B-splines
num_data <- length(X); num_basis <- nrow(B)
a0 <- 0.2 # intercept
a <- rnorm(num_basis, 0, 1) # coefficients of B-splines
Y_true <- as.vector(a0X + a%%B) # generating the output

Iâ€™m confused, why we need two intercepts: a0 and the one in calling bs?

You can also specify whether you want an intercept term. Normally this isnâ€™t specified since ns is most often used in conjunction with lm, which includes an intercept implicitly (unless forced not to). If you use intercept=TRUE in your call to ns, make sure you know why youâ€™re doing so, since if you do this and then call lm naively, the design matrix will end up being rank deficient.

I donâ€™t know why, but I donâ€™t see a call to `lm` anywhere. Is the design matrix being produced indeed rank deficient? You can deal with that with priors, but usually you donâ€™t want it.

It doesnâ€™t even look like well-formed R with `a0 X`.

Iâ€™ll email Milad, who didnâ€™t include his email in the case study and maybe heâ€™ll know.

The call to lm is the Stan program. The bs call is just a way to create a design-matrix which can be used by lm or in a Stan program, etc. The design-matrix produced by bs is not deficient. If one adds additional rows/columns to it, it may become deficient. Another subtle problem is, that the prior of the intercept in the design-matrix, may not be normal(0, tau).
It could be normal(0, 10) or normal(theta, tau) with theta ~ normal(0, 10).

Lots of thanks to Milad and the community for this very helpful post. I am working on implementing P-splines in RStan, P-splines as proposed by Eilers and Marx (1996) that penalize second-order differences â€” same as `mgcv::gam(log(Y) ~ s(X, bs = â€śpsâ€ť)`. This seems to be a very common application but this post is one of the most related materials I found. I am new to the Bayesian framework and would appreciate any advice on how to do such p-spline in RStan. (Using `rstanarm::stan_gamm4`? It might work, but I need to figure out how to set up in Stan.) Lots of thanks in advance.