Splines vs Gaussian processes

Hello!

Do you guys prefer Gaussian processes over splines? Do you have any rules of thumb for choosing between splines and GPs? Please share a bit of your experience.

2 Likes

@hhau

1 Like

As far as I understand, they’re equivalent in the limit (i.e. splines will increasingly approximate GPs as you increase the number of knots/basis functions; though there is probably a connection between the shape of the spline basis and the GP covariance, and probably also between priors on the spline coefficients and priors on the GP parameters).

In that case, and considering the arguably easier-to-understand / easier-to-set-priors-on parameterization of the GP, I consider the GP the gold standard; using splines instead necessitates justification, which most often comes from the full GP taking too long to compute. For the 1D case, GPs with covariance matrices on the order of 100x100 or thereabouts start to take a long time to sample.
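
For concreteness, here’s a minimal sketch of the two model forms being compared, assuming a data frame `d` with columns `x` and `y` (names are hypothetical):

```r
library(brms)

# Exact (full-covariance) GP: each likelihood evaluation scales roughly as
# O(n^3), which is what makes iteration slow once n gets into the hundreds.
fit_gp <- brm(y ~ gp(x), data = d)

# Penalised smooth (thin-plate regression spline via mgcv's smooth
# constructors), which is typically much cheaper to fit.
fit_spline <- brm(y ~ s(x), data = d)
```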

2 Likes

I think this is too general a question to have a meaningful answer – both ‘splines’ and ‘GPs’ encompass such a wide variety of models and have so many applications that it’s hard to have any default-yet-always-useful option. Too much depends on the properties (and size) of the data. You may also desire certain properties of the model, like the (de)composability of the kernels in GPs which is used in the birthday problem example. The recent work on basis function approximations to GPs (see Hilbert space methods for reduced-rank Gaussian process regression | Statistics and Computing and [2004.11408] Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming) is very close in motivation to thin-plate regression splines (https://rss.onlinelibrary.wiley.com/doi/10.1111/1467-9868.00374), which are the default for mgcv (and thus brms via s()).
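
For concreteness, a rough sketch of the two options mentioned above, assuming a data frame `d` with columns `x` and `y` (hypothetical names); the values of `k` and `c` are illustrative, not recommendations:

```r
library(brms)
library(mgcv)

# Hilbert space / basis function approximate GP in brms
# (k basis functions, boundary factor c).
fit_hsgp <- brm(y ~ gp(x, k = 20, c = 5/4), data = d)

# mgcv's default smooth: a thin-plate regression spline (bs = "tp"),
# which is also what brms's s() uses under the hood.
fit_tprs <- gam(y ~ s(x, bs = "tp", k = 20), data = d, method = "REML")
```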

My default would be the opposite of @mike-lawrence’s – I would nearly always start with a thin-plate regression spline (via mgcv / brms) for smoothing problems, and only start using a GP if

  • There was some very compelling mathematical / analytical reason for using the GP.
  • I understood the deficiencies of the spline and could specify an appropriate kernel + prior for the hyperparameters (e.g. if there was clear periodicity in the data that you were trying to forecast; see the sketch after this list).
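
To make that second point concrete, here is an illustrative periodic (“exp-sine-squared”) covariance function of the kind you might specify when the data show clear periodicity; `sigma`, `ell` and `period` are hypothetical hyperparameters you would place priors on:

```r
# Periodic covariance:
# k(x, x') = sigma^2 * exp(-2 * sin^2(pi * |x - x'| / period) / ell^2)
periodic_cov <- function(x, xp, sigma, ell, period) {
  r <- abs(outer(x, xp, "-"))
  sigma^2 * exp(-2 * sin(pi * r / period)^2 / ell^2)
}

# e.g. a 50 x 50 covariance matrix over an evenly spaced grid
xs <- seq(0, 10, length.out = 50)
K  <- periodic_cov(xs, xs, sigma = 1, ell = 0.5, period = 2)
```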

I really do think this is a problem (though the basis function approximations to GPs mostly ameliorate it), because ‘fitting a model’ is an iterative process. It is too easy to spend too much time early in the process waiting for a GP to fit, when you could pick a ‘worse’ / less-expressive model and discover issues that would have to be addressed in any model (differing scales across interacting covariates / heteroskedasticity / importance of shape & positivity constraints).

9 Likes

For smaller-data scenarios, where a GP covariance would only be on the order of, say, 10x10 and compute is still reasonably fast for model iteration, what would motivate you to still opt for splines?

I used mgcv a lot when I was still a frequentist, then transitioned to GPs at the same time as becoming Bayesian, so while I know splines to a decent degree I don’t know, for example, how one would go about determining prior structures for splines (thin-plate splines don’t have knots, right?).

2 Likes

What about interpretability of the model?

For settings with 10-obs-per-individual I would be hoping to employ some kind of hierarchical structure (otherwise it would be hard to consider anything more than a straight line for each individual) – splines as ‘random effects’ / ‘individual deviations from a population mean’ are okay for this (and there are a lot of results from the smoothing literature on how to set up the linear algebra to do this), as are GPs. I don’t know how ‘easy’ it is to fit the hierarchical version of the GP; you would know much more than I would there. For both I would value the ease with which I could fit / inspect / adapt the models above their actual form, so the tooling around them is more important (things like brms / tidybayes make this much easier for both).
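
As a minimal sketch of the ‘splines as individual deviations from a population mean’ idea (assuming a data frame `d` with outcome `y`, covariate `x`, and a factor `id` identifying individuals; all names hypothetical):

```r
library(mgcv)

# Population-level smooth plus penalised per-individual smooth deviations:
# the "fs" (factor-smooth) basis treats each individual's curve as a
# random-effect-like deviation from the shared smooth.
fit_hier <- gam(y ~ s(x) + s(x, id, bs = "fs"),
                data = d, method = "REML")
```

brms accepts the same s() formula syntax, so the fully Bayesian version is just the same formula passed to brm().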

Neither, but there is a penalty term which can be adjusted (many times, if the model is quick enough to fit) after inspecting the fit and deciding that it is ‘too wiggly / too smooth’, which is practically the same as adjusting the prior on the GP length scale after finding the posterior fit unsatisfactory.
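
The two ‘knobs’ being compared, in rough code form (the data frame `d` and the specific penalty / prior values are hypothetical):

```r
library(mgcv)
library(brms)

# Spline: fix or increase the smoothing penalty if the fit looks too wiggly
# (or decrease it if it looks too smooth), then refit.
fit_spline <- gam(y ~ s(x), data = d, sp = 10)

# GP: the analogous move is re-specifying the prior on the length scale
# (class "lscale" in brms) and refitting.
fit_gp <- brm(y ~ gp(x), data = d,
              prior = prior(inv_gamma(5, 5), class = lscale))
```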

What do you want to interpret? Most of the value in a smoothing model comes from the fitted value (and uncertainty) at some previously unobserved point. This is mostly interpolation for splines and can, with appropriate kernels and careful modelling, be extrapolation/forecasting for GPs.

5 Likes

Correct me if I’m wrong: we have to make a decision between prediction and interpretability. Even if we use both linear effects and splines, as in y = x + z + s(z) for example, the coefficients are not directly interpretable, are they? What would be the correct interpretation of x and z in this model?

My understanding of thin-plate splines is that you’re searching for a function to fit some d-dimensional observed data, where the function is smooth in the sense that all the m-th order partial derivatives are small. From that perspective the theory is very elegant, as the function space you’re searching across is all functions where the m-th order partial derivatives are well defined and finite. As @hhau says, there is a penalty term you can tune to vary the tradeoff between fitting the training data and being smooth.

Arguably another thing you could tune is m, the order of partial derivatives that you use in your penalty function. If you don’t have a strong reason for choosing a particular m, maybe it’d be better to consider a range of possible m, subject to 2m > d. The order of the derivatives in the regularisation term has to be sufficiently high compared to the dimension of the domain so that the basis functions in the solution are non-singular.
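
In symbols (my reading of the setup described above), the thin-plate spline is the minimiser of a penalised least-squares objective

$$
\hat{f} = \arg\min_{f} \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda\, J_{m,d}(f),
\qquad
J_{m,d}(f) = \int_{\mathbb{R}^d} \sum_{\nu_1 + \cdots + \nu_d = m} \frac{m!}{\nu_1! \cdots \nu_d!} \left( \frac{\partial^m f}{\partial x_1^{\nu_1} \cdots \partial x_d^{\nu_d}} \right)^{2} \mathrm{d}x ,
$$

where \lambda is the penalty term mentioned above and 2m > d is the condition that keeps J_{m,d} well defined.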

I believe the nice closed-form analytic solution for the thin-plate spline basis functions depends on the assumption that your domain is R^d. For other domains – e.g. a hypercube, or other bounded domains with interesting boundary conditions as you might get in engineering problems – the equivalent thin-plate spline solution is well defined, but it won’t be expressed in terms of the same basis functions; it can instead be approximated using numerical methods for solving PDEs (finite element methods plus some discretisation of the domain). I’m not sure I’d recommend actually doing this, due to how fiddly it would be to get working, but for certain problems maybe it’d be the right thing to do.

3 Likes

I don’t want to go too deep into this conversation but I did want to comment on a common misunderstanding about the interpretability of splines.

One of the benefits of Gaussian processes is that the behaviors supported by most families of covariance functions are very interpretable not only locally (smoothness and rigidity at short length scales) but also globally (smoothness and rigidity at long length scales), which then facilitates principled prior modeling. Spline models, however, are immediately interpretable only locally, and the global behaviors supported by a given spline model can be quite surprising. As an exercise, try sampling functional behaviors from common spline models, see what kinds of global behaviors you get, and ask if that’s a meaningful prior model for your analysis.
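
A minimal version of that exercise in R (the basis size, prior standard deviation and number of draws are arbitrary choices for illustration):

```r
library(splines)

x    <- seq(0, 1, length.out = 200)
B    <- bs(x, df = 15, intercept = TRUE)        # 200 x 15 cubic B-spline basis
beta <- matrix(rnorm(15 * 20, sd = 1), 15, 20)  # 20 draws of iid normal coefficients
f    <- B %*% beta                              # the functions this prior implies

matplot(x, f, type = "l", lty = 1,
        ylab = "prior draws of f(x)",
        main = "Global behaviour implied by a simple spline prior")
```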

Splines can be related to Gaussian processes, but the connection is complicated – see for example https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2011.00777.x – which prevents the interpretability of Gaussian process models from carrying over to most spline models.

6 Likes