I’ve spent a good amount of time over the past several months reading and thinking about Gaussian processes. I’m now starting to implement them in some work them I’m doing, modeling time-series of various vital rates (e.g. number of young, small individuals, “recruits”) across multiple populations. However, I’m a bit unclear about how to best work with the brms output that includes a GP. The documentation for
brms::gp is… well, sparse and doesn’t offer much more than
conditional_effects() |> plot(), which leaves me wanting more. Rummaging around on this forum hasn’t really developed any compelling leads for handling Gaussian processes fit through brms in terms of conducting model criticism and evaluation, or generally just visualizing the prior/posterior predictions. Does anyone have any handy references and/or examples for working with brms GP models? The closest I’ve come is this section of the brms translation of statistical rethinking, but the use-case given there is a bit different and the visualizations not really what I’m hoping to create.
Plugging my GP model into my usual model evaluation framework yields some ugly results – to my eye, I’d say that
tidybayes are not interpolating predictions (or expected values of the prediction) at times that do not occur in the data set. As a starting point, I guess I’d like to produce plots that are roughly analogous to @betanalpha 's Robust Gaussian Process tutotial: spaghetti plots and ribbon plots, in some cases where the GP is not one of the plot axes. There seem to be more GP examples operating in base Stan than in brms, so this feels like an area where more resources are needed. I assume all of this is possible outside of conditional effects, but not sure what the best/easiest/simplest workflow might be!
Thanks for any pointers the community can provide!
- Operating System: macOS 13.4
- brms Version: 2.19.0
newdata with dense sampling along the GP dimension, in conjunction with the tools you’ve already tried, yield the results you’re looking for?
To test things out, I’ve fit a simple model with a population-level effect that basically allows each group in my dataset to have a unique intercept with a GP for each of those groups. To examine the output, my initial testing was not very dense (50 points along the GP dimension). For example, here’s an animated hypothetical outcomes plot of 50 draws from the prior distribution, plotted along the GP axis.
It looks like each of the lines spikes to the GP deviation at each of the 50 time points, but then returns to the group mean in between. I’m not sure why/how these interpolated time_points are appearing in the
newdata, so I have some sleuthing to-do, but I haven’t had issues with non-GP models and this same code.
Increasing the density of points by ~10x to 550 (comparable to the Robust Gaussian Process modeling tutorial), still looks off.
The time points estimated by the GP are much denser, producing something like a ribbon that reflects the GP deviation from the mean of each population. This is not the “scrolling” spaghetti plot that I’d expect either…
Oh, wait, I’m a dummy. There was an error in the code generating
newdata. Factor variables were getting crossed when they should not have been and un-crossing them leads the output to match my expectations.
Nothing to see here. Marked @jsocolar’s response as a solution, but I’d still be happy to see any references/resources that people may know or like for developing a workflow for using GPs in Stan/brms. However, I’ll continue to muddle through now that I’ve sorted the above issues out.
For additive GP time series models, lgpr package (built on top of Stan) can make things much easier Longitudinal Gaussian Process Regression • lgpr
For some other plotting examples see Gaussian process demonstration with Stan and Bayesian workflow book - Birthdays
Thanks, @avehtari! Those are some good resources for thinking about how to structure my workflow and summarize results… The
lgpr package looks pretty nifty!