Functional data analysis (e.g. scalar-on-function regression, as well as functional response types) can be implemented in Stan quite easily - and indeed automated - by using the brms
package, which itself calls mgcv
to prepare the various matrices. mgcv
it turns out can be prodded to do various types of functional data analysis (see Wood (2017) Generalized Additive Models, 2nd ed., p. 391).
Wood outlines a scalar-on-function regression using mgcv
illustrated with data available in a CRAN package associated with this text gamair
. To run this in Stan with brms
simply requires replicating the formula that would be passed to the mgcv::gam()
function.
library(gamair)
data(gas)
library(brms)
library(mgcv)
mStan <- brm(octane ~ s(nm, by=NIR, k=50), data=gas)
mMgcv <- gam(octane ~ s(nm, by=NIR, k=50), data=gas)
The Stan code can be retrieved as:
brms::make_stancode(octane ~ s(nm, by=NIR, k=50), data=gas)
Interestingly, in this example NIR
is a matrix (i.e. where each row represents a different curve over the nm
variable, and octane
is the scalar response).
The Stan code prepared by brms
produces a very similar fit to mgcv
though considerably slower. More of concern, however, is that nearly every post-warmup sample exceeds max_treedepth
. Increasing this to 15 has only a small impact.
How concerned should I be about this, considering other diagnostics suggest a good fit (e.g. bayes_R2 = 0.97
; fits resemble closely those produced by mgcv
, Rhats are near one, etc.)
Fruitful discussions with Paul Buerkner (e.g. see this thread on the brms
issue tracker) suggest that brms
is unlikely to be able to resolve this issue and model customizations will therefore be essential should the reduction in these warnings be essential or even possible.