# Decoding smooth fits in brms

When using a thin-plate regression spline for a single predictor in `brms`, e.g. with a formula such as `y ~ x1 + s(x2)`, how does one estimate the effective number of degrees of freedom for `x2` and how does one construct a prediction equation for the `s(x2)` part of the model? For “the” equation I’m thinking in the sense of a posterior median or mean regression prediction, i.e., estimated transformation of `x2`.

I know how to plot the estimated transformation, but seeing an equation representing it would add additional insight and would enable certain simple non-R prediction computations.

I see this answer which is not very encouraging about the degrees of freedom part. I wonder if anything has changed.

The best I’ve got for the degrees of freedom issue is to use information criteria. The `waic()` function returns a `p_waic` estimate, and the `loo()` function returns a `p_loo` estimate. I believe both can be interpreted as the effective number of parameters, somewhat analogous to a degree of freedom.

Some of us have experimented with notation for models with smooth terms in this thread, but I’m not so sure that’s what you’re looking for.

2 Likes

On the effective degrees of freedom, I recently came away with the same discouragement. I’d also love to know if that’s possible in `brms` as edf is a handy statistic for smooths in `mgcv`.

On the equation / prediction outside of R:
Simon Wood responds to that same question for `mgcv` here (see item 2). I don’t have his book handy atm for the actual equation bit, though it seems tedious. However, Prof. Wood does outline a procedure for potential non-R predictions:

``````library(mgcv)
# see ?predict.gam
n <- 200
sig <- 2
dat <- gamSim(1,n=n,scale=sig)
newd <- data.frame(x0=(0:30)/30,x1=(0:30)/30,x2=(0:30)/30,x3=(0:30)/30)

b <- gam(y~s(x0)+s(I(x1^2))+s(x2)+offset(x3),data=dat)
Xp <- predict(b,newd,type="lpmatrix")  # linear predictor matrix

xn <- c(.341,.122,.476,.981) ## want prediction at these values
x0 <- 1         ## intercept column
dx <- 1/30      ## covariate spacing in `newd'
for (j in 0:2) { ## loop through smooth terms
cols <- 1+j*9 +1:9      ## relevant cols of Xp
i <- floor(xn[j+1]*30)  ## find relevant rows of Xp
w1 <- (xn[j+1]-i*dx)/dx ## interpolation weights
## find approx. predict matrix row portion, by interpolation
x0 <- c(x0,Xp[i+2,cols]*w1 + Xp[i+1,cols]*(1-w1))
}
dim(x0)<-c(1,28)
fv <- x0%*%coef(b) + xn[4];fv    ## evaluate and add offset
se <- sqrt(x0%*%b\$Vp%*%t(x0));se ## get standard error
## compare to normal prediction
predict(b,newdata=data.frame(x0=xn[1],x1=xn[2],
x2=xn[3],x3=xn[4]),se=TRUE)
``````

Perhaps a similar process can be used for GAMs estimated in `brms`?

1 Like

Simon Wood’s wonderful book actually arrived in yesterday’s mail :-)

This is exceptionally helpful Zach. The take-home message from Wood is that you use linear interpretation to evaluate the function at any points you desire. That is something we already get from the conditional plot whose points we could capture if needed. There doesn’t seem to be a way to get an actual equation. As an aside I’m not clear how `brms` actually uses Stan to deal with something that’s not an equation.

I’ll probably stick with ordinary regression splines (nature splines, aka restricted cubic splines) even though I have to pre-specify the number of knots. A major reason for this more parametric approach is that I can specify priors on any feature of the fit, e.g., a prior on the inter-quartile-range odds ratio or a prior on the amount of linearity of the function on an interval [a,b] as exemplified here.

1 Like

Simon Wood offers several competing methods for calculating EDF from Bayesian splines, though none of them incorporates uncertainty in the smoothing parameters. You can see how these are done in the family of `jagam` functions within `mgcv`, starting from line 306 here: mgcv/R/jagam.r at master · cran/mgcv · GitHub.

In short, if we can calculate `w`, which requires the `variance` and `mu.eta` functions from the relevant observation family, and if we have the relevant smoothing parameters and their penalty matrices, we can calculate EDF. I’ve been able to make use of this in my package `mvgam` because I use the same multivariate normal parameterisation for smooths that `mgcv` uses, so I can quickly pull out the relevant penalty matrices (see the underlying code here: mvgam/R/compute_edf.R at master · nicholasjclark/mvgam · GitHub). But to do so in `brms` might be harder, though perhaps not impossible

2 Likes