Creating matrix of grouped smooths as in brms

I’m trying to understand how brms uses the smooth/spline functions from mgcv to create the matrix of smooths used in model fitting.

In trying to construct the matrix myself, I can’t seem to figure out how things are happening under the hood in brms.

An example:

library(brms)
library(mgcv)

# Make some data
data <- data.frame(Time = rep(1:100, 4),
                   Group = rep(factor(LETTERS[1:4]), each = 100),
                   Y = rnorm(400, 5, 2))

# Generate Stan data from brms
Stan_Data <- make_standata(Y ~ s(Time, bs = "bs", k = 20, by = Group), data = data)

# Look at resulting Xs smooth matrix
Stan_Data[["Xs"]]

# Calculate smooths from mgcv directly
sm <- smoothCon(s(Time, bs = "bs", k = 20, by = Group), 
                data)

# Look at those results
view(sm)

brms appears to be generating 18 values for each group (Zs_Group_1), whereas mgcv is giving 20.

Does anyone know where the Z_Group_1 and the Xs matrix are coming from and how I would go about calculating those from the mgvc output?

I’ve tried looking through the brms source code, but wasn’t able to figure this out.

I think I’m getting a little closer as I am at least seeing the same columns, although in a different order and there are more in the mgcv prediction matrix.

# Some data
data2 <- data.frame(Time = rep(1:100),
                   Y = rnorm(100, 5, 2))

# Generate Stan data from brms
Stan_Data2 <- make_standata(Y ~ s(Time, bs = "bs", k = 20), data = data2)

# Make a smooth object from mgcv
sm2 <- smoothCon(s(Time, bs = "bs", k = 20), 
                data2, absorb.cons = TRUE, modCon = 3, diagonal.penalty = TRUE)[[1]]

# Create a prediction matrix from mgcv
pm <- PredictMat(sm2, 
                 data.frame(Time = data2$Time))

pm has 19 columns, Zs_1_1 in Stan_Data2 has 18 columns in a different order from pm, but with the same values.

@paul.buerkner , can you offer any insight?

Check out data_sm in brms/data-predictor.R at master · paul-buerkner/brms · GitHub

Thanks!

I noticed this section:

# data preparation for splines
data_sm <- function(bterms, data, basis = NULL) {
  out <- list()
  smterms <- all_terms(bterms[["sm"]])
  if (!length(smterms)) {
    return(out)
  }
  p <- usc(combine_prefix(bterms))
  new <- length(basis) > 0L
  if (!new) {
    knots <- get_knots(data)
    basis <- named_list(smterms)
    for (i in seq_along(smterms)) {
      # the spline penalty has changed in 2.8.7 (#646)
      diagonal.penalty <- !require_old_default("2.8.7")
      basis[[i]] <- smoothCon(
        eval2(smterms[i]), data = data, 
        knots = knots, absorb.cons = TRUE,
        diagonal.penalty = diagonal.penalty
      )

I added the additional arguments to the mgcv smoothCon call, but still get the discrepancy in the number and order of coefficients.

My goal is to build the same Zs_1_1 and Xs matrices “manually” through mgcv, but I’m not sure where the shuffling is occuring

To illustrate:

This is the result of the PredictMat:

           [,1]         [,2]        [,3]      [,4]      [,5]      [,6]       [,7]
[1,] 0.01552785  0.012754614  0.10415721 0.3574953 0.5955881 0.6067269 -0.2811756
[2,] 0.02369922 -0.004846921  0.07828497 0.3386568 0.6050628 0.6606941 -0.3989015
[3,] 0.03263690 -0.024967703  0.04688324 0.3102596 0.6027531 0.7044077 -0.5102859
[4,] 0.04195323 -0.046801346  0.01111433 0.2736061 0.5897278 0.7384236 -0.6154584
[5,] 0.05126051 -0.069541460 -0.02785943 0.2299988 0.5670560 0.7632978 -0.7145485
[6,] 0.06017108 -0.092381658 -0.06887572 0.1807400 0.5358066 0.7795864 -0.8076860
         [,8]      [,9]
[1,] 4.025937 -1.479958
[2,] 3.807920 -1.450060
[3,] 3.592611 -1.420162
[4,] 3.380031 -1.390264
[5,] 3.170200 -1.360366
[6,] 2.963140 -1.330468

and this is Zs_1_1 from the Stan data generated by brms:

           [,1]     [,2]       [,3]      [,4]      [,5]      [,6]        [,7]
[1,] 0.01552785 4.025937 -0.2811756 0.6067269 0.5955881 0.3574953  0.10415721
[2,] 0.02369922 3.807920 -0.3989015 0.6606941 0.6050628 0.3386568  0.07828497
[3,] 0.03263690 3.592611 -0.5102859 0.7044077 0.6027531 0.3102596  0.04688324
[4,] 0.04195323 3.380031 -0.6154584 0.7384236 0.5897278 0.2736061  0.01111433
[5,] 0.05126051 3.170200 -0.7145485 0.7632978 0.5670560 0.2299988 -0.02785943
[6,] 0.06017108 2.963140 -0.8076860 0.7795864 0.5358066 0.1807400 -0.06887572
             [,8]
[1,]  0.012754614
[2,] -0.004846921
[3,] -0.024967703
[4,] -0.046801346
[5,] -0.069541460
[6,] -0.092381658

The last column in the PredictMat call isn’t in the brms data, which is easy enough to omit. But, the other columns are in a different order

After digging deeper into brms/data-predictor.R at master · paul-buerkner/brms · GitHub ,

I found that Zs_1_1 is mgcv::smooth2random(sm, names(data), type = 2)[["rand"]][["Xr"]]
Where sm = smoothCon(s(Time, bs = "bs", k = 20, by = Group), data, absorb.cons = TRUE, modCon = 3, diagonal.penalty = TRUE)[[1]]