# Creating matrix of grouped smooths as in brms

I’m trying to understand how `brms` uses the smooth/spline functions from `mgcv` to create the matrix of smooths used in model fitting.

In trying to construct the matrix myself, I can’t seem to figure out how things are happening under the hood in brms.

An example:

``````library(brms)
library(mgcv)

# Make some data
data <- data.frame(Time = rep(1:100, 4),
Group = rep(factor(LETTERS[1:4]), each = 100),
Y = rnorm(400, 5, 2))

# Generate Stan data from brms
Stan_Data <- make_standata(Y ~ s(Time, bs = "bs", k = 20, by = Group), data = data)

# Look at resulting Xs smooth matrix
Stan_Data[["Xs"]]

# Calculate smooths from mgcv directly
sm <- smoothCon(s(Time, bs = "bs", k = 20, by = Group),
data)

# Look at those results
view(sm)
``````

`brms` appears to be generating 18 values for each group (Zs_Group_1), whereas `mgcv` is giving 20.

Does anyone know where the `Z_Group_1` and the `Xs` matrix are coming from and how I would go about calculating those from the `mgvc` output?

I’ve tried looking through the `brms` source code, but wasn’t able to figure this out.

I think I’m getting a little closer as I am at least seeing the same columns, although in a different order and there are more in the `mgcv` prediction matrix.

``````# Some data
data2 <- data.frame(Time = rep(1:100),
Y = rnorm(100, 5, 2))

# Generate Stan data from brms
Stan_Data2 <- make_standata(Y ~ s(Time, bs = "bs", k = 20), data = data2)

# Make a smooth object from mgcv
sm2 <- smoothCon(s(Time, bs = "bs", k = 20),
data2, absorb.cons = TRUE, modCon = 3, diagonal.penalty = TRUE)[[1]]

# Create a prediction matrix from mgcv
pm <- PredictMat(sm2,
data.frame(Time = data2\$Time))

``````

`pm` has 19 columns, `Zs_1_1` in `Stan_Data2` has 18 columns in a different order from `pm`, but with the same values.

@paul.buerkner , can you offer any insight?

Check out `data_sm` in brms/data-predictor.R at master · paul-buerkner/brms · GitHub

Thanks!

I noticed this section:

``````# data preparation for splines
data_sm <- function(bterms, data, basis = NULL) {
out <- list()
smterms <- all_terms(bterms[["sm"]])
if (!length(smterms)) {
return(out)
}
p <- usc(combine_prefix(bterms))
new <- length(basis) > 0L
if (!new) {
knots <- get_knots(data)
basis <- named_list(smterms)
for (i in seq_along(smterms)) {
# the spline penalty has changed in 2.8.7 (#646)
diagonal.penalty <- !require_old_default("2.8.7")
basis[[i]] <- smoothCon(
eval2(smterms[i]), data = data,
knots = knots, absorb.cons = TRUE,
diagonal.penalty = diagonal.penalty
)
``````

I added the additional arguments to the `mgcv` `smoothCon` call, but still get the discrepancy in the number and order of coefficients.

My goal is to build the same `Zs_1_1` and `Xs` matrices “manually” through `mgcv`, but I’m not sure where the shuffling is occuring

To illustrate:

This is the result of the `PredictMat`:

``````           [,1]         [,2]        [,3]      [,4]      [,5]      [,6]       [,7]
[1,] 0.01552785  0.012754614  0.10415721 0.3574953 0.5955881 0.6067269 -0.2811756
[2,] 0.02369922 -0.004846921  0.07828497 0.3386568 0.6050628 0.6606941 -0.3989015
[3,] 0.03263690 -0.024967703  0.04688324 0.3102596 0.6027531 0.7044077 -0.5102859
[4,] 0.04195323 -0.046801346  0.01111433 0.2736061 0.5897278 0.7384236 -0.6154584
[5,] 0.05126051 -0.069541460 -0.02785943 0.2299988 0.5670560 0.7632978 -0.7145485
[6,] 0.06017108 -0.092381658 -0.06887572 0.1807400 0.5358066 0.7795864 -0.8076860
[,8]      [,9]
[1,] 4.025937 -1.479958
[2,] 3.807920 -1.450060
[3,] 3.592611 -1.420162
[4,] 3.380031 -1.390264
[5,] 3.170200 -1.360366
[6,] 2.963140 -1.330468

``````

and this is Zs_1_1 from the Stan data generated by `brms`:

``````           [,1]     [,2]       [,3]      [,4]      [,5]      [,6]        [,7]
[1,] 0.01552785 4.025937 -0.2811756 0.6067269 0.5955881 0.3574953  0.10415721
[2,] 0.02369922 3.807920 -0.3989015 0.6606941 0.6050628 0.3386568  0.07828497
[3,] 0.03263690 3.592611 -0.5102859 0.7044077 0.6027531 0.3102596  0.04688324
[4,] 0.04195323 3.380031 -0.6154584 0.7384236 0.5897278 0.2736061  0.01111433
[5,] 0.05126051 3.170200 -0.7145485 0.7632978 0.5670560 0.2299988 -0.02785943
[6,] 0.06017108 2.963140 -0.8076860 0.7795864 0.5358066 0.1807400 -0.06887572
[,8]
[1,]  0.012754614
[2,] -0.004846921
[3,] -0.024967703
[4,] -0.046801346
[5,] -0.069541460
[6,] -0.092381658

``````

The last column in the `PredictMat` call isn’t in the `brms` data, which is easy enough to omit. But, the other columns are in a different order

After digging deeper into brms/data-predictor.R at master · paul-buerkner/brms · GitHub ,

I found that Zs_1_1 is `mgcv::smooth2random(sm, names(data), type = 2)[["rand"]][["Xr"]]`
Where `sm` = `smoothCon(s(Time, bs = "bs", k = 20, by = Group), data, absorb.cons = TRUE, modCon = 3, diagonal.penalty = TRUE)[[1]]`