How does one interpret the output of using a s() term in brms?

I am trying to get a handle on using s() terms in brms. I have read around the forum and various helpful places like Gavin Simpson's blog, but I am still having trouble understanding the output from a brms model where I use an s() term.
I understand that the sds() is a varying effect that controls the 'wiggliness' of the spline, and that the s term in the 'Population-Level' effects is the completely smooth part. But what does this mean? Is there any directly interpretable meaning in the numbers in the model output?
Below is a very simple example.

library(brms)

# simulate a cubic trend with gaussian noise
time <- seq(from = 0, to = 10, by = 0.1)
a <- 10
b <- -1
mu <- a + b * time^3
outcome <- rnorm(length(time), mu, 40)
data <- data.frame(time, outcome)

plot(outcome ~ time, data = data)

# smooth of time with up to 20 basis functions
m.s <- brm(outcome ~ s(time, k = 20), data = data, cores = 4)
m.s

Here is the resulting output in case you don’t want to run the code for yourself, along with some plots of the data and conditional_smooths() from the model fit.
[spline model output, plot of the data, and conditional_smooths() plot]


‘stime_1’ estimate is -3300. Does this have any interpretable meaning just by looking at this output?
‘sds(stime_1)’ estimate is 302, which I take as the sd of the varying effects for the smooth term. Seems wiggly, but how do I interpret 302?

Also, I have been unable to find the interpretation of the y-axis for conditional_smooths(). If one uses conditional_effects() then the fit is on the scale of the data, but I am not sure how to interpret conditional_smooths().

I could estimate the slope (first derivative) of the spline at points in time with a finite-differences approach, using fitted() and some new data… but are plots, predictions, and derivatives the only meaningful ways to interpret these? Or is the model output itself interpretable too?
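For example, here is a rough sketch of what I mean by the finite-differences approach (just an illustration using the m.s model from above, not necessarily the best way to do it):

eps <- 0.001
newd0 <- data.frame(time = seq(0, 10, by = 0.5))
newd1 <- data.frame(time = newd0$time + eps)
f0 <- fitted(m.s, newdata = newd0, summary = FALSE)  # posterior draws of the fitted mean
f1 <- fitted(m.s, newdata = newd1, summary = FALSE)
slope_draws <- (f1 - f0) / eps  # draws of the first derivative at each time in newd0
t(apply(slope_draws, 2, quantile, probs = c(0.025, 0.5, 0.975)))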

Thanks so much! I realize these may be dumb, basic and obvious questions. Hopefully they are not redundant to other posts on the forum, but I think it would be helpful to others if these questions were answered in a single spot (maybe they are somewhere obvious and I haven’t found them!).


I found this helpful article, so I will try to answer this myself. Hopefully someone like @ucfagls will chime in if I am wrong.

sds(stime_1) is the sd of the smooth weights (spline coefficients). This determines the amount of 'wiggliness', analogous to how the sd of group-level effects in a varying slopes and intercepts model determines the amount of variability among groups in slopes and intercepts. The actual numeric value of the sds() is not very practically interpretable, because thinking about the variance of smooth weights for any given data and model seems abstract to me. However, if the value is around zero, then this is like 'complete pooling' of the basis functions, which means that there isn't much added value in more than a single basis function.

stime_1 is the unpenalized weight (i.e. coefficient) for one of the basis functions in the 'natural' parameterization. The rest of the basis functions are treated like varying effects. Again, because the numeric value of stime_1 is just the unpenalized coefficient for one of the basis functions, it wouldn't seem to have much practically interpretable meaning on its own.
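If it helps, here is a small sketch of where these pieces live in the posterior. The parameter names below are my assumption based on brms' usual naming scheme for smooth terms, so check variables() on your own fit:

# names assumed from brms' usual naming for smooths - run variables(m.s) to confirm
grep("stime", variables(m.s), value = TRUE)
# typically returns names along the lines of:
#   bs_stime_1                        the unpenalized coefficient shown as stime_1
#   sds_stime_1...                    the sd of the penalized weights, shown as sds(stime_1)
#   s_stime_1[1], s_stime_1[2], ...   the penalized basis-function weights themselves
draws <- as_draws_df(m.s)  # full posterior draws if you want to summarize these yourself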

Follow-up question - what if I have longitudinal data for 300 participants, with multiple data points per participant, and I would like to fit a spline? Does something like brm(outcome ~ s(time) + (time|id)) make any sense? If so, how does the varying linear slope for time relate to s(time)? I would like a varying slopes model, but if the trend isn't linear, is there a varying splines model? I don't think (s(time)|id) can be fit in brms, if I remember correctly, and I am also not sure it makes much sense…

Again, any help is appreciated, and hopefully I have answered my original question correctly and don’t lead anyone else astray! Please correct if I am wrong.

Edit - also, still not sure about the y-axis on the conditional_smooths() plot…


In answer to the follow-up question, I found this helpful paper. On page 10 it mentions that for the varying slope case "the same curve is essentially rotated and stretched to match the actual trajectories". I believe this is done by simply adding (time|id) to the model with the smooth, like brm(outcome ~ s(time) + (time|id)). I played around with it on some data, and the quote above from the paper matches my intuition after predicting and plotting the fit for several different subjects. You can also do 'varying smooths' as I asked about in my question, using bs="fs". However, this doesn't seem very practical with several hundred participants, as it is exceedingly slow to fit.
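For anyone following along, a minimal sketch of the two formulations I mean (assuming a data frame with outcome, time, and a factor id):

# population smooth plus varying linear intercepts and slopes per participant
m1 <- brm(outcome ~ s(time) + (time | id), data = data, cores = 4)

# 'varying smooths' via the mgcv factor-smooth basis (id must be a factor);
# this can be exceedingly slow with hundreds of participants
m2 <- brm(outcome ~ s(time) + s(time, id, bs = "fs"), data = data, cores = 4)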

Hopefully all of this is on track!

Now if anyone can answer my question about the y-axis scale on conditional_smooths() plots in brms, then I think everything will be answered!


Originally, I did not think that this post helped answer my question about the y-axis scale on conditional_smooths(), until I noticed the part where @paul.buerkner says "conditional_effects also adds the intercept", implying that conditional_smooths() leaves it out. In my example above the model is gaussian with an identity link, so I couldn't figure out why conditional_smooths(), which shows things on the link scale, differed from conditional_effects(), which is on the scale of the data. If I add the intercept from the model to the conditional_smooths() plot, then both plots look essentially the same (at least just eyeballing it).
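As a rough check (column names here are assumed from the usual conditional_effects() data frames, so adjust if they differ on your end), adding the posterior-mean intercept to the conditional_smooths() estimates should land nearly on top of the conditional_effects() curve for this gaussian model:

cs <- conditional_smooths(m.s)[[1]]  # data frame with time, estimate__, lower__, upper__
ce <- conditional_effects(m.s)[[1]]
b0 <- fixef(m.s)["Intercept", "Estimate"]
plot(ce$time, ce$estimate__, type = "l")
lines(cs$time, cs$estimate__ + b0, lty = 2)  # should roughly overlay the solid line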

I guess this makes me wonder why I would use conditional_smooths() when I can use conditional_effects(), which seems easier to interpret…? Maybe it is more useful in the case when there are multiple s() terms in the model.


Hey @jd_c. Your post really helped me better understand my own models.

It also made me think more carefully about these models and their distinctions, and I am wondering if you could provide your input on a question I have.

I have these two GAMMs, and the question I have is whether controlling for the covariate Zprey_speed changes the individual curves (i.e. the relationship between y and x for every individual).

Here is the model summary for a model where I do not include the covariate Zprey_speed:

r$> summary(mod2)
 Family: beta_binomial2 
  Links: mu = logit; phi = identity 
Formula: hunting_success | vint(4) ~ s(Zcumul_xp) + s(Zcumul_xp, predator_id, bs = "fs") + Zgame_du 
   Data: data (Number of observations: 100411) 
  Draws: 4 chains, each with iter = 1500; warmup = 500; thin = 4;
         total post-warmup draws = 1000

Smooth Terms:
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(sZcumul_xp_1)                0.23      0.20     0.01     0.73 1.00      484      638
sds(sZcumul_xppredator_id_1)     1.29      0.09     1.13     1.46 1.00      603      672
sds(sZcumul_xppredator_id_2)     9.86      0.46     8.99    10.83 1.01      350      560
sds(sZcumul_xppredator_id_3)     2.59      0.17     2.27     2.93 1.01      775      951

Population-Level Effects:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          0.07      0.04    -0.01     0.14 1.01      163      348
Zgame_duration     0.60      0.01     0.59     0.61 1.00      900      830
sZcumul_xp_1       0.42      0.40    -0.22     1.38 1.00      810      936

Family Specific Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi     2.19      0.02     2.15     2.23 1.00      963      955

Here is the plot of the individual curves:

Here is the summary of the second model where I control for Zprey_speed:

r$> summary(mod2p)
 Family: beta_binomial2 
  Links: mu = logit; phi = identity 
Formula: hunting_success | vint(4) ~ s(Zcumul_xp) + s(Zcumul_xp, predator_id, bs = "fs") + Zgame_du 
   Data: data (Number of observations: 100411) 
  Draws: 4 chains, each with iter = 1500; warmup = 500; thin = 4;
         total post-warmup draws = 1000

Smooth Terms:
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(sZcumul_xp_1)                0.73      0.20     0.38     1.17 1.00      914      822
sds(sZcumul_xppredator_id_1)     1.51      0.09     1.33     1.70 1.00      551      749
sds(sZcumul_xppredator_id_2)     2.96      0.15     2.66     3.27 1.00      670      819
sds(sZcumul_xppredator_id_3)     7.75      0.21     7.35     8.17 1.00      485      712

Population-Level Effects:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          0.06      0.03    -0.00     0.12 1.02      127      246
Zgame_duration     0.62      0.01     0.61     0.63 1.01      728      881
Zprey_speed       -0.63      0.01    -0.64    -0.62 1.00     1020      754
sZcumul_xp_1       2.56      0.64     1.34     3.88 1.00      847      884

Family Specific Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi     2.93      0.03     2.87     2.98 1.00      813      732

And here is the plot of the individual curves:

We see that the curves in the second plot appear to have been shifted upwards at values of experience between roughly 100 and 300. So just by visually inspecting the plots, it appears that controlling for Zprey_speed does have some effect on the relationships.

However, I am aware that interpreting sds parameters is not really “doable”. But I see in my summaries that there are 3 sds parameters related to the random effect, which are sds(sZcumul_xppredator_id_1), sds(sZcumul_xppredator_id_2) and sds(sZcumul_xppredator_id_3). The three have different values when comparing both models.

So my question to you is: do you know what the first, second, and third sds parameters mean? Are they related to some "derivatives" or "sections" of the curves? For instance, the higher value of the second sds parameter in the first model makes me believe that the variance in wiggliness is greater around the center of the curves in the first model, but maybe I am totally wrong. What do you think?

Kind regards!

Maxime

I believe you can think of the sds terms in a way analogous to the sd terms in a varying slopes or intercepts model, where the sds is the sd of the smooth weights, which determines the amount of ‘wiggliness’ of the spline. So increasing sds should mean increasing wiggliness.

If I am not mistaken, the 3 sds parameters that you see in both of your models are the sds for each smooth for each of the predator_id’s that are in your data. You specified s(Zcumul_xp, predator_id, bs = "fs"), which I think gives a different smooth for each level of predator_id. It appears that there are 3 levels of predator_id, so you get 3 different sds terms.
You should be able to run conditional_smooths() for both models and see how the smooths change for each predator_id between the two models.
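For example, something like this (mod2 and mod2p being the two models from your summaries):

plot(conditional_smooths(mod2))
plot(conditional_smooths(mod2p))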

Thank you for your answer @jd_c.

I wonder if that is true, though, because I actually have 253 levels of predator_id, as you can see in the plots I provided, so I am not sure why brms would show only 3 sds terms in the summary output if there were one per level. I used conditional_effects for the plots because I needed to control for the prey speed.

Maybe looking at the draws using as_draws() could help.

Best,

Maxime

Maybe that’s not the case for using bs="fs" then! I haven’t used that, because it always took too long to fit, so maybe I am wrong about my response to your question. You may have to look at as_draws() like you say or the Stan model itself.

I tried playing around with it in a small simulation. If you use by = fac, then you get a different sds for each level of the factor, as I described. But bs = "fs" shares a single smoothness parameter across levels. Sort of like varying slopes.
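Roughly, the two constructions I am contrasting look like this as brms formulas (the variable names are just placeholders):

# one smooth per level of the factor, each with its own sds
bf(y ~ fac + s(x, by = fac))
# a shared smoothness parameter across levels, with per-level smooth deviations
bf(y ~ s(x) + s(x, fac, bs = "fs"))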
I am not sure why there are 3 terms, though. As I said, I really haven't used a model with factor smooths. Perhaps @ucfagls, @paul.buerkner, or @tjmahr could answer your question.

The "fs" basis is a fully penalized spline, with potentially different coefficients for the basis functions for each individual / subject. Importantly, the smooth includes penalties on the functions in the null space of the wiggliness penalty, and as a result it isn’t subject to identifiability constraints. What this boils down to is that this smooth contains:

  1. a random intercept per subject,
  2. a random slope (linear effect) of the covariate per subject, and
  3. a random smooth of the covariate per subject

This is different to usual smooths in mgcv, which typically have a sum-to-zero constraint applied that removes the constant function from the span of functions in the null space. In addition, the more typical smooths have an unpenalized null space, meaning that the linear function, which has 0 second derivative and hence isn’t affected by the wiggliness penalty, is not penalized. The above assumes the default second derivative wiggliness penalty - the principles are the same but details differ if you are using a different derivative for the wiggliness penalty.

Because there are three distinct components in the "fs" smooth, we need three distinct penalties:

  1. a diagonal identity matrix penalty for the random intercepts, and
  2. a diagonal identity matrix penalty for the random slopes (linear effects), and
  3. a ridge penalty for the wiggliness of the random smooths.

As there are three penalties in this basis, we need three variance terms associated with smooths in the brms model.
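If it helps to see this outside of brms, here is a small sketch using mgcv (the data are made up, purely for illustration):

library(mgcv)
d <- data.frame(x = runif(60), g = factor(rep(1:3, each = 20)))
sm <- smoothCon(s(x, g, bs = "fs", k = 5), data = d)[[1]]
length(sm$S)  # three penalty matrices, matching the three components described above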


Thank you very much for the detailed and really helpful answer! I think it adds useful information to the questions raised in this thread.

Best

Maxime
