Specifying varying effect structures with hierarchical generalised additive models (gams)

The main question here concerns the most appropriate way to set up varying effect structures with hierarchical gams via brms (which uses mgcv under the hood, I think).

I have read these papers:
https://journals.sagepub.com/doi/full/10.1177/2331216519832483

And these posts:

And from the combination of these papers and posts, I have ended up in a position where I am not entirely certain which of the many varying effects structures I should choose. Furthermore, I don’t know whether or not some approaches result in approximately the same end just by different means. And even more importantly, I don’t know whether the varying structure I may use is actually doing what I think it is doing. I should say at this point that the above papers are tremendously well-written, helpful and informative. The bottleneck is in my brain now that I am translating the many flavours of gam into a reality within brms.

Below, I will provide some context for the research question and aim, and then list the model formulas that seem possible/sensible (based on my reading and understanding). Maybe someone could chime in with advice?

Context:

I have pupillometry data - i.e., time-series data of pupil dilation. I want to fit a hierarchical gam, as suggested in the above papers. More specifically, I want to fit a model that has one categorical predictor (a within-participant experimental manipulation) and that also has the maximal varying effects structure permitted by the design (following the suggestions of Barr et al., 2013).

Formulas:

A note on abbreviations: time = sampling time (xx samples per second); condition = categorical predictor which indexes a within-participant experimental manipulation; pid = participant id.

  1. no varying effects or smooths for pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10))

  2. add a factor smooth by pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1))

  3. use "re" instead of a factor smooth

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "re"))

  4. keep the factor smooth by pid and add a linear varying intercept for pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1) +
  (1 | pid))

  5. formula (4) above, but with a varying slope for condition per pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1) +
  (1 + condition | pid))

  6. include a separate smooth per condition and per pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, by = pid, bs = "bs", k = 10) +
  (1 + condition | pid))

  7. add a smooth for the interaction between pid and condition

formula = bf(pupil ~ 1 + condition +
  s(time, by = interaction(pid, condition), bs = "bs", k = 10) +
  (1 + condition | pid))
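For completeness, any of the formulas above would then be passed to brm(). A minimal sketch, where the data frame name dat, the family, and the sampler settings are placeholders rather than part of the question:

```r
library(brms)

# formula (2) as an example; dat is a hypothetical data frame
# containing pupil, time, condition, and pid (pid as a factor)
formula <- bf(pupil ~ 1 + condition +
                s(time, by = condition, bs = "bs", k = 10) +
                s(time, pid, bs = "fs", m = 1))

fit <- brm(formula,
           data = dat,
           family = gaussian(),  # placeholder; choose to suit the pupil data
           chains = 4, cores = 4)
```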

Does anyone have any advice in general based on these options?

And more specifically, can anyone guide me or give me an intuition on the relationship between linear varying effects (e.g., 1 + condition | pid), which I would typically use in other model fitting exercises, and the varying effects used in gams? Under a “keep it maximal” approach (Barr et al., 2013), do I need to include both? Are they doing different things or are they completely or partially redundant? Apologies in advance if these are half-baked questions. As always, I would really appreciate some advice.

Finally, to provide further context, I should say that I have fit these models to pilot data from 5 participants and performed model comparison via loo. There is clear water between model 1 and the rest of the models, with model 1 having considerably lower predictive accuracy. That makes sense to me, as including varying effects of pid makes for better out-of-sample predictions. The error bars for models 2-7 overlap; model 7 is the best numerically, but they all seem to do an equally good job.

Anyway, I’m really looking for some principles to guide my choices rather than just doing model comparison. So if anyone has any tips, I would be very happy to receive them. Many thanks in advance.

  • Operating System: mac os sonoma 14.5
  • brms Version: 2.21.0

Some comments:

Don’t ever use s(time, pid, bs = "re") in brms as it will be slow as a very slow thing. If you want random intercepts/slopes, use the native syntax: (1 + time | pid).

Note also that s(time, pid, bs = "re") is only adding random slopes; you likely need s(pid, bs = "re") + s(time, pid, bs = "re") to allow random intercepts and slopes in a GAM if you are doing this via mgcv.
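In mgcv itself, that combination of random intercepts and random slopes would look something like this (a sketch; dat is a hypothetical data frame, and pid must be coded as a factor for bs = "re" to work):

```r
library(mgcv)

m <- gam(pupil ~ condition +
           s(time, by = condition, bs = "bs", k = 10) +
           s(pid, bs = "re") +        # random intercepts per participant
           s(time, pid, bs = "re"),   # random slopes of time per participant
         data = dat, method = "REML")
```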

You don’t need the m = 1 on the random smooth terms (bs = "fs") as these smooths are fully penalized. In my experience, what you might gain from m = 1 is often offset by piecewise-linear behaviour of the estimated effects (because of the change in derivative).

You don’t want a parametric effect and a random effect for condition - use one or the other. As condition seems to be an effect of interest, with only a few levels, I would use the parametric effect. Also, in this model you don’t actually want the random intercepts per pid, as these are already included in the fs smooth, and including both could easily cause identifiability issues that would affect the sampling.

My suggestion, if you want to estimate smooths for each condition while accounting for individual time trends, would be to fit these models

pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", xt = list(bs = "bs"))

and

pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, by = condition, bs = "fs", xt = list(bs = "bs"))

(Noting my use of the xt argument to use a B-spline basis for the fs smooths, to match your specification from the other smooths.)

These two models account for the smooth time-treatment effects through the factor-by smooth of time by condition, with treatment means modelled through the parametric condition effect. The models differ in how they treat (penalize) the subject-specific smooths:

  1. the first form fits a smooth of time for each subject (including random intercepts and linear slopes), where the penalties will shrink the subject-specific curves towards their respective group means (the parametric condition effects), while
  2. the second form extends the first form in two ways:
    i. the subjects in each treatment group have a common wiggliness, but the wiggliness can vary between treatment groups, and
    ii. the penalties will shrink the curves towards their respective treatment-specific smooths

The second form will also be quite a lot more complex to fit, however.


Dear ucfagls (aka Gavin),

That is simply AMAZING! Thank you. I cannot tell you how much I appreciate your response.

Thanks so much for taking time out to respond.

Best regards,
Rich


Dear ucfagls (aka Gavin),

If possible, I have a follow-up question, which concerns how to interpret the output of the first model that you outlined above:

Usually, I would use something like tidybayes::add_epred_draws() and set re_formula = NA, in order to get population-level predictions.

But for the model above, I get an error: add_epred_draws() requires the participant ID variable. And I think this makes sense based on the above model, since a separate smooth is fit for each participant.

Do you have any advice on how I might generate what might be called “group average” / “fixed effects” / “population level” predictions?

I can of course get predictions for each participant ID and then average over them, but that feels sub-optimal and potentially not what I want to be doing.

Sorry in advance if this is a basic or confused question. I remain very new to hgams, so I wanted to check.

Best regards,
Rich

Sorry for the late response.

I don’t think that this is possible. @paul.buerkner can confirm, but it doesn’t appear that the predict() method (equivalent) in brms has the ability to exclude terms from the linear predictor in the way mgcv does via its exclude and terms arguments. I don’t think that following mgcv would be the right thing to do here for random effects in general — marginalising over the random effects is not the same as setting their effect to zero in models with a non-identity link function. But I also don’t think brms automatically understands that terms of the form s(x, f, bs = "fs") are random effects and should be excluded if the appropriate re_formula is used.

I’m basing this on a quick look at the functions involved so I may have this wrong.
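For contrast, this is roughly how the exclusion works in mgcv itself (a sketch, assuming a fitted mgcv model m containing the factor smooth s(time, pid); newdata must still include a pid column, set to any existing level, because the term is constructed before being zeroed out):

```r
# population-level predictions from an mgcv fit: drop the
# subject-specific smooth by naming it in `exclude`
pred <- predict(m, newdata = newdata,
                exclude = "s(time,pid)", se.fit = TRUE)
```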


Thanks - that’s very helpful.

If it is not currently possible (or even sensible) to set re_formula = NA in brms with these kinds of models, do you or @paul.buerkner have any advice on what might be the best way to estimate population-level predictions in brms (or tidybayes) with these kinds of models?

For example, is averaging across the participant-level posterior predictions valid? If it is not valid or appropriate to do so, would that mean that I could only calculate predictions at the individual participant level?

Thanks again. Any further help would be very much appreciated.

Rich


Yes, you can average the participant-level posterior predictions. But you need to make sure that the averaging is done for the individual posterior draws, not for the posterior predictive summary statistics (posterior predictive mean, sd, etc.).

I think the emmeans package, which is fully compatible with most brms models, may have some support in that regard to make this whole process of estimating marginal means more convenient.
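As a concrete sketch of the per-draw averaging (all names here, such as fit, dat, and the time grid, are illustrative assumptions): average over participants within each posterior draw, then summarise across draws.

```r
library(tidybayes)
library(dplyr)

# grid over every participant so the fs smooth can be evaluated
grid <- expand.grid(time = seq(0, 3, by = 0.1),
                    condition = unique(dat$condition),
                    pid = unique(dat$pid))

grid %>%
  add_epred_draws(fit) %>%
  group_by(condition, time, .draw) %>%   # average over participants per draw
  summarise(.epred = mean(.epred), .groups = "drop") %>%
  group_by(condition, time) %>%          # then summarise across draws
  summarise(estimate = mean(.epred),
            lower = quantile(.epred, 0.025),
            upper = quantile(.epred, 0.975),
            .groups = "drop")
```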


Thanks Paul - your response is very much appreciated.
Rich