Specifying varying effect structures with hierarchical generalised additive models (gams)

The main question here concerns the most appropriate way to set up varying effect structures with hierarchical gams via brms (which uses mgcv under the hood, I think).

I have read these papers:
https://journals.sagepub.com/doi/full/10.1177/2331216519832483

And these posts:

And from the combination of these papers and posts, I have ended up in a position where I am not entirely certain which of the many varying effects structures I should choose. Furthermore, I don’t know whether or not some approaches result in approximately the same end just by different means. And even more importantly, I don’t know whether the varying structure I may use is actually doing what I think it is doing. I should say at this point that the above papers are tremendously well-written, helpful and informative. The bottleneck is in my brain now that I am translating the many flavours of gam into a reality within brms.

Below, I will provide some context for the research question and aim, and then list the model formulas that seem possible/sensible (based on my reading and understanding). Maybe someone could chime in with advice?

Context:

I have pupillometry data - i.e., time-series data of pupil dilation. I want to fit a hierarchical gam, as suggested in the above papers. More specifically, I want to fit a model that has one categorical predictor (a within-participant experimental manipulation) and that also has the maximal varying effects structure permitted by the design (following the suggestions of Barr et al., 2013).

Formulas:

A note on abbreviations: time = sampling time (xx samples per second); condition = categorical predictor which indexes a within-participant experimental manipulation; pid = participant id.

  1. no varying effects or smooths for pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10))

  2. add a factor smooth by pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1))

  3. use "re" instead of a factor smooth

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "re"))

  4. keep the factor smooth by pid and add a linear varying intercept for pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1) +
  (1 | pid))

  5. formula (4) above, but with a varying slope for condition per pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1) +
  (1 + condition | pid))

  6. include a separate smooth per condition and per pid

formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, by = pid, bs = "bs", k = 10) +
  (1 + condition | pid))

  7. add a smooth for the interaction between pid and condition

formula = bf(pupil ~ 1 + condition +
  s(time, by = interaction(pid, condition), bs = "bs", k = 10) +
  (1 + condition | pid))
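For completeness, any of the formulas above would then be passed to brm(). A minimal sketch, where the data frame name dat, the family, and the sampler settings are placeholders rather than part of the question:

```r
library(brms)

# formula (2) as an example; dat is a hypothetical data frame
# containing pupil, time, condition, and pid (pid as a factor)
formula <- bf(pupil ~ 1 + condition +
                s(time, by = condition, bs = "bs", k = 10) +
                s(time, pid, bs = "fs", m = 1))

fit <- brm(formula,
           data = dat,
           family = gaussian(),  # placeholder; choose to suit the pupil data
           chains = 4, cores = 4)
```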

Does anyone have any advice in general based on these options?

And more specifically, can anyone guide me or give me an intuition on the relationship between linear varying effects (e.g., 1 + condition | pid), which I would typically use in other model fitting exercises, and the varying effects used in gams? Under a “keep it maximal” approach (Barr et al., 2013), do I need to include both? Are they doing different things or are they completely or partially redundant? Apologies in advance if these are half-baked questions. As always, I would really appreciate some advice.

Finally, to provide further context, I should say that I have fit these models to pilot data from 5 participants and performed model comparison via loo. There is clear water between model 1 and the rest of the models, with model 1 having considerably lower predictive accuracy. That makes sense to me, as including varying effects of pid makes for better out-of-sample predictions. The error bars for models 2-7 overlap; model 7 is the best numerically, but they all seem to do an equally good job.

Anyway, I’m really looking for some principles to guide my choices rather than just doing model comparison. So if anyone has any tips, I would be very happy to receive them. Many thanks in advance.

  • Operating System: mac os sonoma 14.5
  • brms Version: 2.21.0

Some comments:

Don’t ever use s(time, pid, bs = "re") in brms as it will be slow as a very slow thing. If you want random intercepts/slopes, use the native syntax: (1 + time | pid).

Note also that s(time, pid, bs = "re") is only adding random slopes; you likely need s(pid, bs = "re") + s(time, pid, bs = "re") to allow random intercepts and slopes in a GAM if you are doing this via mgcv.
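In mgcv itself, that combination of random intercepts and random slopes would look something like this (a sketch; dat is a hypothetical data frame, and pid must be coded as a factor for bs = "re" to work):

```r
library(mgcv)

m <- gam(pupil ~ condition +
           s(time, by = condition, bs = "bs", k = 10) +
           s(pid, bs = "re") +        # random intercepts per participant
           s(time, pid, bs = "re"),   # random slopes of time per participant
         data = dat, method = "REML")
```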

You don’t need the m = 1 on the random smooth terms (bs = "fs") as these smooths are fully penalized. In my experience, what you might gain from m = 1 is often offset by piecewise-linear behaviour of the estimated effects (because of the change in derivative).

You don’t want a parametric effect and a random effect for condition - use one or the other. As condition seems to be an effect of interest, with only a few levels, I would use the parametric effect. Also, in this model you don’t actually want the random intercepts per pid, as these are already included in the fs smooth, and including both could easily cause identifiability issues that would affect the sampling.

My suggestion, if you want to estimate smooths for each condition while accounting for individual time trends, would be to fit these models

pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", xt = list(bs = "bs"))

and

pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, by = condition, bs = "fs", xt = list(bs = "bs"))

(Noting my use of the xt argument to use a B-spline basis for the fs smooths, to match your specification from the other smooths.)

These two models account for the smooth time-treatment effects through the factor-by smooth of time by condition, with treatment means modelled through the parametric condition effect. The models differ in how they treat (penalize) the subject-specific smooths:

  1. the first form fits a smooth of time for each subject (including random intercepts and linear slopes), where the penalties will shrink the subject-specific curves towards their respective group means (the parametric condition effects), while
  2. the second form extends the first form in two ways:
    i. the subjects in each treatment group have a common wiggliness, but the wiggliness can vary between treatment groups, and
    ii. the penalties will shrink the curves towards their respective treatment-specific smooths

The second form will also be quite a lot more complex to fit, however.


Dear ucfagls (aka Gavin),

That is simply AMAZING! Thank you. I cannot tell you how much I appreciate your response.

Thanks so much for taking time out to respond.

Best regards,
Rich


Dear ucfagls (aka Gavin),

If possible, I have a follow-up question, which concerns how to interpret the output of the first model that you outlined above:

Usually, I would use something like tidybayes::add_epred_draws() and set re_formula = NA, in order to get population-level predictions.

But for the model above, I get an error: add_epred_draws() requires the participant ID variable. And I think this makes sense based on the above model, since a separate smooth is fit for each participant.

Do you have any advice on how I might generate what might be called “group average” / “fixed effects” / “population level” predictions?

I can of course get predictions for each participant ID and then average over them, but that feels sub-optimal and potentially not what I want to be doing.

Sorry in advance if this is a basic or confused question. I remain very new to hgams, so I wanted to check.

Best regards,
Rich

Sorry for the late response.

I don’t think that this is possible. @paul.buerkner can confirm, but it doesn’t appear that the predict() method (equivalent) in brms has the ability to exclude terms from the linear predictor in the way mgcv does via its exclude and terms arguments. I don’t think that following mgcv would be the right thing to do here for random effects in general — marginalising over the random effects is not the same as setting their effect to zero in models with a non-identity link function. But I also don’t think brms automatically understands that terms of the form s(x, f, bs = "fs") are random effects and should be excluded if the appropriate re_formula is used.

I’m basing this on a quick look at the functions involved so I may have this wrong.
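For contrast, this is roughly how the exclusion works in mgcv itself (a sketch, assuming a fitted mgcv model m containing the factor smooth s(time, pid); newdata must still include a pid column, set to any existing level, because the term is constructed before being zeroed out):

```r
# population-level predictions from an mgcv fit: drop the
# subject-specific smooth by naming it in `exclude`
pred <- predict(m, newdata = newdata,
                exclude = "s(time,pid)", se.fit = TRUE)
```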


Thanks - that’s very helpful.

If it is not currently possible (or even sensible) to set re_formula = NA in brms with these kinds of models, do you or @paul.buerkner have any advice on what might be the best way to estimate population-level predictions in brms (or tidybayes) with these kinds of models?

For example, is averaging across the participant-level posterior predictions valid? If it is not valid or appropriate to do so, would that mean that I could only calculate predictions at the individual participant level?

Thanks again. Any further help would be very much appreciated.

Rich


Yes, you can average the participant-level posterior predictions. But you need to make sure that the averaging is done for the individual posterior draws, not for the posterior predictive summary statistics (posterior predictive mean, sd, etc.).

I think the emmeans package, which is fully compatible with most brms models, may have some support in that regard to make this whole process of estimating marginal means more convenient.
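As a concrete sketch of the per-draw averaging (all names here, such as fit, dat, and the time grid, are illustrative assumptions): average over participants within each posterior draw, then summarise across draws.

```r
library(tidybayes)
library(dplyr)

# grid over every participant so the fs smooth can be evaluated
grid <- expand.grid(time = seq(0, 3, by = 0.1),
                    condition = unique(dat$condition),
                    pid = unique(dat$pid))

grid %>%
  add_epred_draws(fit) %>%
  group_by(condition, time, .draw) %>%   # average over participants per draw
  summarise(.epred = mean(.epred), .groups = "drop") %>%
  group_by(condition, time) %>%          # then summarise across draws
  summarise(estimate = mean(.epred),
            lower = quantile(.epred, 0.025),
            upper = quantile(.epred, 0.975),
            .groups = "drop")
```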


Thanks Paul - your response is very much appreciated.
Rich