The main question here concerns the most appropriate way to set up varying effects structures with hierarchical gams via brms (which, as far as I understand, uses mgcv under the hood to set up the smooth terms).
I have read these papers:
https://journals.sagepub.com/doi/full/10.1177/2331216519832483
And these posts:
And from the combination of these papers and posts, I have ended up in a position where I am not entirely certain which of the many varying effects structures I should choose. Furthermore, I don’t know whether or not some approaches result in approximately the same end just by different means. And even more importantly, I don’t know whether the varying structure I may use is actually doing what I think it is doing. I should say at this point that the above papers are tremendously well-written, helpful and informative. The bottleneck is in my brain now that I am translating the many flavours of gam into a reality within brms.
Below, I provide some context on the research question and aim, and then list the model formulas that seem possible/sensible (based on my reading and understanding). Maybe someone could chime in with advice?
Context:
I have pupillometry data, i.e., time-series data of pupil dilation. I want to fit a hierarchical gam, as suggested in the above papers. More specifically, I want to fit a model that has one categorical predictor (a within-participant experimental manipulation) and that also has the maximal varying effects structure permitted by the design (following the suggestions of Barr et al., 2013).
Formulas:
A note on abbreviations: time = time of each sample (the sampling frequency is xx samples per second); condition = categorical predictor which indexes a within-participant experimental manipulation; pid = participant id.
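For concreteness, the formulas below assume data in long format (one row per sample), with condition and pid stored as factors, since mgcv only fits one smooth per level for a factor `by` variable and `bs = "fs"` needs a factor as well. A minimal sketch with made-up values (the data frame name `dat`, the level labels, and the time grid are placeholders, not my actual data):

```r
# Toy data in the assumed long format: one row per sample.
# `dat`, the level labels and the time grid are illustrative assumptions.
dat <- expand.grid(
  time = seq(0, 2, by = 0.1),              # time of each sample (arbitrary grid)
  condition = c("control", "treatment"),   # within-participant manipulation
  pid = c("p01", "p02", "p03"),            # participant id
  stringsAsFactors = FALSE
)

# Convert the grouping variables to factors explicitly, so that
# `by = condition` gives one smooth per level and `bs = "fs"` accepts pid.
dat$condition <- factor(dat$condition)
dat$pid <- factor(dat$pid)

str(dat)
```

If condition or pid were left as numeric codes, `by =` would instead be read as a numeric multiplier of the smooth, which is not what any of the formulas intend.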
1. no varying effects of smooth.
formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10))
2. add a factor smooth by pid
formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1))
3. use “re” instead of a factor smooth
formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "re"))
4. formula (2) above, i.e., a factor smooth by pid, plus a linear varying intercept for pid
formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1) +
  (1 | pid))
5. formula (4) above, but with a varying slope for condition per pid.
formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, pid, bs = "fs", m = 1) +
  (1 + condition | pid))
6. include a separate smooth per condition and per pid
formula = bf(pupil ~ 1 + condition +
  s(time, by = condition, bs = "bs", k = 10) +
  s(time, by = pid, bs = "bs", k = 10) +
  (1 + condition | pid))
7. add a smooth for the interaction between pid and condition.
formula = bf(pupil ~ 1 + condition +
  s(time, by = interaction(pid, condition), bs = "bs", k = 10) +
  (1 + condition | pid))
Does anyone have any advice in general based on these options?
And more specifically, can anyone guide me or give me an intuition on the relationship between linear varying effects (e.g., 1 + condition | pid), which I would typically use in other model fitting exercises, and the varying effects used in gams? Under a “keep it maximal” approach (Barr et al., 2013), do I need to include both? Are they doing different things or are they completely or partially redundant? Apologies in advance if these are half-baked questions. As always, I would really appreciate some advice.
Finally, for further context: I have fit these models to pilot data from 5 participants and performed model comparison via loo. There is clear water between model 1 and the rest of the models, with model 1 having considerably lower predictive accuracy. That makes sense to me, as including varying effects of pid should make for better out-of-sample predictions. For the remaining models (2-7) the error bars overlap: model 7 is numerically the best, but they all seem to do an equally good job.
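To make the comparison step concrete, here is a minimal sketch of a loo comparison between the first two formulas. It runs on simulated stand-in data rather than my pilot data; the data frame `dat`, the simulated effect sizes, the object names `fit1` and `fit2`, and the sampler settings are all assumptions made for illustration.

```r
library(brms)

set.seed(1)

# Simulated stand-in for the pilot data (all values are made up).
dat <- expand.grid(
  time = seq(0, 1, length.out = 20),
  condition = factor(c("a", "b")),
  pid = factor(paste0("p", 1:5))
)
# Participant-specific offsets plus a condition-dependent time course.
pid_shift <- rnorm(5, sd = 0.5)[as.integer(dat$pid)]
dat$pupil <- sin(2 * pi * dat$time) * (1 + 0.5 * (dat$condition == "b")) +
  pid_shift + rnorm(nrow(dat), sd = 0.3)

# Model 1: no varying effects.
fit1 <- brm(
  bf(pupil ~ 1 + condition +
       s(time, by = condition, bs = "bs", k = 10)),
  data = dat, chains = 2, iter = 1000, seed = 1
)

# Model 2: adds a factor smooth by pid.
fit2 <- brm(
  bf(pupil ~ 1 + condition +
       s(time, by = condition, bs = "bs", k = 10) +
       s(time, pid, bs = "fs", m = 1)),
  data = dat, chains = 2, iter = 1000, seed = 1
)

# Store PSIS-LOO in each fit, then compare expected log predictive density.
fit1 <- add_criterion(fit1, "loo")
fit2 <- add_criterion(fit2, "loo")
loo_compare(fit1, fit2, criterion = "loo")
```

The short chains are only there to keep the sketch quick to run; they are not a recommendation for the real analysis.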
Anyway, I’m really looking for some principles to guide my choices rather than just doing model comparison. So if anyone has any tips, I would be very happy to receive them. Many thanks in advance.
- Operating System: macOS Sonoma 14.5
- brms Version: 2.21.0