Hello! First post here, hoping somebody can point me in the right direction. I’m new to GAMs, so please forgive any silly mistakes.
To explain the model I’m trying to fit, I should first explain the data and my general approach to modeling it. The data comes from a recognition memory experiment wherein items were studied under one of three conditions. At test, participants saw the studied words intermixed with foil items and made a binary judgement as to whether each word was old or new. For this model, I’m working with four variables. The dependent variable is the participants’ response, which is categorical with two levels, either old or new. In general, I’m modeling responses as a function of study condition (categorical with four levels, one for each condition and one level for foils) using probit regression to estimate sensitivity and response bias (with random slopes for participants and items), akin to the SDT models described here. To be thorough, the GLMM is specified like this:
bf(formula = response~condition + (condition|participant) + (condition|words),
family = bernoulli(link = 'probit'))
Note that condition is ordered such that foil items serve as the reference level.
However, I’m also interested in looking at the effect of standardized study trial number (the order in which the items appeared at study, akin to serial position) on sensitivity. I have theoretical reason to believe that in our experiment, this relationship will be non-linear, and visualization of the data supports this. Accordingly, I’d like to fit a comparable GAMM that includes a smooth term for trial number across conditions. I also want to include a random smooth corresponding to the effect of trial number across participants. Based on the mgcv documentation, I can use bs = 'fs'
to get random wiggly curves for each participant. Thus, it seems I could specify the model like this:
bf(formula = response~condition + s(trials_scaled, by = condition)
+ s(trials_scaled, participant, bs = 'fs', m=1),
family = bernoulli(link = 'probit'))
However, as I understand it, specifying the model like this would assume that each participant has the same curve at each level of condition. This is not what I want - I want each participant to have a different curve at different levels of condition. Essentially, I want the random smooth equivalent of what you could get using a linear random effect like (0 + condition:trials_scaled|participant)
. Using bs = 'fs'
will apparently only accommodate one factor. One idea I had was to combine this with by = fac
, doing something like this:
s(trials_scaled, participant, bs = 'fs', m=1, by=condition)
However, I’m really not sure if this would do what I want it to do - I can’t find any examples that have combined bs = 'fs'
with by = fac
(which might also be a red flag indicating that this isn’t the right way to do it). Fitting the model with the above smooth term and plotting it seems to suggest that it is indeed fitting different curves for each participant across levels of condition, but I’m not sure if it’s being penalized correctly (or if other things are wrong). The closest example I could find of someone wanting to do something similar was on this StackExchange thread, where the answer suggested using tensor product smooths. However, the user there wanted a random effect corresponding to the interaction between two continuous variables - rather than a continuous and categorical variable, like I have here - so I’m not sure if this is applicable to my dilemma.
Hopefully my post was clear and my objective makes sense. I would greatly appreciate any guidance on how I can incorporate a smooth term as described into my model.