Specifying GAMM with a random effect for the interaction between two variables

Hello! First post here, hoping somebody can point me in the right direction. I’m new to GAMs, so please forgive any silly mistakes.

To explain the model I’m trying to fit, I should first explain the data and my general approach to modeling it. The data comes from a recognition memory experiment wherein items were studied under one of three conditions. At test, participants saw the studied words intermixed with foil items and made a binary judgement as to whether each word was old or new. For this model, I’m working with four variables. The dependent variable is the participants’ response, which is categorical with two levels, either old or new. In general, I’m modeling responses as a function of study condition (categorical with four levels, one for each condition and one level for foils) using probit regression to estimate sensitivity and response bias (with random slopes for participants and items), akin to the SDT models described here. To be thorough, the GLMM is specified like this:

bf(formula = response~condition + (condition|participant) + (condition|words),
   family = bernoulli(link = 'probit'))

Note that condition is ordered such that foil items serve as the reference level.

However, I’m also interested in looking at the effect of standardized study trial number (the order in which the items appeared at study, akin to serial position) on sensitivity. I have theoretical reason to believe that in our experiment, this relationship will be non-linear, and visualization of the data supports this. Accordingly, I’d like to fit a comparable GAMM that includes a smooth term for trial number across conditions. I also want to include a random smooth corresponding to the effect of trial number across participants. Based on the mgcv documentation, I can use bs = 'fs' to get random wiggly curves for each participant. Thus, it seems I could specify the model like this:

bf(formula = response~condition + s(trials_scaled, by = condition) 
   + s(trials_scaled, participant, bs = 'fs', m=1),
   family = bernoulli(link = 'probit'))

However, as I understand it, specifying the model like this would assume that each participant has the same curve at each level of condition. This is not what I want - I want each participant to have a different curve at different levels of condition. Essentially, I want the random smooth equivalent of what you could get using a linear random effect like (0 + condition:trials_scaled|participant). Using bs = 'fs' will apparently only accommodate one factor. One idea I had was to combine this with by = fac, doing something like this:

s(trials_scaled, participant, bs = 'fs', m=1, by=condition)

However, I’m really not sure if this would do what I want it to do - I can’t find any examples that have combined bs = 'fs' with by = fac (which might also be a red flag indicating that this isn’t the right way to do it). Fitting the model with the above smooth term and plotting it seems to suggest that it is indeed fitting different curves for each participant across levels of condition, but I’m not sure if it’s being penalized correctly (or if other things are wrong). The closest example I could find of someone wanting to do something similar was on this StackExchange thread, where the answer suggested using tensor product smooths. However, the user there wanted a random effect corresponding to the interaction between two continuous variables - rather than a continuous and categorical variable, like I have here - so I’m not sure if this is applicable to my dilemma.

Hopefully my post was clear and my objective makes sense. I would greatly appreciate any guidance on how I can incorporate a smooth term as described into my model.

I don’t know enough about bs = "fs" to give an authoritative answer, but I’m pretty confident that, assuming bs = "fs" does what you want, that combining it with the by argument would yield whatever bs = fs does fitted independently for each level of condition.

For what it’s worth, the thing that seems a bit strange about this model to me is that as constructed, you assume that respondents evaluate foil items the same way regardless of which treatment they experienced. Wouldn’t you think that respondents’ error rates when classifying foil words might depend on which treatment they were subject to? I would think you would want your fixed effect structure to include treatment*novelty, where novelty is a binary indicator of whether a given word was old or new, yielding six levels rather than the four levels you have in your condition variable as described.

1 Like

Hey @jsocolar, I really appreciate the reply! Plotting the curves does seem to indicate that this may indeed be the right solution, but it is difficult to tell.

Regarding your other point, I understand what you’re saying. However, I can’t specify the model as you described because the treatment was manipulated within-subject. Because the treatment had to do with how participants studied the items (rather than characteristics inherent to the items themselves), I cannot separate the false alarm rate across conditions, even though this seems like it would be better. In another experiment from the same investigation, treatment was manipulated between-subject; for this experiment, I specify the models the way you described.

1 Like