Simplifying two regression into one with within and between subject factors

Dear all,

I’m trying to simplify my data analysis, sorry this is a silly issue, but I just can’t quite decide what is the right thing to do.

My data is: ~400 participants, each providing ~150 behavioural measurements (within subject), as well as psychological questionnaire measures (2 per subject) What I’ve done previously:
a) regression predicting behaviour, like so: behaviour ~ 1+ reg1 + (1+reg1|subj)
b) show a moderation effect, i.e. that the impact of reg 1 on behaviour is moderated by the two questionnaire measures, like so: reg 1 ~ mo(psych 1) *psych 2 (no longer hierarchical as now I just have one row per subject; I’ve coded psych1 as ordered factor as there are only 6 possible answers)

I’ve been trying to simplify this to one single analysis, but I’m not quite sure which one of several is the right way to do it:
a) behaviour ~ 1+ reg1*mo(psych1)psych2 + (1+reg1|subj)
b) behaviour ~ 1+ reg1
mo(psych1)psych2 + (1+reg1mo(psych1)*psych2|subj) - given that I only have one measurement of psych1/2 per subject, this is probably not right?
c) behaviour ~ 1+ reg1 + (1+reg1|subj) + (1+reg1|psych1:psych2) - given that there are many levels of psych1/2, this is probably not right?

Relatedly, for a) I now get a hard to interpret 3-way interaction term that I thought I could clarify using marginal plots like so:

t=make_conditions(myData,"Psych1")
marginal_effects(myFit,effects="Reg1:Psych2",conditions=t

But it tells me "Error: monotonic predictors must be integers or ordered factors. Error occurred for variable ‘Psych1’ "

I’d be very grateful for any advice.
Jacquie

I always start by saying I’m no expert (except the massive amount of time I’ve spent learning to properly understand my own problematic mixed models).

SO, if someone smarter comes along who better understands your needs that’s great. In the meantime.

I’m not sure I understand your predictors and your response measures.

if you could provide a tiny bit of data structure that helps in providing guidance about model building.

are there observations that represent values for reg1

how are the behavioral measures collected (are they all the same measure or do they represent mu 150 types of measurements, or 10 types of measures collected 15 times per subject. Etc.

Especially in mixed / hierarchical / multilevel models (I always use that phrasing because to me they are all variations of the same beast but not everyone reads them that way) it is especially helpful to know how the data relate to each other.

Grabbing the head of your df and pasting the pertinent columns with enough rows to see the structure and a tiny bit more explanation is good enough for me.

I should be working on my own stuff right now for a meeting on Friday (and I stole late night hours away from kids and spouse to do it) but instead I did this! :) So keep that in mind when waiting for help. While waiting, look for similar questions here, on stack overflow/ cross validated (for the life of me I can never remember which one is for general statistics questions and which is for R coding problems!). Just browsing mixed models there can be a learning experience and if nothing helps ask other places too.

Thanks @MegBallard for taking the time to look at this. My format is:

Behav Reg1 Psych1 Psych2 ID
1 -1 5 1 1
0 5 5 1 1
1 -3 5 1 1
0 4 5 1 1
0 2 5 1 1
0 -4 2 0 1
1 7 2 0 1
1 3 2 0 1
1 -2 2 0 1
1 1 2 0 1

(I’ve just included ‘Reg1’ to simplify, there are more, but Reg1 is the one for which I have done the moderation analysis…)

Data format: 400 participants, 150 measures per participant and each of these measures has a different value for Reg1, Reg2 etc. The task is basically a human decision making task with 150 trials and each of the factors (reg1 etc.) that should impact behaviour are varied systematically across the trials (and then we measure the behaviour).

So for the hierarchical model by itself, it seems quite straightforward in BRMS:

myFit = bro(formula=Behav ~ 1 + Reg1 + Reg2... + (1+Reg1+... | ID),
data=myData, family='bernoulli')

And what I have done in the past is take each person’s measure of regression weight for reg 1 :

coef(myModel)$ID

And then include this in a new model with a new data frame:

b_Reg1 Psych1 Psych2 ID
1 1 1 1
0.4 4 25 2
-2 2 50 3
-0.2 3 2 4
myMod= brm(formula=b_Reg1~ 1+mo(Psych1)*Psych2),
data=myData2,family='gaussian')

And what I find from this is that Psych1 or Psych2 don’t by themselves relate to b_Reg1, but their interaction does, so I think this is a ‘moderation effect’ (and then I’ve done some plots to understand it better).

And my question now is whether it would be a ‘neater’ model to directly include this moderation effect in the model of behaviour (‘myFit’ above), rather than running two models. And in my original post above I put 3 ways that I could imagine doing this. First I only thought that this would be the natural one to do it:

brm(Behav ~ 1+ reg1*mo(psych1)*psych2 + reg2 etc. (1+reg1+reg2...|ID)

but then I saw this great introductory paper: https://jshd.pubs.asha.org/doi/abs/10.1044/2018_JSLHR-S-18-0006 ( @Ladislas) and got confused whether after all this was the right way to do it … as they seem to have something a bit similar, but end up with a format like this:

distance ~ gender + (1|subj) + (1|vowel),

rather than

distance ~ 1+ gender + vowel + (1+vowel | subject)