# Simplifying two regression into one with within and between subject factors

Dear all,

I’m trying to simplify my data analysis, sorry this is a silly issue, but I just can’t quite decide what is the right thing to do.

My data is: ~400 participants, each providing ~150 behavioural measurements (within subject), as well as psychological questionnaire measures (2 per subject) What I’ve done previously:
a) regression predicting behaviour, like so: behaviour ~ 1+ reg1 + (1+reg1|subj)
b) show a moderation effect, i.e. that the impact of reg 1 on behaviour is moderated by the two questionnaire measures, like so: reg 1 ~ mo(psych 1) *psych 2 (no longer hierarchical as now I just have one row per subject; I’ve coded psych1 as ordered factor as there are only 6 possible answers)

I’ve been trying to simplify this to one single analysis, but I’m not quite sure which one of several is the right way to do it:
a) behaviour ~ 1+ reg1*mo(psych1)psych2 + (1+reg1|subj)
b) behaviour ~ 1+ reg1
mo(psych1)psych2 + (1+reg1mo(psych1)*psych2|subj) - given that I only have one measurement of psych1/2 per subject, this is probably not right?
c) behaviour ~ 1+ reg1 + (1+reg1|subj) + (1+reg1|psych1:psych2) - given that there are many levels of psych1/2, this is probably not right?

Relatedly, for a) I now get a hard to interpret 3-way interaction term that I thought I could clarify using marginal plots like so:

``````t=make_conditions(myData,"Psych1")
marginal_effects(myFit,effects="Reg1:Psych2",conditions=t
``````

But it tells me "Error: monotonic predictors must be integers or ordered factors. Error occurred for variable ‘Psych1’ "

I’d be very grateful for any advice.
Jacquie

I always start by saying I’m no expert (except the massive amount of time I’ve spent learning to properly understand my own problematic mixed models).

SO, if someone smarter comes along who better understands your needs that’s great. In the meantime.

if you could provide a tiny bit of data structure that helps in providing guidance about model building.

are there observations that represent values for reg1

how are the behavioral measures collected (are they all the same measure or do they represent mu 150 types of measurements, or 10 types of measures collected 15 times per subject. Etc.

Especially in mixed / hierarchical / multilevel models (I always use that phrasing because to me they are all variations of the same beast but not everyone reads them that way) it is especially helpful to know how the data relate to each other.

Grabbing the head of your df and pasting the pertinent columns with enough rows to see the structure and a tiny bit more explanation is good enough for me.

I should be working on my own stuff right now for a meeting on Friday (and I stole late night hours away from kids and spouse to do it) but instead I did this! :) So keep that in mind when waiting for help. While waiting, look for similar questions here, on stack overflow/ cross validated (for the life of me I can never remember which one is for general statistics questions and which is for R coding problems!). Just browsing mixed models there can be a learning experience and if nothing helps ask other places too.

Thanks @MegBallard for taking the time to look at this. My format is:

Behav Reg1 Psych1 Psych2 ID
1 -1 5 1 1
0 5 5 1 1
1 -3 5 1 1
0 4 5 1 1
0 2 5 1 1
0 -4 2 0 1
1 7 2 0 1
1 3 2 0 1
1 -2 2 0 1
1 1 2 0 1

(I’ve just included ‘Reg1’ to simplify, there are more, but Reg1 is the one for which I have done the moderation analysis…)

Data format: 400 participants, 150 measures per participant and each of these measures has a different value for Reg1, Reg2 etc. The task is basically a human decision making task with 150 trials and each of the factors (reg1 etc.) that should impact behaviour are varied systematically across the trials (and then we measure the behaviour).

So for the hierarchical model by itself, it seems quite straightforward in BRMS:

``````myFit = bro(formula=Behav ~ 1 + Reg1 + Reg2... + (1+Reg1+... | ID),
data=myData, family='bernoulli')
``````

And what I have done in the past is take each person’s measure of regression weight for reg 1 :

``````coef(myModel)\$ID
``````

And then include this in a new model with a new data frame:

b_Reg1 Psych1 Psych2 ID
1 1 1 1
0.4 4 25 2
-2 2 50 3
-0.2 3 2 4
``````myMod= brm(formula=b_Reg1~ 1+mo(Psych1)*Psych2),
data=myData2,family='gaussian')
``````

And what I find from this is that Psych1 or Psych2 don’t by themselves relate to b_Reg1, but their interaction does, so I think this is a ‘moderation effect’ (and then I’ve done some plots to understand it better).

And my question now is whether it would be a ‘neater’ model to directly include this moderation effect in the model of behaviour (‘myFit’ above), rather than running two models. And in my original post above I put 3 ways that I could imagine doing this. First I only thought that this would be the natural one to do it:

``````brm(Behav ~ 1+ reg1*mo(psych1)*psych2 + reg2 etc. (1+reg1+reg2...|ID)
``````

but then I saw this great introductory paper: https://jshd.pubs.asha.org/doi/abs/10.1044/2018_JSLHR-S-18-0006 ( @Ladislas) and got confused whether after all this was the right way to do it … as they seem to have something a bit similar, but end up with a format like this:

``````distance ~ gender + (1|subj) + (1|vowel),
``````

rather than

``````distance ~ 1+ gender + vowel + (1+vowel | subject)
``````

If you want to use the group-level coefficients of the participants in a regression model (as predictor or response), you are on the right track with your two-step approach, as brms cannot handle such a model in a one-step approach yet (the one-step approach is usually favorable though). One you could do additionally is provide the coefficients’ standard error to the second model, which you also get from the output of coef. Suppose that `b_Reg1` has standard error `SE_Reg1`. Then, if `b_Reg` should be a response in the second step you specify

``````b_Reg1 | mi(SE_Reg1) ~ ...
``````

If, instead, `b_Reg1` should be a predictor, you can most easily go for

``````... ~ me(b_Reg1, SE_Reg1) + ...
``````

Thank you so much Paul, including the standard error is a great idea! I was not aware of the syntax you’re saying at all…

Sorry if this is obvious already, could you clarify why this model is not right?:

``````brm(Behav ~ 1+ reg1*mo(psych1)*psych2 + reg2 etc. (1+reg1+reg2...|ID)
``````

In case this is useful to someone else, I’ve also just come across this paper that incorporates in one Stan (not brms) model:

I didn’t mean the other one is not right, sorry. I have actually overlooked it in my previous answer and just commented on how to do two-step approaches. The model you are citing seems reasonable to me indeed, but keep in mind that you need to think of what exactly you want to model as varying over subjects. For instance, right now, you seem to expect all the interaction terms to be constant across subjects, which may or may not be reasonable.

I see. That’s also what I was unsure about initially; given that the ‘psych’ regressors do not vary between participants, how could the interaction terms vary over subjects? So would you think that this model:

``````behaviour ~ 1+ reg1* mo(psych1) *psych2 + reg2...+ (1+reg1+reg2..|subj)
``````

or this more similar conceptually to the two-step approach:

``````behaviour ~ 1+ reg1* mo(psych1) *psych2 + reg2... (1+reg1* mo(psych1)*psych2+reg2...|subj)
``````

Their main effects can’t vary over subject but, since the values of the interaction terms are computed from at least one predictor whose values vary within subject, the values of the interaction term also vary within subject and, thus, their effect may vary over subjects.