Design formula syntax


#1

Hi all,

I thought I’d post it here so that more people can benefit from Paul’s answer, which I’m sure will be worthwhile to read. My colleagues, friends, and I have one question:

What are the best references for understanding how the syntax for design formulas can be used?

I’ve been using Stan and Rethinking before, but recently switched to brms. One thing I often stumble over is how to express my models in the lme4 syntax, especially for group effects (yes, I know about response | aterms ~ pterms + (gterms | group), but there’s more).

Table 2 in [Fitting linear mixed-effects models using lme4](https://arxiv.org/pdf/1406.5823.pdf) was close to what I needed but not complete. I’ve read lots of text now, and still, I can’t seem to see the difference between several variants, e.g.,

formula = tp ~ 1 + technique + category + (1 | subject)
subject is a group effect (random) w/ fixed mean

formula = tp ~ 1 + technique + (1 + category | subject)
?!?

formula = tp ~ 1 + technique + category + (1 + category | subject)
Same as previous but here we also look at category as population effect?

where technique is A and B, category is A and B, and subject is an individual (1…35) who has used the two techniques and is categorized according to a category (low and high experience). (Yes, learning bias was accounted for in the design.)

Any input, pointers, tips would be much appreciated!


#2

formula = tp ~ 1 + technique + category + (1 | subject) assumes the intercept (1) to vary across subjects but the effects of technique and category to be constant across subjects.

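formula = tp ~ 1 + technique + (1 + category | subject) makes no sense, since category appears only in the “gterms” part and not in the “pterms” part. The effect of category would vary across subjects while its average effect is implicitly fixed to zero.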

formula = tp ~ 1 + technique + category + (1 + category | subject) assumes both the intercept and the effect of category to vary across subjects, while technique is assumed to have a constant effect across subjects.
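To make the difference between the three formulas concrete, here is a minimal base R sketch that simulates data the way the third formula assumes it was generated. All numeric values and helper names (b_intercept, u_category, and so on) are made up for illustration, and the subject-level deviations are drawn independently for simplicity, whereas lme4 and brms model them as correlated by default.

```r
# Hypothetical simulation of the data-generating process implied by
#   tp ~ 1 + technique + category + (1 + category | subject)
# All numbers below are invented for illustration only.
set.seed(1)

n_subject <- 35
# Every subject uses both techniques (A and B).
d <- expand.grid(subject = seq_len(n_subject), technique = c("A", "B"))
# category is a subject-level label (low / high experience).
subject_category <- rep(c("low", "high"), length.out = n_subject)
d$category <- subject_category[d$subject]

# Population-level ("pterms") coefficients
b_intercept <- 10
b_technique <- 2   # effect of technique B vs. A, identical for every subject
b_category  <- 3   # average effect of category high vs. low

# Group-level ("gterms") deviations, one value per subject
u_intercept <- rnorm(n_subject, mean = 0, sd = 1.0)  # `1` left of the bar
u_category  <- rnorm(n_subject, mean = 0, sd = 0.5)  # `category` left of the bar

is_B    <- as.numeric(d$technique == "B")
is_high <- as.numeric(d$category == "high")

d$tp <- (b_intercept + u_intercept[d$subject]) +
  b_technique * is_B +
  (b_category + u_category[d$subject]) * is_high +
  rnorm(nrow(d), sd = 1)  # residual noise

head(d)
```

In this sketch, setting the standard deviation of u_category to zero corresponds to the first formula (only the intercept varies), while setting b_category to zero but keeping u_category corresponds to the second formula.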


#3

Big thank you. But are there no pointers to a text that discusses this syntax in detail (similar to the pdf I point to in my post, but more extensive)?


#4

You may also want to take a look at the two brms papers, but the lme4 paper is already pretty detailed about the syntax.

https://www.jstatsoft.org/article/view/v080i01

https://arxiv.org/abs/1705.11123


#5

Thanks, I’ve read those two (good btw) :)


#6

Hmm, I see. Unfortunately, I don’t know of any better reference than the lme4 JSS paper at the moment. Maybe there are some nice blog posts out there about this syntax, but I am currently not aware of any.


#7

I personally found this site quite helpful:

It focusses on longitudinal models, but gives good impressions nonetheless…


#8

I get it… (1 | subject) is varying intercepts, while (1 + category | subject) is varying intercept and slopes. A bit more shorthand to write it this way compared to what I’ve been getting used to ;)


#9

You can also think of it as:

  • right side of the bar: variable(s) that describe the grouping / hierarchical structure in your data

  • left side of the bar: what varies across this / these group(s)?

    • 1: varying intercept
    • 1 + <variable>: varying slope and intercept
    • 0 + <variable>: no varying intercept, only varying slope
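As far as I understand it, the left side of the bar is read like the right-hand side of an ordinary R formula, so base R's model.matrix() can show which coefficients each variant would let vary. A small sketch, where x is a hypothetical numeric predictor:

```r
# Toy data; `x` is a made-up numeric predictor used only for illustration.
d <- data.frame(x = c(0.5, 1.2, 2.0, 3.1))

cols_intercept <- colnames(model.matrix(~ 1, d))      # varying intercept only
cols_both      <- colnames(model.matrix(~ 1 + x, d))  # varying intercept and slope
cols_slope     <- colnames(model.matrix(~ 0 + x, d))  # varying slope, no intercept

cols_intercept
cols_both
cols_slope
```

Each column name stands for one coefficient that gets its own value per group level when the same expression is placed to the left of the bar.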

If your data is “nested” (classical example: classes in schools), you would write this as
(1 | school / class) or (1 + <variable> | school / class).
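The nesting operator follows the usual R formula rules, which base R can show without fitting anything. A quick sketch using the school and class names from the example above:

```r
# Expand the nesting operator with base R; no data or model fit is needed.
nested <- attr(terms(~ school / class), "term.labels")
nested
```

This returns the two terms school and school:class, which matches the lme4 documentation's description of (1 | school / class) as shorthand for (1 | school) + (1 | school:class).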