I’ve been turning this over in my head and am having a hard time working out how to specify a model for this type of data.
Given data such as this,
id | measurement | value | trial | y |
---|---|---|---|---|
1 | A | … | 1 | 0 |
1 | A | … | 2 | 0 |
1 | A | … | 3 | 0 |
1 | A | … | 4 | 0 |
1 | B | … | 1 | 0 |
1 | B | … | 2 | 0 |
1 | C | … | 1 | 0 |
1 | C | … | 2 | 0 |
1 | C | … | 3 | 0 |
2 | A | … | 1 | 1 |
…
I would like to model the binary outcome `y` (1/0), i.e., whether the subject is sick or not.
`id` is a human subject, `measurement` is, e.g., blood pressure, pulse, etc., `value` depends on what `measurement` we use, and `trial` is simply in what temporal order the measurement, for measurement A,…,Z, was taken. The `measurement` can differ among subjects, as can the number of `trial`s for each `measurement`.
First, I thought about `(1 | measurement/trial/value)`, but that isn’t sane, since I think we’d then treat each unique real-valued ($\mathbb{R}$) `value` as its own categorical level. Next, I thought that I’d use `gp()` and treat `value` as a varying intercept that way, but I don’t think it’ll fly since we’re talking about n > 5e5.
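
To make the first concern concrete, here is a minimal Python sketch on simulated stand-in data (the labels A to C, the standard normal values, and the 1,000 rows are all assumptions for illustration, not my real data). It only counts how many distinct levels each column would contribute if it were used as a grouping factor; it does not fit anything.

```python
import random

random.seed(1)

# Simulated stand-in rows of (measurement, value); purely illustrative.
rows = [(random.choice("ABC"), random.gauss(0.0, 1.0)) for _ in range(1000)]

# Distinct levels if each column were used as a grouping factor.
n_measurement_levels = len({m for m, _ in rows})
n_value_levels = len({v for _, v in rows})

print("rows:", len(rows))
print("measurement levels:", n_measurement_levels)
print("value levels:", n_value_levels)
```

With a continuous `value`, the number of levels is roughly the number of rows, so nesting it as a grouping term would mean about one varying intercept per observation.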
@Guido_Biele or @paul.buerkner should know, but I’d appreciate anyone’s input! :)