I’ve played around with this in my head now and I have a hard time grasping how to specify a model for this type of data.
Given data such as this,
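| id | measurement | value | trial | y |
|----|-------------|-------|-------|---|
| 1  | A           | …     | 1     | 0 |
| 1  | A           | …     | 2     | 0 |
| 1  | A           | …     | 3     | 0 |
| 1  | A           | …     | 4     | 0 |
| 1  | B           | …     | 1     | 0 |
| 1  | B           | …     | 2     | 0 |
| 1  | C           | …     | 1     | 0 |
| 1  | C           | …     | 2     | 0 |
| 1  | C           | …     | 3     | 0 |
| 2  | A           | …     | 1     | 1 |
| …  |             |       |       |   |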
I would like to model the outcome y (1/0), i.e., whether the subject is sick or not.
id is a human subject; measurement is, e.g., blood pressure, pulse, etc.; value depends on which measurement is used; and trial is simply the temporal order in which a given measurement (A, …, Z) was taken. The set of measurements can differ among subjects, as can the number of trials for each measurement.
First, I thought about (1 | measurement/trial/value), but that isn’t sane, since I think each unique real-valued value would then be treated as a categorical level. Next, I thought I’d use gp() and treat value as a varying intercept that way, but I don’t think it’ll fly since we’re talking about n > 5e5.
From the way you describe the data, it seems to me that having different effects from different measurements is more important than nestedness.
Maybe this is too simple, but how about
y ~ value + (0 + value | measurement) + (1 | id)
The main idea here is simply to have random slopes of value for each measurement, plus a varying intercept per subject.
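As a rough sketch of how the formula above could be passed to brms: the data frame name `d` and the wrapper `fit_random_slopes()` are hypothetical, and the Bernoulli family is an assumption based on y being coded 1/0. Priors and sampler settings are left at their defaults here, which may well not be appropriate for n > 5e5.

```r
# Formula from the suggestion above; column names follow the example table.
f <- y ~ value + (0 + value | measurement) + (1 | id)

# Hypothetical wrapper: fits the model with brms when called.
# `d` is assumed to be a long-format data frame with columns
# y (0/1), value (numeric), measurement (factor/character), and id.
fit_random_slopes <- function(d) {
  brms::brm(
    formula = f,
    data    = d,
    family  = brms::bernoulli()  # logit link by default
  )
}

# Usage (not run here, since it compiles and samples a Stan model):
# fit <- fit_random_slopes(d)
# summary(fit)
```

The formula is defined outside the function so it can be inspected or modified (for example, by adding + trial) before fitting.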
(In that case I would make sure that the measurements are on the same scale (e.g., the same SD) and that the expected direction of the effect is the same for all measurements (flip the sign if necessary), because otherwise shrinkage works against you.)
This proposal neglects the trial variable. One could of course just add + trial, but I guess it depends on the specifics of the problem whether it is as easy as that (e.g., if these are repeated measurements over time, I’d rather use days or weeks since the first measurement, or since another reasonable starting date).
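One way the rescaling and sign alignment mentioned above might be done in base R is sketched below. The toy values, the column name `value_z`, and the choice to flip measurement B are all made up for illustration; which measurements need flipping depends on domain knowledge about the expected direction of each effect.

```r
# Toy long-format data (values are invented for illustration only).
d <- data.frame(
  id          = c(1, 1, 1, 1, 2, 2),
  measurement = c("A", "A", "B", "B", "A", "B"),
  value       = c(120, 130, 60, 70, 140, 80),
  y           = c(0, 0, 0, 0, 1, 1)
)

# z-score value within each measurement so all measurements share a scale.
d$value_z <- ave(d$value, d$measurement,
                 FUN = function(v) (v - mean(v)) / sd(v))

# Hypothetical sign flips: assume higher B is expected to lower the risk,
# so B is negated to point in the same direction as A.
flip <- c(A = 1, B = -1)
d$value_z <- unname(d$value_z * flip[d$measurement])
```

The standardized column could then replace value in the model formula, so that the random slopes are shrunk toward a common mean on a comparable scale.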
If there is good reason to think that the effect of a measurement depends on time, one could do
y ~ value + (0 + value | measurement:time) + (1 | id)