Model formulation of nested(?) model

I’ve played around with this in my head now and I have a hard time grasping how to specify a model for this type of data.

Given data such as this,

id  measurement  value  trial  y
1   A            …      1      0
1   A            …      2      0
1   A            …      3      0
1   A            …      4      0
1   B            …      1      0
1   B            …      2      0
1   C            …      1      0
1   C            …      2      0
1   C            …      3      0
2   A            …      1      1

I would like to model the binary outcome y (1/0), i.e., whether the subject is sick or not.

id is a human subject; measurement is, e.g., blood pressure, pulse, etc.; value depends on which measurement is used; and trial is simply the temporal order in which each measurement (A, …, Z) was taken. The set of measurements can differ among subjects, as can the number of trials for each measurement.

First, I thought about (1 | measurement/trial/value), but that isn’t sane, since I think we’d then treat each unique real-valued value as a categorical level. Next, I thought that I’d use gp() and treat value as a varying intercept that way, but I don’t think it’ll fly since we’re talking about n > 5e5.
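To make the problem with grouping on value concrete, here is a minimal sketch in base R. The data are simulated stand-ins (the sample size, the three measurement labels, and the normal values are all made up), and the sketch only counts how many grouping levels each choice would create.

```r
# Simulated stand-in for the real data (all values are made up).
set.seed(1)
n <- 1000
d <- data.frame(
  measurement = sample(c("A", "B", "C"), n, replace = TRUE),
  value       = rnorm(n)  # continuous measurement value
)

# Levels a grouping term on `value` would create: one per distinct number.
n_value_levels <- length(unique(d$value))

# Levels when grouping on `measurement` only.
n_measurement_levels <- length(unique(d$measurement))

print(c(value = n_value_levels, measurement = n_measurement_levels))
```

With a continuous value, the number of levels is essentially the number of rows, so each "group" would hold a single observation and nothing could be pooled.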

@Guido_Biele or @paul.buerkner should know, but I’d appreciate anyone’s input! :)

From the way you describe the data, it seems to me that having different effects from different measurements is more important than nestedness.

Maybe this is too simple, but how about

y ~ value + (0 + value | measurement) + (1 | id)

The main idea here is simply to have random slopes for measurement.
(In that case I would make sure that the measurements are on the same scale (same SD) and that the expected direction of the effect is the same for all measurements (flip the sign if necessary), because otherwise shrinkage works against you.)
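A rough sketch of that preparation step, assuming simulated data: the column names follow the data table in the question, while value_z, the flip vector, and fit_random_slopes() are hypothetical names introduced only for illustration. The model call assumes brms with a Bernoulli likelihood, which is one plausible choice for a 1/0 outcome rather than something settled in this thread.

```r
# Simulated stand-in data (made up): 20 subjects, 3 measurement types
# that live on very different scales.
set.seed(2)
measurement <- rep(c("A", "B", "C"), times = 40)  # 120 rows
idx <- match(measurement, c("A", "B", "C"))
d <- data.frame(
  id          = rep(1:20, each = 6),
  measurement = measurement,
  value       = rnorm(120, mean = c(120, 70, 37)[idx], sd = c(15, 10, 0.5)[idx]),
  y           = rbinom(120, 1, 0.3)
)

# z-score within each measurement so every slope is a per-SD effect.
d$value_z <- ave(d$value, d$measurement,
                 FUN = function(v) (v - mean(v)) / sd(v))

# Hypothetical sign flips: suppose higher C were expected to be protective.
flip <- c(A = 1, B = 1, C = -1)
d$value_z <- d$value_z * unname(flip[d$measurement])

# Not run here: requires brms and a working Stan toolchain.
fit_random_slopes <- function(data) {
  brms::brm(
    y ~ value_z + (0 + value_z | measurement) + (1 | id),
    data = data, family = brms::bernoulli()
  )
}
```

Calling fit_random_slopes(d) would fit the random-slopes model on the standardized values. The sign flip leaves the within-measurement SD at 1, so the slopes stay comparable.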

This proposal neglects the trial variable. One could of course just add + trial, but I guess it depends on the specifics of the problem whether it is as easy as that (e.g., if these are repeated measurements over time, I’d rather put in days or weeks since the first measurement, or since another reasonable starting date).

If there is good evidence to think that the effect of measurement depends on time, one could do

y ~ value + (0 + value | measurement:time) + (1 | id)

Thanks Guido, and yes, I guess that is the most straightforward approach.

But I can get timestamps for trial, so I guess I must try the second approach (which was actually more along the lines of what I wanted). :)

Much appreciated!

I think it is still useful to check whether time can be chunked into coarser bins, to get a trial variable that does not have all too many levels.
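One way such binning could look in base R, as a sketch only: the timestamps are simulated, and the cut points (one week, one month) and the names days and time_bin are arbitrary assumptions that would need to be chosen for the actual problem.

```r
# Simulated timestamps (made up): 5 subjects, 4 measurements each,
# spread over roughly 90 days.
set.seed(3)
d <- data.frame(
  id        = rep(1:5, each = 4),
  timestamp = as.POSIXct("2020-01-01", tz = "UTC") + runif(20, 0, 90) * 86400
)

# Days since each subject's first measurement.
t_sec  <- as.numeric(d$timestamp)          # seconds since epoch
first  <- ave(t_sec, d$id, FUN = min)      # per-subject first timestamp
d$days <- (t_sec - first) / 86400

# Coarse bins: [0, 7), [7, 30), [30, Inf]. The cut points are arbitrary.
d$time_bin <- cut(d$days,
                  breaks = c(0, 7, 30, Inf),
                  labels = c("week1", "month1", "later"),
                  right = FALSE, include.lowest = TRUE)

print(table(d$time_bin))
```

The resulting factor could then stand in for time in a grouping term such as (0 + value | measurement:time_bin), which keeps the number of measurement-by-time cells small.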

Alternatively, one could add additional structure by regressing the effect of measurement on time (OK, it’s just an interaction ;-)), something like

y ~ value*time + (0 + value*time | measurement) + (1 | id)
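To spell out what the value*time shorthand puts inside the grouping term, the sketch below expands it with base R's terms(). The fit_interaction() wrapper is a hypothetical helper, and it again assumes brms with a Bernoulli likelihood and a numeric time column (e.g., days since the first measurement).

```r
# Expand the slope part of (0 + value*time | measurement) with base R.
slope_terms <- attr(terms(~ 0 + value * time), "term.labels")
print(slope_terms)

# Not run here: requires brms and a working Stan toolchain.
fit_interaction <- function(data) {
  brms::brm(
    y ~ value * time + (0 + value * time | measurement) + (1 | id),
    data = data, family = brms::bernoulli()
  )
}
```

Each measurement therefore gets three varying slopes (value, time, and value:time), together with their correlations. That is more structure per measurement than the plain random-slopes model, but far fewer parameters than one slope per measurement-by-time cell.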
