I’ve been turning this over in my head, and I’m having a hard time figuring out how to specify a model for this type of data.

Given data such as this:

| id | measurement | value | trial | y |
|---|---|---|---|---|
| 1 | A | … | 1 | 0 |
| 1 | A | … | 2 | 0 |
| 1 | A | … | 3 | 0 |
| 1 | A | … | 4 | 0 |
| 1 | B | … | 1 | 0 |
| 1 | B | … | 2 | 0 |
| 1 | C | … | 1 | 0 |
| 1 | C | … | 2 | 0 |
| 1 | C | … | 3 | 0 |
| 2 | A | … | 1 | 1 |
| … | … | … | … | … |

I would like to model the binary outcome `y` (1/0), i.e., whether the subject has the sickness or not.

`id` is a human subject; `measurement` is, e.g., blood pressure, pulse, etc.; the meaning of `value` depends on which `measurement` we use; and `trial` is simply the temporal order in which the measurement (for measurements A, …, Z) was taken. The set of `measurement`s can differ among subjects, as can the number of `trial`s for each `measurement`.
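To make the structure concrete, here is a minimal Python sketch (Python is used only for illustration; the question itself is about brms) that transcribes the example rows and counts trials per subject and measurement. It assumes `y` is a subject-level label, constant within each `id`, which is what the example rows suggest. The `value` column is left as `None` because it is elided in the table.

```python
from collections import Counter, defaultdict

# Rows transcribed from the example table: (id, measurement, value, trial, y).
# `value` is elided ("…") in the question, so it is kept as None, not invented.
rows = [
    (1, "A", None, 1, 0),
    (1, "A", None, 2, 0),
    (1, "A", None, 3, 0),
    (1, "A", None, 4, 0),
    (1, "B", None, 1, 0),
    (1, "B", None, 2, 0),
    (1, "C", None, 1, 0),
    (1, "C", None, 2, 0),
    (1, "C", None, 3, 0),
    (2, "A", None, 1, 1),
]

# Number of trials per (subject, measurement) cell: the panel is ragged.
trials_per_cell = Counter((i, m) for i, m, _v, _t, _y in rows)

# Assumption: y is a per-subject label, so each id should map to one y.
y_by_id = defaultdict(set)
for i, _m, _v, _t, y in rows:
    y_by_id[i].add(y)
inconsistent = [i for i, ys in y_by_id.items() if len(ys) != 1]
subject_outcome = {i: next(iter(ys)) for i, ys in y_by_id.items()}

print(dict(trials_per_cell))  # {(1, 'A'): 4, (1, 'B'): 2, (1, 'C'): 3, (2, 'A'): 1}
print(subject_outcome)        # {1: 0, 2: 1}
```

The counts show a ragged long-format panel: one binary label per subject, but a varying number of rows per subject and measurement.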

First, I thought about `(1 | measurement/trial/value)`, but that isn’t sane, since I think we would then treat each unique real-valued `value` as a categorical level. Next, I thought I’d use `gp()` and treat `value` as a varying intercept that way, but I don’t think that will fly since we’re talking about n > 5e5.
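A small simulated illustration of both concerns, again in Python purely for illustration. The data are hypothetical (1,000 subjects, three measurements, four trials each, with standard-normal draws standing in for `value`), and the memory figure assumes an exact GP that stores a dense n × n covariance matrix of 8-byte doubles.

```python
import random

random.seed(1)

# Hypothetical simulated data: 1,000 subjects x 3 measurements x 4 trials,
# with continuous readings standing in for the elided `value` column.
rows = [
    (subject, m, trial, random.gauss(0.0, 1.0))
    for subject in range(1000)
    for m in ("A", "B", "C")
    for trial in range(1, 5)
]

# Nesting a/b/c expands to a, a:b and a:b:c grouping terms.
# Levels of the measurement:trial term:
levels_m_trial = {(m, t) for _s, m, t, _v in rows}
# Levels of the innermost measurement:trial:value term:
levels_m_trial_value = {(m, t, v) for _s, m, t, v in rows}

# A continuous value essentially never repeats, so the innermost term ends up
# with one level (one varying intercept) per row.
print(len(rows), len(levels_m_trial), len(levels_m_trial_value))

# Assumed exact GP: dense n x n covariance matrix of 8-byte doubles.
n = 500_000
bytes_needed = n * n * 8
print(bytes_needed / 1e12, "TB")  # 2.0 TB
```

Under these assumptions the covariance matrix alone needs about 2 TB, and each Cholesky factorisation of it costs on the order of n³/3 ≈ 4e16 floating-point operations.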

@Guido_Biele or @paul.buerkner should know, but I’d appreciate anyone’s input! :)