Let’s say I have a binary outcome `outcome`, with numerous data rows per group for each of several timepoints. A predictor, `p_prop`, is the proportion of samples (`samps`) that are positive. I have no information at the `samps` level, just the number of samples taken for each value of `p_prop` in each row of data. Differing numbers of samples are taken for each timepoint for each group. I am interested in how `p_prop` and `timepoints` predict `outcome`.

I might run a model in brms like this:
m1 <- brm(outcome ~ 1 + timepoints + p_prop + (1 | group), family = bernoulli(), data = data)
However, because I have vastly differing numbers of samples for each timepoint for each group, I would think that I want some sort of measurement-error model: a `p_prop` value based on 5 samples should carry more error than one based on, say, 400 samples. Since I have the number of samples and the proportion, I could calculate the sd and use it in a measurement-error model like this:
m1.me <- brm(outcome ~ 1 + timepoints + me(p_prop, p_prop_sd) + (1 | group), family = bernoulli(), data = data)
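For reference, the `p_prop_sd` I pass to `me()` comes from the usual binomial formula. A sketch of how I compute it (the column name `n_samps` for the per-row number of samples is my own; substitute whatever the real column is called):

```r
# sd of a sample proportion: sqrt(p * (1 - p) / n).
# n_samps is a hypothetical column holding the number of samples per row.
data$p_prop_sd <- sqrt(data$p_prop * (1 - data$p_prop) / data$n_samps)
```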
But I’m not sure this is what I want (and brms throws an error), because a lot of my `p_prop` values are exactly 0 or 1, and the sd of a sample proportion that is 0 or 1 is 0. It seems that the uncertainty in the `p_prop` predictor should scale with the number of samples taken… right? How do I go about doing this? Maybe I’m not thinking about this the correct way, but whatever the formula for the sd of a proportion says, in my scenario I trust a value from 5 samples much less than one from 400 samples, regardless of the value of the proportion itself.
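One workaround I’ve been toying with (no idea if it’s principled) is to shrink the proportion before computing the sd, e.g. an Agresti–Coull-style adjustment, so that rows with `p_prop` of exactly 0 or 1 still get a nonzero sd, and the sd still shrinks as the number of samples grows. A sketch, again with `n_samps` as my hypothetical column name:

```r
# Agresti-Coull-style adjustment: add 2 pseudo-successes and 2 pseudo-failures,
# so the adjusted proportion is never exactly 0 or 1 and the sd is never 0.
x <- data$p_prop * data$n_samps           # implied number of positive samples
p_adj <- (x + 2) / (data$n_samps + 4)     # shrunken proportion
data$p_prop_sd <- sqrt(p_adj * (1 - p_adj) / (data$n_samps + 4))
```

Is something like this reasonable, or is there a better-supported way to get the predictor’s uncertainty to scale with sample size?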
Thanks!