The main goal of this study is to analyse the factors that influence the survival of oak seedlings.
We planted 900 acorns in 40 plots. The acorns were all collected from oak woodlands. Each planted acorn was judged healthy and viable in lab tests. Acorns were randomly assigned to plots. We visited the plots periodically, recorded which acorns had germinated, and followed the seedlings over time, recording the time of death. We also collected several environmental covariates.
Out of 900 acorns, 400 never germinated. Low germination rates are common, but in this case the acorns that did not germinate seem to be concentrated in 5-7 plots. This suggests that some environmental factors in these 5-7 plots may be reducing the probability of germination.
The initial goal of the study didn’t include assessing germination rates, but I think that if we ignore germination we will be introducing selection bias into our study. That is, I think we need to condition survival on germination, but I’m not sure how to proceed.
I guess the simplest option is to fit two separate models, a germination model and a survival model.
However, I am wondering whether it would make sense to stitch both models together like this:
Germination: G ~ Bernoulli(pG)
logit(pG) = a + b1*x1 + … + bn*xn
Survival: S ~ Exponential(lambda), given G = 1
If G = 0, then S = 0
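To make the stitched model concrete, here is a minimal sketch of its log-likelihood in plain Python. It rests on assumptions that go beyond what is written above: the survival line is read as an exponential survival time with a constant rate lambda for germinated acorns only, and seedlings still alive at the last visit are treated as right-censored. The function name `stitched_loglik` and the arguments `T` (follow-up time after germination) and `died` (1 if death was observed, 0 if censored) are hypothetical names chosen for illustration.

```python
import math


def inv_logit(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def stitched_loglik(a, b, log_lam, X, G, T, died):
    """Log-likelihood of the germination + survival model (illustrative sketch).

    a, b     : intercept and list of slopes for the germination part
    log_lam  : log of the exponential death rate (assumed constant)
    X        : list of covariate rows, one row per acorn
    G        : 1 if the acorn germinated, 0 otherwise
    T        : follow-up time after germination (ignored when G == 0)
    died     : 1 if death was observed, 0 if censored (assumption)
    """
    lam = math.exp(log_lam)
    ll_germ = 0.0
    ll_surv = 0.0
    for x_i, g_i, t_i, d_i in zip(X, G, T, died):
        # Germination part: Bernoulli with a logit link
        eta = a + sum(b_j * x_ij for b_j, x_ij in zip(b, x_i))
        p = inv_logit(eta)
        ll_germ += math.log(p) if g_i == 1 else math.log(1.0 - p)
        # Survival part: only germinated acorns contribute
        if g_i == 1:
            ll_surv += d_i * log_lam - lam * t_i
    return ll_germ, ll_surv, ll_germ + ll_surv


# Toy call with made-up values, just to show the shapes of the inputs
ll_germ, ll_surv, ll_total = stitched_loglik(
    a=0.0, b=[0.0], log_lam=0.0,
    X=[[0.0], [0.0]], G=[1, 0], T=[2.0, 0.0], died=[1, 0],
)
```

In this sketch the total is simply the sum of a germination term and a survival term that share no parameters, so maximising it jointly would give the same estimates as maximising the two parts separately. The two approaches would only differ if something were shared between the parts, for example a plot-level random effect.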
Assuming the above model makes sense, what would we gain by fitting it like this compared to fitting two separate models?