Dear stan/brms users,
I am a new modeler (having only been learning about brms for about a week), so please forgive my ignorance of many aspects of the question I’d like to ask. I work in the field of phylogenetic systematics, and I am trying to come up with a way to model the probability of a set of events through time, and assess the likely time interval in which the events occurred. The data are phylogenetically structured and a mix of contemporaneous and non-contemporaneous (i.e. not all of the tips of the tree line up at t=0).
The response data are coded as binary 0/1 (the occurrence of an event, or the absence of evidence of an event), with one value for each terminal in the phylogenetic tree, which has branch lengths in units of time. It appears that these “independent” (though phylogenetically structured) events are clustered in time, and I would like to assess that hypothesis quantitatively. I will refer to the event of interest as the “outcome event” because this is how I have been treating it, but this may be wrong (see below).
I reviewed the brms vignette “Estimating Phylogenetic Multilevel Models with brms”, which gives me some hope that brms/Stan may provide a suitable way to explore these topics. I have been experimenting and going through some vignettes to become generally familiar with how brms works (though I am still very much a beginner!!). I have been able to get something akin to phylogenetic logistic regression to work (but Bayesian, which is extremely interesting and potentially important in its own right). I think it looks something like this:
brm(
  uncex.merged ~ log.stem.ages_since + (1 | gr(phylo, cov = A)),
  data = logisticreg.newdat.LHTs.bm,
  data2 = list(A = A),
  family = bernoulli(),
  cores = 3,
  control = list(adapt_delta = 0.99)
)
In this simple model, the response variable uncex.merged is coded as 0/1 (as noted above). The predictor log.stem.ages_since is the (log-transformed) time elapsed since a particular geological event occurred. This model allows me to assess the probability of the outcome event as a function of the distance (in time) to a geological event of interest. In my case, the probability of the event increases toward the time of the geological event, which is pretty cool.
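If I am reading the brms syntax correctly (this is my own understanding and notation, not something copied from the vignette), the call above corresponds roughly to the model below. Here $y_i$ is uncex.merged for terminal $i$, $x_i$ is the predictor in the formula, and $A$ is the phylogenetic covariance matrix passed through data2. The symbols $\beta_0$, $\beta_1$, $u_i$ and $\sigma_{\mathrm{phylo}}$ are names I made up for the intercept, slope, phylogenetic group-level effect and its standard deviation. I am assuming the default logit link for bernoulli(), and I have left out the default priors.

$$
\begin{aligned}
y_i &\sim \mathrm{Bernoulli}(p_i) \\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1 x_i + u_i \\
u &\sim \mathrm{MVN}\left(0,\ \sigma_{\mathrm{phylo}}^2 A\right)
\end{aligned}
$$

Written this way, time only enters as a fixed covariate $x_i$, so the model estimates how the event probability changes with time but has no parameter for when the events occurred.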
This is useful and interesting, but not exactly what I’m looking for, because it assumes knowledge of a particular geological event that may be related to the outcome event I’m trying to model, and it may be better to assess that hypothesis naively.
I am not sure if I should be modeling the outcome event as a function of time, or time as some function of the outcome event. Given that I want to assess WHEN the event(s) occurred (and the associated uncertainty, given the input data), perhaps I am thinking about this backwards… I did find some information on performing survival analysis in brms, but I am not sure that is appropriate in this case (though I admit I am not familiar with that kind of analysis at all).
Either way, I would very much appreciate any guidance or insight into what questions I may need to answer in order to better articulate this hypothesis as a brms model. It would be great to be able to assess both 1) when the events of interest occurred, and 2) their probability of occurrence through time. Alternatively, because I am interested in the proximity of the events of interest to a known geological event, perhaps another way to ask this question would be, what is the posterior probability that the event of interest occurred within some proximity of the known geological event – but these seem like two different models. What I have outlined above is of primary interest, but it would also be great to be able to assess these hypotheses with respect to additional covariates I have not mentioned.
Thanks for your time,
Jacob Berv