Hi Forum,
I am working on a manuscript that includes some phylogenetic multilevel models. Some co-authors suggested that I sample over a number of phylogenetic trees from the posterior of a MCMC analysis (BEAST in this case, but other Bayesian programs do the same thing). As the trees are not all the same, this captures the uncertainty of the phylogenetic estimation in the subsequent model.
Ideally I would run brms over multiple trees and then merge the results (using thinning or not) into a brmsfit
object so that other functions such as pp_check()
and marginal_effects()
will work.
I saw the previous forum post HERE on doing the same thing in STAN, but I would ideally like to stick to brms for now. Additionally, after using several glmm programs that work with phylogenies, I find that brms is by far the easiest to use, has the best built-in functions, and the best support. I think that it will really catch on with the phylo-biologist crowd in the next year or two. As the multiple-tree method becomes more and more expected of authors, having a clear way to run this will benefit everyone.
I am a (novice) R user and not a programmer, so I wanted to ask for some advice on this. There are a couple of approaches that I think might work, but I’m sure there are much more elegant options.
-One way would be to loop over all the sampled trees. For 100 trees this would produce 100 brmsfit objects that would need to be merged (and probably thinned).
-Second thought would be to use one of R’s vector functions like sapply()
or something to avoid the for-loop and perhaps gain some speed?
-I suppose another option would be a way to refit the previous model with a different tree using update()
but somehow continuously updating and adding to the previous model
Any suggestions on how to proceed with this would be very helpful!
@Zacco @maxbiostat @dpascall I am tagging you three in here as well as you all have commented on phylogenetic methods in stan/brms before, and I thought you would have some interest in this topic, and perhaps some suggestions.
Thanks!
I am fitting pretty standard brms phylo models, nothing too complicated:
fit <- brm(
State ~ PHENO + (1|phylo) + (1|Species), data = data,
family = cumulative(link = "probit"), cov_ranef = list(phylo = A),
iter = 5000,
save_all_pars = TRUE,
prior = c(
set_prior("normal(0,50)", class = "b"),
set_prior("normal(0,50)", class = "Intercept"),
set_prior("normal(0,1.5)", class = "sd", coef = "Intercept", group = "Species"),
set_prior("normal(0,1.5)", class = "sd", coef = "Intercept", group = "phylo")),
control = list(adapt_delta = 0.99, max_treedepth = 15),
cores = parallel::detectCores(),
file = "20181005_tail_fit.Rds"
)
A way to run this in parallel on a cluster could also save a lot of time (for 100 trees, i estimate a runtime of 25 hours on my machine).
Please also provide the following information in addition to your question:
- Operating System: OSX
- brms Version: 2.4.4