Sorry for being slow, but I don’t quite follow where this connects. I understand how Stan handles normal mixtures: the likelihood of each datapoint is the average of its likelihoods under the two normals, weighted by the mixing proportions. And I understand how you can get assignments out at the end through simulation. I just can’t see what it’s analogous to.
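For concreteness, the mixture handling described above — averaging the component likelihoods on the log scale, then recovering assignment probabilities afterwards — can be sketched as plain arithmetic (a minimal Python sketch, not Stan code; the function names are mine):

```python
import math

def normal_lpdf(y, mu, sigma):
    # log density of Normal(mu, sigma) evaluated at y
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def mixture_lpdf(y, w, mu1, sigma1, mu2, sigma2):
    # marginal log likelihood log(w * N(y|mu1,s1) + (1-w) * N(y|mu2,s2)),
    # computed stably via log-sum-exp
    a = math.log(w) + normal_lpdf(y, mu1, sigma1)
    b = math.log(1 - w) + normal_lpdf(y, mu2, sigma2)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def assignment_prob(y, w, mu1, sigma1, mu2, sigma2):
    # posterior probability that y came from component 1, given the
    # parameters of one posterior draw -- the "assignment" one can
    # simulate after fitting
    a = math.log(w) + normal_lpdf(y, mu1, sigma1)
    return math.exp(a - mixture_lpdf(y, w, mu1, sigma1, mu2, sigma2))
```

The log-sum-exp step is the same computation Stan’s `log_mix` performs, and the assignment probability is what one would evaluate per draw in `generated quantities`.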

# Sampling over many phylogenetic trees in brms

**betanalpha**#22

What do you think about the probabilistic path HMC stuff? It’s way over my head, but is that the kind of development you’re talking about?

No. PP HMC doesn’t really handle the topology.

I think that’s commonly what they’re used for, yeah. “Is this clade supported in the majority of the trees in the sample?”

But that presumes that the trees in the sample are representative. What typically happens is that the sampler gets stuck in the neighborhood of the topology in which it’s initialized, so the clades in the initial topology appear representative even when they’re not.

I’m not sure anyone’s talking about averaging trees…apart from the mathematicians looking at Fréchet means of groups of trees in tree space in the theoretical literature… I think what we biologists are more interested in is carrying forward the uncertainty in our phylogenetic hypothesis into downstream analyses as far as possible, using the logic that any single summarisation of the phylogeny is in all likelihood wrong.

Accurately computing Fréchet means is a necessary condition for quantifying the uncertainty of the phylogenetic distributions. The current theoretical work is very young and often not directly applicable, but it establishes many of the tools one would need to understand how well one can quantify an entire distribution.

**dpascall**#23

No. PP HMC doesn’t really handle the topology.

Because which topology is generated at the boundary when one of the edge lengths goes to 0 is random?

But that presumes that the trees in the sample are representative. What typically happens is that the sampler gets stuck in the neighborhood of the topology in which it’s initialized, so the clades in the initial topology appear representative even when they’re not.

Standard practice in phylogenetics, wherever I’ve done it, is to initialise multiple runs (generally 4) from different random starting trees and only do any analysis when they’re sampling the same distribution. Surely that alleviates the problem somewhat?
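As a toy illustration of that check applied to the discrete part of the problem — hypothetical helper names, and a deliberately crude agreement criterion rather than a formal diagnostic like R-hat — one can compare per-run support for a clade of interest:

```python
def clade_support_by_chain(chains):
    # chains: one list of 0/1 indicators per run, where 1 means the
    # clade of interest appears in that sampled tree
    return [sum(c) / len(c) for c in chains]

def topologies_agree(chains, tol=0.05):
    # crude agreement check: flag the clade estimate as unreliable when
    # per-run support proportions disagree by more than tol -- a sign
    # the runs are stuck in different topological basins
    props = clade_support_by_chain(chains)
    return max(props) - min(props) <= tol
```

If the runs were initialized far enough apart and this kind of check passes for the clades of interest, that is evidence (though not proof) that the runs are exploring the same distribution over topologies.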

**betanalpha**#24

Because which topology is generated at the boundary when one of the edge lengths goes to 0 is random?

Correct. Sampling within a topology is straightforward; it’s sampling between topologies that’s difficult and ends up obstructing everything.

Standard practice in phylogenetics, wherever I’ve done it, is to initialise multiple runs (generally 4) from different random starting trees and only do any analysis when they’re sampling the same distribution. Surely that alleviates the problem somewhat?

Depends on how diffuse those initial random trees are. The effective priors over tree topologies are often much more diffuse than the initializations, which allows the corresponding chains to all get stuck in the same local basin.
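One way to see how diffuse the topology space is: the number of distinct unrooted binary topologies on n taxa grows as (2n − 5)!!, so even modest taxon counts dwarf any handful of random starting trees (a quick sketch; the function name is mine):

```python
def num_unrooted_topologies(n_taxa):
    # number of distinct unrooted binary tree topologies on n taxa,
    # (2n - 5)!! for n >= 3, computed as the product 3 * 5 * ... * (2n - 5)
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count
```

With 10 taxa there are already over two million topologies, and with 50 the count exceeds 10^70 — so four random initializations sample a vanishing fraction of the space, and a prior that spreads mass widely over topologies is far more diffuse than any such initialization scheme.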