Sampling over many phylogenetic trees in brms

Sorry for being slow, but I don’t quite follow where this connects. I understand how Stan handles normal mixtures: the likelihood of each datapoint is the average of the likelihood of the datapoint under both normals weighted by the mixing proportions. And I understand how you can get assignments out at the end through simulation. I just can’t see what it’s analogous to.

No. PP HMC doesn’t really handle the topology.

But that presumes that the trees in the sample are representative. What typically happens is the sampler gets stuck in the neighborhood of the topology in which its initialized so the clades in the initial topology appears representative even when they’re not.

Accurately computing Frechet means is a necessary condition for quantifying the uncertainty of the phylogenetic distributions. The current theoretical work is very young but and often not really applicable but it establishes many of the tools one would need to understand how well one can quantify an entire distribution.

Because which topology is generated at the boundary when one of the edge lengths goes to 0 is random?

Standard practice in phylogenetics whereever I’ve done it is to initialise multiple runs (generally 4) from different random starting trees and only do any analysis when they’re sampling the same distribution, surely that alleviates the problem somewhat?

Correct. Sampling within a topology is straightforward, it’s sampling between the topologies that’s difficult and ends up obstructing everything.

Depends on how diffuse those initial random trees are. The effective priors over tree topologies are often much more diffuse than the initializations, which allows the corresponding chains to all get stuck in the same local basin.

3 Likes

This was an excellent discussion, thanks!

If anyone does want to run a brms model over a lot of trees, despite the potential pitfalls discussed in this thread, here is the script that I used. I’m sure it could be made more elegant, but it works.
I hope this can help someone out.

In case folks are interested, HERE is a new manuscript that suggests that a large number of trees may not be required to incorporate all of the phylogenetic uncertainty into a model.

library(phytools)
library(MCMCglmm)
library(brms)

trees <- read.nexus("your_posterior_sample_of_trees.trees")
tree_samp <- sample(trees, size = 100)
data <- read.csv("your_data.csv", header=TRUE)

#Generate a list of covariance matrices
inverse<-list()
As <- list()
for (i in 1:length(tree_samp) ) {
  inverse[[i]] <- inverseA(tree_samp[[i]], nodes = "TIPS", scale = TRUE)
  x <- solve(inverse[[i]]$Ainv)
  rownames(x) <- rownames(inverse[[i]]$Ainv)
  As[[i]] <- x
}

# fit preliminary model
m.fit <- brm(
  trait1 ~ trait2 + (1|phylo), data = data, 
  cov_ranef = list(phylo = As[[1]]),
  iter = 5000
)

# Loop Model

m.fits <- vector("list", 100) 
for (i in seq_along(m.fits)) {
m.fits[[i]] <- update(m.fit,
                      cov_ranef = list(phylo = As[[i]],
                      cores = 4)
)
}

### Combine using combine_models()
m.fits_comb <- combine_models(m.fits[[i]])
3 Likes