I my original sentence reads in a way that’s much more certain than I intended it to be! But what I was trying to get at, and I would like someone to confirm or deny, is whether:

p(parameters | data, all N phylogenies) = the concatenation of (the draws from) p(parameters | data, phylogeny[i]) for i from 1:N

i.e. is the total posterior distribution the average of posterior distributions for each? I thought that it wasn’t, but it’s entirely likely I’m wrong about that. Either way, that’s the assumption you’re making when you run the model separately over each tree then concatenate them post-hoc.

E: And even if it is, presumably it assumes each individual model is mixing at exactly the same rate, so that a given number of draws of the posterior distribution for phylogeny[i] gives the same amount of independent information as a given number of draws of the posterior distribution for phylogeny[i+1]?

If all models are assumed equally likely(!), then I think we may very well merge all chains of all models into one single object. This is also the approach to multiple imputations done in brm_multiple.

Of course, the accuracy of that joint model depends on the accuracy of individual models. And samples of the joint model may be biased if the information provided in the samples of the individual models differs drastically. I haven’t seen a formal analysis of this though.

But what I was trying to get at, and I would like someone to confirm or deny, is whether:

p(parameters | data, all N phylogenies) = the concatenation of (the draws from) p(parameters | data, phylogeny[i]) for i from 1:N

Do you mean in general? Or what is the scope of this statement?

I guess the most general form of the question I’m trying to ask is this:

Assuming that we have some data with associated uncertainty that we can sample from, so we can get a distribution of likely values for that data. Is it always true that if you take each possible sample of the data and run a model on it separately, then merge the chains of those models, you would get the same final posterior distribution as if you had directly marginalised out the uncertainty by using all the samples in one model? (Let’s assume there isn’t anything weird going on with differential rates of mixing, and we’re treating each sampled value as equally likely.)

E: Thanks for de-confusing me, by the way. I’m not sure where I’d got into my head you couldn’t do that.

@paul.buerkner , do you think there is some way to use cov_ranef = list(phylo[1:100]) in brms_multiple to do this, or is this method restricted to multiple datasets in data =

Honestly, I don’t think that assumption is justified, at all. If one had a set of weights, representing the importance to be given to each tree, would it be possible for combine_models to accomodate that?

Presumably if you’re using a posterior distribution of trees from something like BEAST, each tree is “equally likely” in the important sense, given that they’re already sampled in proportion to their underlying probability? (Putting aside the issue that there are so many topologies that the majority will not have been sampled during the original analysis at all.)

I suppose so. Problem is that if you have a budget of 100 trees, it’ll be very hard for this sample to reflect the true posterior. You could still argue that your averaging procedure is sound because it reflects the phylogenetic uncertainty represented in that sample, I guess.

That’s definitely true, but I think it’s unavoidable. I think the phylogenetic uncertainty of the sample is the only reasonable solution unless you’re going to set up some crazy joint MCMC for the tree estimation and the downstream analysis (as you can do in revBayes, as I understand it). But if you’ve got a distribution of trees and you want to stick that into STAN/brms, it’s not clear to me where you’d get the weights for any kind of weighted analysis. Even if you’re subsampling a larger posterior distribution of trees for computational reasons, I’m not sure how I’d generate a sensible weighting those trees based on the whole posterior. Though, in fairness, that might just be because I’ve not thought about it much, maybe there’s an easy solution to weighting a subsample of the trees.

In order to get a consistent posterior you have to weight each tree topologies by marginal likelihoods conditioned on each topology.

Now for the bad news. To be polite these marginal likelihoods are extremely challenging to compute using information from only MCMC, such as samples generated conditional on each individual tree topology, and to be more accurate I will claim that we do not currently have the technologies to compute them accurately at all. On top of that we also don’t have the technologies to get a reasonably representative sample of the tree topologies themselves (no, neither revBayes nor BEAST accurately sample tree topology posteriors). These problems are extremely difficult and we’re just not there yet with regard to the necessary tools.

There’s good news to go with the bad news. Most of the biologists I’ve dealt with are more interested in more specific questions like “how many trees does this split occur in?” rather than the entire tree and interpretation tends to focus on specific splits that occur in “most” trees so these outputs tend to get read as “better than ML/max. parsimony”.

IDK if there are models that do approximate summations to avoid dealing with the combinatorial explosion in tree-space (that typically happens far away from the site where inference is wanted) but if not it’s a good idea and somebody should do it.

…

Trying to “average” these trees though… that’s terrifying in terms of interpretability.

Kind of. See this paper for an analysis based on the principle that clades (“splits”, difference is subtle but irrelevant here) are approximately independent given some regularity conditions. Others have shown this assumption to be dubious in certain settings.

While tracking subtrees is certainly a very popular way of mitigating superexponential growth of the parameter space, it is not without its own problems. It can be shown that any two subtrees are either positively or negatively correlated (i.e. can’t be uncorrelated), for instance. It is not clear to me how the partition structure of clades/subtrees/splits behaves in the posterior (there are results under some popular priors).

The main point of the comparative method, however, is to use the phylogenetic tree as the scaffold of the dependence structure in the data (“phylogenetic variance-covariance” matrix). I don’t know enough to see how one could do the same stuff considering only clades/subtrees/splits.

What do you think about the probabilistic path HMC stuff? It’s way over my head, but is that the kind of development you’re talking about?

I think, that’s commonly what’s their used for, yeah. “Is this clade supported in the majority of the trees in the sample?” If yes, interpret, if no, shrug and say that that part of the tree could not be well resolved. In molecular epidemiology, we’re very interested in using sequence data to track outbreaks for instance, and a poorly resolved part of the tree just tells us nothing about epidemiology at that stage. Comparative methods are a bit different though, in that the entire tree matters, because the implied variance-covariance matrix between the species depends on the full tree (i.e. the topology and the branch lengths), so you can’t just ignore the less well supported bits.

I’m not sure anyone’s talking about averaging trees, though. Well apart from the mathematicians looking at Frechet means of groups of trees in tree space in the theoretical literature, but I’ve never found a biologist who talks about averaging trees. Our tree summarisations are more for visualisation than interpretation, and, as you say, we’re generally looking at clade splits across the posterior. I think what we biologists are more interested is carrying forward the uncertainty in their phylogenetic hypothesis into downsteam analyses as far as is possible, using the logic that any single summarisation of the phylogeny is in all likelihood wrong.

I wasn’t suggesting ignoring. For a parallel example: a Stan model of a mixture of normals doesn’t ignore the assignment of data points to components but it does sum them out (no discrete parameters). The assignments can be simulated from the posterior after fitting without loss of information.

Yeah, that’s a little different because they’re changing the model. I was thinking of approximating the math rather than the model. I’ll see if I have some time to be more specific… :)

Sorry for being slow, but I don’t quite follow where this connects. I understand how Stan handles normal mixtures: the likelihood of each datapoint is the average of the likelihood of the datapoint under both normals weighted by the mixing proportions. And I understand how you can get assignments out at the end through simulation. I just can’t see what it’s analogous to.

But that presumes that the trees in the sample are representative. What typically happens is the sampler gets stuck in the neighborhood of the topology in which its initialized so the clades in the initial topology appears representative even when they’re not.

Accurately computing Frechet means is a necessary condition for quantifying the uncertainty of the phylogenetic distributions. The current theoretical work is very young but and often not really applicable but it establishes many of the tools one would need to understand how well one can quantify an entire distribution.

Because which topology is generated at the boundary when one of the edge lengths goes to 0 is random?

Standard practice in phylogenetics whereever I’ve done it is to initialise multiple runs (generally 4) from different random starting trees and only do any analysis when they’re sampling the same distribution, surely that alleviates the problem somewhat?

Correct. Sampling within a topology is straightforward, it’s sampling between the topologies that’s difficult and ends up obstructing everything.

Depends on how diffuse those initial random trees are. The effective priors over tree topologies are often much more diffuse than the initializations, which allows the corresponding chains to all get stuck in the same local basin.

If anyone does want to run a brms model over a lot of trees, despite the potential pitfalls discussed in this thread, here is the script that I used. I’m sure it could be made more elegant, but it works.
I hope this can help someone out.

In case folks are interested, HERE is a new manuscript that suggests that a large number of trees may not be required to incorporate all of the phylogenetic uncertainty into a model.

library(phytools)
library(MCMCglmm)
library(brms)
trees <- read.nexus("your_posterior_sample_of_trees.trees")
tree_samp <- sample(trees, size = 100)
data <- read.csv("your_data.csv", header=TRUE)
#Generate a list of covariance matrices
inverse<-list()
As <- list()
for (i in 1:length(tree_samp) ) {
inverse[[i]] <- inverseA(tree_samp[[i]], nodes = "TIPS", scale = TRUE)
x <- solve(inverse[[i]]$Ainv)
rownames(x) <- rownames(inverse[[i]]$Ainv)
As[[i]] <- x
}
# fit preliminary model
m.fit <- brm(
trait1 ~ trait2 + (1|phylo), data = data,
cov_ranef = list(phylo = As[[1]]),
iter = 5000
)
# Loop Model
m.fits <- vector("list", 100)
for (i in seq_along(m.fits)) {
m.fits[[i]] <- update(m.fit,
cov_ranef = list(phylo = As[[i]],
cores = 4)
)
}
### Combine using combine_models()
m.fits_comb <- combine_models(m.fits[[i]])