Distribution of the rooted tree

ellie.nou13 · July 3, 2017, 5:04pm

Hi guys,

Is there any way to model the distribution of the rooted tree (graph topology view)? I have a known matrix N that is a multiplication of two unknown matrix U and B (N=U*B). I know each column of U is sum to one (simplex vector) and the B matrix can have only two integer value zero and one and is equivalent to the rooted tree. Any suggestion? Thank you in advance.

Bob_Carpenter · July 4, 2017, 1:08am

Stan doesn’t support integer parameters and the marginalization over the values in B would be intractable. If you do find something that says it fits this kind of thing, I’d be careful to examine the multimodality and the ability to recover parameters from fake data.

@betanalpha has been thinking a bit more about tree topologies, but they’re going to require specialized samplers.

betanalpha · July 4, 2017, 3:12am

And new math. We’re nowhere close to understanding how we might approach anything like this, let alone implementing it efficiently.

aaronjg · July 4, 2017, 7:05am

I’m not sure what your application is, but that sounds similar to the phylogeny inference problem. If that is what you are doing, I’m sure you are familiar with the relevant literature. However, if you are coming at it from another domain, you might want to check out something like BayesTraits.

betanalpha · July 4, 2017, 12:57pm

All existing Bayesian phylogenetic algorithms are based on either heuristic discrete MCMC moves or heuristic random jump MCMC moves (ultimately equivalent) that yield only local exploration of the possible tree topologies. From what I and some of my colleagues can identify the resulting exploration is poor and the resulting chains end up exploring just the neighborhood around the initial point, akin to how Random Walk Metropolis behaves in high dimensions. In other words, you get something slightly better than a point estimate but nothing close to an accurate quantification of uncertainty in the phylogenetic space.

As I implied above this is an active research topic that multiple groups around the world on pursuing, but we’re missing some serious conceptual gaps in the mathematics that would allow us to build something generic enough to be applicable to applied problems.

For example, why does dinosaur ancestry keep changing every few years? Because the inferred phylogenies are based on fragile point estimates and the inclusion of any new data makes drastic changes to the inferred topology. Were we able to accurately quantify the entire posterior over phylogenies then the inferences would be much more stable across time as new measurements are included. Even if you don’t care about the science, at least care about destroying the favorite dinosaurs of young children. Won’t somebody think of the children?

ellie.nou13 · July 6, 2017, 2:55am

Hi,

I was wondering if you know any application of Stan for phylogenies?
Do you have any suggestion about how I can sample from a tree?

Thank you.

betanalpha · July 6, 2017, 3:06am

Unfortunately, no. Conditioned on the tree topology you just have a bunch of edge weights which can be fit with Stan reasonably well, but the actually tree topologies themselves are difficult to sample from.

Because trees are relatively simple graphs, however, one might be able to get away by assuming a fully connected tree topology and then try to infer all of the weights. Or the weights may have additional structure to match the nested structure of the tree – for example, the weight of a node is the product of the weights of its parents.

But at no point would you be able to infer that a given edge is “zero”. Instead of asking whether an edge exists you would instead have limit yourself to questions of how strong an edge is.

Topic		Replies	Views
Sampling over an uncertain variance covariance matrix Modeling	10	2595	August 2, 2018
Blog posts on ecological modeling in Stan Modeling	6	246	February 11, 2025
Integer parameters Modeling	4	2748	July 3, 2017
Implementing phylogenetic logistic regression for binary response variable in Stan Modeling techniques , specification	5	1249	March 27, 2019
Phylogenetic regression in Stan: can this model be improved? Modeling	5	113	October 3, 2024

Distribution of the rooted tree

Related topics