I recently noticed that the vignette for phylogenetic models now advises to use the covariance matrix, whereas an older version of the vignette used the correlation matrix to fit the phylogenetic models. In the specific example used, it doesn’t really matter if we would use the covariance matrix or the correlation matrix, because they are almost identical in this specific tree, as demonstrated if you run the code below. I think this is related to the branch lengths.
# Load example tree phylo <- read.nexus("https://raw.githubusercontent.com/MPCMEvolution/MPCMArchive/master/data_files/ch11/phylo.nex") a1 <- ape::vcv(phylo, corr = TRUE) # vcv matrix a2 <- ape::vcv(phylo, corr = FALSE) # correlation matrix ggplot() + geom_abline(slope = 1, color = "blue") + geom_point(aes(a1, a2))
From my experience with other trees (where the branch lengths are higher and the covariance matrix is quite different from the correlation matrix), I tend to get a very low phylogenetic signal when I am using the covariance matrix, whereas I get substantially higher phylogenetic signals when using the correlation matrix.
My question is: it is justifiable to use the correlation matrix or to rescale the tree to maximum branch lengths of one? This seems like a crucial point that may impact conclusions of anyone fitting phylogenetic models, but I don’t quite understand why the differences are so big.
Thanks for any thoughts on this.
Below, a plot showing an example of the differences in phylogenetic signal when fitting a model with the covariance matrix of the unscaled tree (red), the covariance matrix of a scaled tree (blue), and the correlation matrix (yellow).
Below the code to reproduce this example.
##### Fish phylogenetics example ##### # Load packages library(tidyverse) library(fishtree) library(rfishbase) library(janitor) library(brms) # Get species list in the family Labridae sp <- species_list(Family = "Labridae") # Get data on maximum size of fishes length <- species(sp, "Length") %>% mutate(species = sp, phylo = gsub(" ","_", sp)) # Get phylogenetic tree tree <- fishtree_phylogeny(sp) # filter and clean data for missing species in tree lengths <- length %>% dplyr::filter(species %in% gsub("_", " ", tree$tip.label)) %>% janitor::clean_names() %>% drop_na() # get covariance matrix tree_vcv <- ape::vcv(tree, corr = FALSE) # get scaled covariance matrix scaled_tree <- tree scaled_tree$edge.length <- scaled_tree$edge.length / (max(tree$edge.length)) tree_vcv_scaled <- ape::vcv(scaled_tree, corr = FALSE) # get correlation matrix tree_cor <- ape::vcv(tree, corr = TRUE) # fit models fit1 <- brms::brm(length ~ (1 | gr(phylo, cov = tree_vcv)) , data = lengths, family = gaussian(), data2 = list(tree_vcv = tree_vcv), chains = 4, cores = 4, iter = 4000, warmup = 2000, control = list(adapt_delta = 0.99, max_treedepth = 12)) fit2 <- update(fit1, data2 = list(tree_vcv = tree_vcv_scaled), newdata = lengths) fit3 <- update(fit1, data2 = list(tree_vcv = tree_cor), newdata = lengths) # phylogenetic signals hyp <- "sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sigma^2) = 0" h1 <- hypothesis(fit1, hyp, class = NULL) h2 <- hypothesis(fit2, hyp, class = NULL) h3 <- hypothesis(fit3, hyp, class = NULL) #plot ggplot() + geom_histogram(aes(x = h1$samples$H1, fill = "non_scaled"), fill = "red", alpha = 0.5) + geom_histogram(aes(x = h2$samples$H1, fill = "scaled"), alpha = 0.5) + geom_histogram(aes(x = h3$samples$H1, fill = "cor"), alpha = 0.5) + theme_bw() + scale_fill_manual(name = "", values = c( non_scaled = "red", scaled = "blue", cor = "yellow")) + labs(x = "Phylogenetic signal")
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS