Hi!
I recently noticed that the vignette for phylogenetic models now advises to use the covariance matrix, whereas an older version of the vignette used the correlation matrix to fit the phylogenetic models. In the specific example used, it doesn’t really matter if we would use the covariance matrix or the correlation matrix, because they are almost identical in this specific tree, as demonstrated if you run the code below. I think this is related to the branch lengths.
# Load example tree
phylo <- read.nexus("https://raw.githubusercontent.com/MPCMEvolution/MPCMArchive/master/data_files/ch11/phylo.nex")
a1 <- ape::vcv(phylo, corr = TRUE) # vcv matrix
a2 <- ape::vcv(phylo, corr = FALSE) # correlation matrix
ggplot() +
geom_abline(slope = 1, color = "blue") +
geom_point(aes(a1, a2))
From my experience with other trees (where the branch lengths are higher and the covariance matrix is quite different from the correlation matrix), I tend to get a very low phylogenetic signal when I am using the covariance matrix, whereas I get substantially higher phylogenetic signals when using the correlation matrix.
My question is: it is justifiable to use the correlation matrix or to rescale the tree to maximum branch lengths of one? This seems like a crucial point that may impact conclusions of anyone fitting phylogenetic models, but I don’t quite understand why the differences are so big.
Thanks for any thoughts on this.
Below, a plot showing an example of the differences in phylogenetic signal when fitting a model with the covariance matrix of the unscaled tree (red), the covariance matrix of a scaled tree (blue), and the correlation matrix (yellow).
Below the code to reproduce this example.
##### Fish phylogenetics example #####
# Load packages
library(tidyverse)
library(fishtree)
library(rfishbase)
library(janitor)
library(brms)
# Get species list in the family Labridae
sp <- species_list(Family = "Labridae")
# Get data on maximum size of fishes
length <- species(sp, "Length") %>%
mutate(species = sp, phylo = gsub(" ","_", sp))
# Get phylogenetic tree
tree <- fishtree_phylogeny(sp)
# filter and clean data for missing species in tree
lengths <- length %>%
dplyr::filter(species %in% gsub("_", " ", tree$tip.label)) %>%
janitor::clean_names() %>%
drop_na()
# get covariance matrix
tree_vcv <- ape::vcv(tree, corr = FALSE)
# get scaled covariance matrix
scaled_tree <- tree
scaled_tree$edge.length <- scaled_tree$edge.length / (max(tree$edge.length))
tree_vcv_scaled <- ape::vcv(scaled_tree, corr = FALSE)
# get correlation matrix
tree_cor <- ape::vcv(tree, corr = TRUE)
# fit models
fit1 <- brms::brm(length ~ (1 | gr(phylo, cov = tree_vcv)) ,
data = lengths, family = gaussian(),
data2 = list(tree_vcv = tree_vcv),
chains = 4, cores = 4, iter = 4000,
warmup = 2000, control = list(adapt_delta = 0.99,
max_treedepth = 12))
fit2 <- update(fit1, data2 = list(tree_vcv = tree_vcv_scaled),
newdata = lengths)
fit3 <- update(fit1, data2 = list(tree_vcv = tree_cor),
newdata = lengths)
# phylogenetic signals
hyp <- "sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sigma^2) = 0"
h1 <- hypothesis(fit1, hyp, class = NULL)
h2 <- hypothesis(fit2, hyp, class = NULL)
h3 <- hypothesis(fit3, hyp, class = NULL)
#plot
ggplot() +
geom_histogram(aes(x = h1$samples$H1, fill = "non_scaled"),
fill = "red", alpha = 0.5) +
geom_histogram(aes(x = h2$samples$H1, fill = "scaled"), alpha = 0.5) +
geom_histogram(aes(x = h3$samples$H1, fill = "cor"), alpha = 0.5) +
theme_bw() +
scale_fill_manual(name = "", values = c(
non_scaled = "red",
scaled = "blue",
cor = "yellow")) +
labs(x = "Phylogenetic signal")
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
brms_2.13.5