Categorical response phylogenetic model brms

I am experimenting with a phylogenetic regression using brms with a categorical response. For now I have no covariates, just the phylogeny.

A <- ape::vcv.phylo(phylo)


options(mc.cores = parallel::detectCores()) 

model <- brm(System ~ 1 + (1|gr(phylo, cov = A)), data = data,   data2 = list(A = A),
  family = categorical(),iter=6000,
  control=list(adapt_delta=0.99999, max_treedepth = 11))

I have two questions. First is it even appropriate to fit a model like this, or are there better practices that are recommended? Second, it is my understanding that the inter class correlation will give me something analogous to phylogenetic signal. If that’s true, would an appropriate way of estimating that given a categorical model family be something like this?

PPD_ranef <- posterior_predict(model, summary = F)
PPD_noraneff<- posterior_predict(model, re_formula = NA, summary = F)
var_ranef  <- apply(PPD_ranef , 1, var, na.rm=T)
var_noraneff <- apply(PPD_noraneff, 1, var, na.rm=T)
ICC<- var_noraneff / var_ranef  
quantile(ICC, probs = c(.025, .5, .975))
mean(ICC,na.rm=TRUE)

Any feedback is much appreciated!

2 Likes

This is interesting! For your first question, I think that your model makes sense. In a recent project I used some very similar models as “null” models, then ran loo to compare these with other models that had predictor variables. The idea was to see if phylogeny alone better explained the data than the predictors + phylogeny. I was using a gaussian family rather than categorical, but the idea should be the same.

I have been wondering about your second question for a while. Normally phylogenetic signal is sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sigma^2) but the categorical family doesn’t have a variance estimate. I think the best thing to do would be to simulate data with a known lambda value and then test this method. Phytools has tools to simulate categorical data I believe (fastBM and sim.history). But I’m not sure the phylogenetic signal / ICC is the difference between the variance of the phylo model and the non-phylo model? Alternatively, in this case maybe the sd_phylo__Intercept from your model can be directly interpreted as the phylogenetic signal.

2 Likes

@jonnations is right that variance for categorical model is meaningless (and thanks for looking into this question, it is appreciated!). To expand a bit: the responses are coded as integers, so R will happily compute the variance, but the number does not make sense, because by reordering the levels (which does not matter in the real world), you can get quite different variances:

var(c(1,1,1,1,1,2,2,3,3,3,3,3,3,3,3))
#[1] 0.8857143
var(c(1,1,1,1,1,3,3,2,2,2,2,2,2,2,2))
#[1] 0.4571429
> 

Some stuff that could make sense (but really, you need to think if it is appropriate for the scientific question you are asking):

Definitely - for some types of questions, interpreting the sd could be directly relevant (although interpretation of parameters of multinomial models is a bit tricky).

Additionally:

  • You can interpret categorical regression as a set of pairwise binary comparisons. I.e. you can compute the odds between each pair of categories. You can then compute ICC for each logistic regression either as you do, by computing the variance directly, or via the analytic formula outlined for example here: Calculating Intra-Cluster Correlation (ICC) for the Group-Level Effect in a Multinomial GLMM (the topic discuss problem similar to yours, but there actually isn’t a solution, but it starts with the binary case)
  • While variance is meaningless for a multinomial distribution, there are other quantities that are meaningful, most notably entropy. So you could compute how much entropy is accounted for by the phylogenetic signal. Once again, whether this number is relevant depends heavily on the question you are trying to answer, so it might be useful to share the scientific background of the model.

Also, comparing an intercept-only model to a model with phylogeny via loo might (and might not) be sensible for your question (this is a bit of a tough topic, where reasonable people don’t necessarily agree what is best, see Hypothesis testing, model selection, model comparison - some thoughts for my current best take).

Best of luck with your model!

1 Like