Implementing phylogenetic logistic regression for binary response variable in Stan


I’m a rather new Stan user and I’m working on building a Bernoulli logistic regression model that corrects for phylogenetic correlation. For reference, I am basing this on the phylogenetic logistic regression models of Garland and Ives (2010) and the R function phyloglm() in the package phylolm.

My basic thought is to modify the Bernoulli example model code from github (reposted below) to included additional independent variables and a variance/covariance matrix to account fo phylogeny. My sense is that this would have to be a hierarchical model in which theta itself is modeled with the independent variables and the phylogenetic correlation matrix, but I am not sure how to begin implementing this.

data { 
  int<lower=0> N; 
  int<lower=0,upper=1> y[N];
parameters {
  real<lower=0,upper=1> theta;
model {
  theta ~ beta(1,1);
  for (n in 1:N) 
    y[n] ~ bernoulli(theta);

Further, Garland and Ives (2014) developed a Bayesian version of their phylogenetic GLM for MCMCglm, with the following formula:

The random variable u represents residual variation, and s is a random effect that captures hypothesized covariances in the data. The link to the book chapter that presents this equation in more detail can be found here.

I am wondering:

  1. Has anyone done this kind of modeling before in Stan and would be willing to share some example code?
  1. Does anyone have suggestions about how to implement the above mentioned formula from Garland and Ives in Stan?




There is a phylogenetic regression example here:

and I swear I’ve seen a few others. Let me dig around in my stan models folder.

1 Like


Thanks for sending this along-- definitely a helpful example. What I am a bit stuck on how to modify such a model for a dataset with a binary response variable. If you, or anyone out there, has any ideas or examples in that realm I would greatly appreciate it.



Your y (response) would be something like this:
int<lower=0,upper=1> y[N]

in the data section (totally the easy answer here ;) ).

I’d need to think about the model section. We need to roll in, maybe the bernoulli_logit ? I bet there is some elegant way to do this.



So this discusses

the binomial models in phylo regressions including an example using MCMCglmm. It looks likes they recommend probit instead of logit.

1 Like


Oppps you already have that same source! I’ll have some time tomorrow so I can try and fiddle with the binomial model