Hi all,
I have a data set with a response variable H measured for each individual (“Ind”; i.e., a plant), as well as an ultrametric phylogeny with Inds as tips. Each Ind, however, belongs to one of about 40 populations ("Pop"s; which are monophyletic), and many of our predictors of interest are measured at the Pop level (environmental variables).
We also have several predictors measured at the Ind level (plant height, etc.), and to make matters even more complicated, these are missing for a variable number of Inds per Pop (missing at random). Note that the response variable H
has no missing data.
Using brms (or other suitable software), I’d like to model H as a function of both Ind- and Pop-level variables while accounting for phylogenetic covariance and without pseudoreplicating the environmental variables.
One option I’ve tried is to take the average for each Pop but incorporate error using se() and me() for the response and predictor variables, respectively (the latter only for the predictors measured at the Ind level), using a phylogeny collapsed to the Pop level. However, this seems like it might be a waste of a good phylogeny (as there might be phylogenetic structure within and not just between Pops), but I also get the feeling that measurement “error” could be more elegantly incorporated into the model, especially given the variable sampling of Ind-level variables within each Pop.
I think this is similar to the example given in the brms phylogenetic models vignette under “A Phylogenetic Model with Repeated Measurements”, except that we have a phylogeny at the Ind level rather than the species level. A bit like having subspecies…
Here’s a snippet of the data.
example_data_mc_stan.csv (11.8 KB)
Does this formulation seem reasonable?
Ci<-vcv.phylo(Ind_tree,corr=TRUE)
Cp<-vcv.phylo(Pop_tree,corr=TRUE)
bf(H ~ mi(IndVar) + PopVar + (1 | gr(Ind,cov=Ci) ) + (1 | gr(Pop,cov=Cp) ) ) +
bf(IndVar | mi() ~ 1) +
set_rescor(F)
Thoughts, opinions, questions all welcome!
Cheers,
S