I’m analyzing data from an experiment where researchers planted seeds at three separate sites. At each site, they selected plots covered by four habitats. In each plot, they recorded three environmental variables: soil pH (pH), soil moisture (SM) and soil temperature (ST). In each habitat they planted 6 seeds.
The study aims to analyse the effect of habitat, pH, soil moisture and soil temperature on the probability of germination.
I drew the following DAG, which assumes that “habitat” has both direct and indirect effects on germination (G). I added “Site” as a varying intercept to account for the heterogeneity in germination success that is not explained by pH, SM, ST or habitat.
library(dagitty)
g <- dagitty("dag{
Habitat -> pH -> G
Habitat -> ST -> G
Habitat -> SM -> G
Habitat -> G
Site -> G
}")
impliedConditionalIndependencies(g)
# Hbtt _||_ Site
# SM _||_ ST | Hbtt
# SM _||_ Site
# SM _||_ pH | Hbtt
# ST _||_ Site
# ST _||_ pH | Hbtt
# Site _||_ pH
plot(g)
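Since the questions below hinge on the difference between the total and the direct effect of habitat, it may help to ask dagitty which variables each one requires conditioning on. The following is only a minimal sketch under the DAG as drawn: Habitat has no parents there, so it has no backdoor paths into G, while its direct effect requires blocking the three mediated paths through pH, SM and ST. The names `total_sets` and `direct_sets` are illustrative.

```r
library(dagitty)

# Same graph as above, repeated so the block runs on its own
g <- dagitty("dag{
Habitat -> pH -> G
Habitat -> ST -> G
Habitat -> SM -> G
Habitat -> G
Site -> G
}")

# Adjustment sets for the total effect of Habitat on G
# (direct path plus the paths through pH, SM and ST)
total_sets <- adjustmentSets(g, exposure = "Habitat", outcome = "G",
                             effect = "total")
print(total_sets)

# Adjustment sets for the direct effect of Habitat on G
# (the part not transmitted through the measured soil variables)
direct_sets <- adjustmentSets(g, exposure = "Habitat", outcome = "G",
                              effect = "direct")
print(direct_sets)
```

Any conclusion drawn from these sets is conditional on the DAG being correct, in particular on there being no unmeasured variable that affects both the soil variables and germination.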
I wrote the following model:
Number of seedlings that germinated ~ beta_binomial(6, mu * K, (1-mu) * K);
logit(mu) = α + α_site[site] + α_habitat[habitat] +
β1[habitat]*pH + β2[habitat]*Soil moisture + β3[habitat]*Soil temperature
α, α_site ~ Normal(0, 0.1)
α_habitat ~ Normal(0, σ_habitat)
σ_habitat ~ Exponential(3)
β1 ~ Normal(μ_pH, σ_pH)
β2 ~ Normal(μ_SM, σ_SM)
β3 ~ Normal(μ_ST, σ_ST)
μ_pH, μ_SM, μ_ST ~ Normal(0, 0.1)
σ_pH,σ_SM, σ_ST ~ Exponential(3)
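To see what these priors imply on the outcome scale, a prior predictive simulation in base R can be sketched as follows. This is a rough illustration, not the fitted model: it assumes the three predictors are standardized, it evaluates a single hypothetical plot at +1 SD on each predictor, and it fixes K at an arbitrary value because no prior for K is given above. All object names are illustrative.

```r
set.seed(1)

n_draws <- 1000  # number of prior draws (arbitrary)
n_seeds <- 6     # seeds planted per habitat, from the design
K       <- 10    # ASSUMPTION: no prior for K is stated, so it is fixed here

inv_logit <- function(x) 1 / (1 + exp(-x))

# Intercepts
alpha      <- rnorm(n_draws, 0, 0.1)
alpha_site <- rnorm(n_draws, 0, 0.1)
sigma_hab  <- rexp(n_draws, 3)
alpha_hab  <- rnorm(n_draws, 0, sigma_hab)

# Habitat-level slopes; columns are pH, SM, ST
mu_b    <- matrix(rnorm(n_draws * 3, 0, 0.1), ncol = 3)
sigma_b <- matrix(rexp(n_draws * 3, 3), ncol = 3)
beta    <- matrix(rnorm(n_draws * 3, mu_b, sigma_b), ncol = 3)

# ASSUMPTION: standardized predictors, one hypothetical plot at +1 SD each
x <- c(pH = 1, SM = 1, ST = 1)

# Linear predictor and germination probability for each prior draw
eta <- alpha + alpha_site + alpha_hab + as.vector(beta %*% x)
mu  <- inv_logit(eta)

# Beta-binomial drawn as a beta mixture of binomials
p <- rbeta(n_draws, mu * K, (1 - mu) * K)
y <- rbinom(n_draws, size = n_seeds, prob = p)

# Prior predictive distribution of the number of germinated seeds
print(table(factor(y, levels = 0:n_seeds)) / n_draws)
```

If the simulated counts look implausibly concentrated or implausibly spread out for this system, that would suggest revisiting the prior scales before fitting.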
Questions:
- Is this model consistent with the DAG I drew?
- Can I interpret the “α_habitat” estimates as the effect of “habitat” after already accounting for the effects of pH, SM and ST?
- One of my colleagues asked if we could somehow estimate the pure effect of the “habitat” variable on germination. I don’t think we can decouple the effects of the environmental variables from “habitat” with this experimental design, but I would like to hear your opinion.