Hello everyone,
This is my first post here. I am a graduate-level mycologist who specializes in fungal mating and nuclear behaviour, and a beginner with brms and Bayesian modeling. These topics are far from my area of expertise, and there are limited resources at my institution to meaningfully assist with my work. I've been getting my teeth into these concepts for about three months.
I wanted to start a discussion about my work here. Below, I'll describe my data, model, and reasoning. I will try to be as brief as possible, but that will be difficult. All insights, critiques, and suggestions are very welcome.
To begin, I’ll paste my base model here:
library(brms)

fit <- brm(
  formula = Nuclear_type ~ Maturity + (1 | Replicate),
  family = categorical(),
  data = Fully_Expanded_Spore_Data,
  chains = 4,
  iter = 4000,
  cores = 4
)
This data set consists of 11,090 spores that I examined for how many nuclei they contain (one, two, three, or four nuclei). I refer to this as the spore's "Nuclear type". I found that, over the course of its life, an individual produces these types in different proportions. This is a new phenomenon that I'm describing, so meaningful priors are hard to find, and I've stuck with the default priors. I separated the maturity of individuals into three categories: "Young", "Middle" (coded as Middle_age in the data), and "Old". In total, I examined 15 individuals (Replicate); only 7 of them survived through all three maturity categories, while the remaining 8 were observed in only one or two. Because the replicates are so unevenly represented across maturity categories, I treat them as a random effect.
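Written out, the model above is a multinomial-logit (softmax) regression. This sketch assumes the first level of Nuclear_type (taken here to be the single-nucleus type, index 1) is the reference category, which is what brms's `categorical()` uses by default, and that Young is the baseline level of Maturity. Note that R orders character levels alphabetically, which would make Middle_age the baseline unless the factor levels are set explicitly. For spore $i$ from individual $j$ and nuclear type $k = 2, 3, 4$:

$$
\eta_{ijk} = \beta_{0k} + \beta_{k,\text{Middle}}\,\mathbb{1}[\text{Middle}_i] + \beta_{k,\text{Old}}\,\mathbb{1}[\text{Old}_i] + u_{jk}, \qquad u_{jk} \sim \mathcal{N}(0, \sigma_k^2)
$$

$$
\Pr(\text{Nuclear\_type}_i = k) = \frac{\exp(\eta_{ijk})}{\sum_{m=1}^{4} \exp(\eta_{ijm})}, \qquad \eta_{ij1} = 0
$$

Each non-reference type gets its own intercept, maturity coefficients, and between-individual SD $\sigma_k$, so every log-odds coefficient in the summary is relative to both the reference nuclear type and the baseline maturity level.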
I chose this model because of how a multilevel model fit in brms handles partial pooling: it draws information from the data as a whole rather than isolating individuals, who are confounded with the maturity categories they happened to survive into. It seems to handle the unbalanced data gracefully. Is my reasoning sound?
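The intuition behind partial pooling is easiest to see in the simpler Gaussian case, where each individual's estimate becomes a precision-weighted average of its own data and the population mean. This is a hedged sketch of that textbook normal-normal formula, not the computation brms performs for a categorical model (there the pooling happens jointly inside the Stan fit); the function `partial_pool` and all numbers are hypothetical.

```python
def partial_pool(estimate, std_error, pop_mean, tau):
    """Normal-normal partial pooling of one individual's log-odds estimate.

    estimate  -- the individual's own (no-pooling) log-odds estimate
    std_error -- its standard error; large when few spores were counted
    pop_mean  -- the population-level log-odds
    tau       -- between-individual SD of the true log-odds
    """
    # Weight on the individual's own data: near 1 for precise estimates,
    # near 0 for noisy ones.
    w = tau**2 / (tau**2 + std_error**2)
    return w * estimate + (1 - w) * pop_mean


# Hypothetical individuals with the same raw estimate but different precision.
well_sampled = partial_pool(estimate=1.0, std_error=0.1, pop_mean=0.0, tau=0.5)
sparse = partial_pool(estimate=1.0, std_error=1.0, pop_mean=0.0, tau=0.5)
print(round(well_sampled, 3), round(sparse, 3))  # 0.962 0.2
```

A sparsely observed individual is pulled strongly toward the population mean, while a well-sampled one mostly keeps its own estimate; that is the sense in which unbalanced replicates are "handled gracefully".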
I've been able to interpret the log-odds coefficients and to combine them algebraically to examine all pairwise comparisons beyond those in the base model output. Is it worth showing this off, or should I just keep it in my appendix?
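Combining coefficients algebraically works best when it is done once per posterior draw rather than on the point estimates, because then every pairwise comparison carries its own uncertainty interval. Below is a minimal Python sketch, assuming the draws have already been exported from the fit (for example with brms's `as_draws_df()`). The keys such as `Two_Old` are hypothetical stand-ins for brms's actual parameter names, the numbers are made up, and Young is assumed to be the baseline maturity level.

```python
import math


def quantile(values, p):
    """Quantile by linear interpolation (the rule used by R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def eta(draw, category, maturity):
    """Log-odds of `category` versus the reference nuclear type at one maturity level.

    Assumes Young is the baseline level of Maturity, so it has no coefficient
    of its own; the other levels add their coefficient to the intercept.
    """
    value = draw[f"{category}_Intercept"]
    if maturity != "Young":
        value += draw[f"{category}_{maturity}"]
    return value


def contrast(draws, cat_a, mat_a, cat_b, mat_b):
    """Compute eta(cat_a, mat_a) - eta(cat_b, mat_b) in every draw, then summarize."""
    diffs = [eta(d, cat_a, mat_a) - eta(d, cat_b, mat_b) for d in draws]
    return {
        "median": quantile(diffs, 0.5),
        "q2.5": quantile(diffs, 0.025),
        "q97.5": quantile(diffs, 0.975),
    }


# Hypothetical posterior draws (a real fit with chains = 4, iter = 4000 has 8,000).
# Keys are made-up stand-ins for brms's parameter names.
draws = [
    {"Two_Intercept": -1.0, "Two_Middle_age": 0.5, "Two_Old": 1.2,
     "Three_Intercept": -2.0, "Three_Middle_age": 0.1, "Three_Old": 0.4},
    {"Two_Intercept": -0.9, "Two_Middle_age": 0.6, "Two_Old": 1.0,
     "Three_Intercept": -2.1, "Three_Middle_age": 0.2, "Three_Old": 0.5},
    {"Two_Intercept": -1.1, "Two_Middle_age": 0.4, "Two_Old": 1.4,
     "Three_Intercept": -1.9, "Three_Middle_age": 0.0, "Three_Old": 0.3},
]

# Same nuclear type, different maturity: the intercept cancels, leaving
# b_Two_Old - b_Two_Middle_age (a log odds ratio between Old and Middle_age).
old_vs_middle = contrast(draws, "Two", "Old", "Two", "Middle_age")

# Same maturity, two non-reference types: the reference category cancels,
# giving the log-odds of Two versus Three nuclei among Old spores.
two_vs_three_old = contrast(draws, "Two", "Old", "Three", "Old")

print(old_vs_middle)
print(two_vs_three_old)
```

Exponentiating each summary gives the corresponding odds ratio; since `exp` is monotone, the exponentiated median and quantiles are exactly the median and quantiles of the odds ratios themselves.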
I was able to convert log-odds to predicted probabilities using these functions:
new_data <- data.frame(Maturity = c("Young", "Middle_age", "Old"))
# re_formula = NA drops the Replicate effects (population-level predictions)
preds <- fitted(fit, newdata = new_data, scale = "response", re_formula = NA)
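Then I summarized the output in a tibble with point estimates. I've seen people use `posterior_epred()` more commonly, but `fitted()` seemed appropriate here for population-level estimates (with `re_formula = NA`, the Replicate effects are set to zero rather than averaged over).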
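For a categorical model, `posterior_epred()` returns a draws × observations × categories array of probabilities, and brms documents `fitted()` as an alias of `posterior_epred()` with extra arguments for summarizing those draws. With its defaults (`summary = TRUE`, `robust = FALSE`) it reports the mean, SD, and 2.5%/97.5% quantiles as `Estimate`, `Est.Error`, `Q2.5`, and `Q97.5`. So for these predictions the two should agree, and the choice is mostly whether you want the raw draws (for example, to compute further contrasts) or the summary. The Python sketch below spells out both steps for one maturity level; the category labels and draws are hypothetical.

```python
import math
import statistics

# Hypothetical category labels; "One" is assumed to be the reference level.
CATEGORIES = ["One", "Two", "Three", "Four"]


def softmax_probs(etas):
    """Turn log-odds versus the reference type into probabilities for all types.

    `etas` holds the linear predictors of the non-reference types; the
    reference type's linear predictor is fixed at 0.
    """
    logits = [0.0] + [etas[c] for c in CATEGORIES[1:]]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(CATEGORIES, exps)}


def quantile(values, p):
    """Quantile by linear interpolation (the rule used by R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def summarize(values):
    """Columns analogous to fitted()'s default summary of the draws."""
    return {
        "Estimate": statistics.mean(values),
        "Est.Error": statistics.stdev(values),
        "Q2.5": quantile(values, 0.025),
        "Q97.5": quantile(values, 0.975),
    }


# Hypothetical draws of the linear predictors for Old spores
# (population level, i.e. Replicate effects set to zero).
eta_draws_old = [
    {"Two": 0.2, "Three": -1.6, "Four": -2.5},
    {"Two": 0.1, "Three": -1.6, "Four": -2.3},
    {"Two": 0.3, "Three": -1.6, "Four": -2.7},
]

# Step 1 (what posterior_epred() returns): one probability vector per draw.
prob_draws = [softmax_probs(etas) for etas in eta_draws_old]

# Step 2 (what fitted() adds): summarize each category across draws.
old_summary = {c: summarize([p[c] for p in prob_draws]) for c in CATEGORIES}
for c in CATEGORIES:
    print(c, {k: round(v, 3) for k, v in old_summary[c].items()})
```

Because each draw's probabilities sum to 1 and the mean is linear, the `Estimate` column also sums to 1 across nuclear types, which is a quick sanity check on the tibble.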
Let me know what you guys think! Again, thank you for taking the time to read this; I'm in the weeds.
Cheers!
Ben Bohemier
MScF Candidate
Lakehead University