I am wondering if the following ideas are something that can be correctly expressed with the current brms syntax.
Suppose that I have three ways of measuring the “size” of some experimental unit. All three measurements are different kinds of observation of some latent variable, which we might call “true size”. One variable is the number of leaves (a count) and the other two are continuous measures of size. Of these, one is always smaller than the other – say, the maximum size and the actual size.
I was wondering if its possible to model the correlations among all of these, while approximating “true size” as some latent variable that could be used elsewhere. However my attempts to do this result in models that fit quite badly, so I suspect I’m thinking about it wrong. Here’s a reproducible example:
library(dplyr)
library(brms)
set.seed(4812)
thirty_plants <- data.frame(ID = 1:30) %>%
mutate(true_size = rlnorm(30, meanlog = 6, sd = 1),
leaves = rpois(30, lambda = log(true_size) * 3.6),
max_size = exp(log(true_size) + rnorm(30, 0, 0.4)),
act_size = max_size - rlnorm(30, 2, 0.5))
thirty_plants
fit1 <- brm(
cbind(log(max_size), log(act_size)) ~ (1|b|ID),
data = thirty_plants, chains = 2, cores = 2
)
ranef(fit1)
In my imagination these random intercepts (one for every plant) capture the information we have about “true size”.
I tried this also for two variables with different distributions, with still worse results:
bf_max <- bf(log(max_size) ~ (1|b|ID), family = "gaussian")
bf_lvs <- bf(leaves ~ (1|b|ID), family = "poisson")
fit2 <- brm(bf_max + bf_lvs, data = thirty_plants, chains = 2, cores = 2)
A followup questions is, assuming that this is possible, how to include missing variables? Suppose half of max_size
was NA … could we still fit the model, and even get posterior predictions for its probable values?