Multivariate models with different families and missing data

I am wondering if the following ideas are something that can be correctly expressed with the current brms syntax.
Suppose that I have three ways of measuring the “size” of some experimental unit. All three measurements are different kinds of observation of some latent variable, which we might call “true size”. One variable is the number of leaves (a count) and the other two are continuous measures of size. Of these, one is always smaller than the other – say, the maximum size and the actual size.

I was wondering if its possible to model the correlations among all of these, while approximating “true size” as some latent variable that could be used elsewhere. However my attempts to do this result in models that fit quite badly, so I suspect I’m thinking about it wrong. Here’s a reproducible example:

library(dplyr)
library(brms)
set.seed(4812)
thirty_plants <- data.frame(ID = 1:30) %>% 
  mutate(true_size = rlnorm(30, meanlog = 6, sd = 1),
         leaves = rpois(30, lambda = log(true_size) * 3.6),
         max_size = exp(log(true_size) + rnorm(30, 0, 0.4)),
         act_size = max_size - rlnorm(30, 2, 0.5))
thirty_plants

fit1 <- brm(
  cbind(log(max_size), log(act_size)) ~ (1|b|ID),
  data = thirty_plants, chains = 2, cores = 2
)

ranef(fit1)

In my imagination these random intercepts (one for every plant) capture the information we have about “true size”.
I tried this also for two variables with different distributions, with still worse results:

bf_max <- bf(log(max_size) ~ (1|b|ID), family = "gaussian")
bf_lvs <- bf(leaves ~ (1|b|ID), family = "poisson")

fit2 <- brm(bf_max + bf_lvs, data = thirty_plants, chains = 2, cores = 2)

A followup questions is, assuming that this is possible, how to include missing variables? Suppose half of max_size was NA … could we still fit the model, and even get posterior predictions for its probable values?

1 Like

Can you be more specific of what you mean by “quite badly”? See vignette("brms_missings") for detailes on missing value imputation in brms.

Thank you for the reply! Sorry I didn’t know how much detail to include about the model fit. When i tried to to fit the model i’ve called fit2 above, i get the following:

1: There were 327 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
3: Examine the pairs() plot to diagnose sampling problems

My main question is if this is the right way to model an observation process in brms. That is, in this case we have an unobservable variable (size) which we then measure in two ways (max_size and leaves).

The problem is that ID has as many levels as observations in the data. When modeling a group-level effect for it, this is actually the same as a residual standard deviation, which we already have in the gaussian model in terms of sigma. Accordingly the gaussian part of the model is not identified.

If you want to learn more about missing value imputation in brms, take a look at vignette("brms_missings").