Multivariate models with different families and missing data

aammd · June 1, 2018, 5:52am

I am wondering if the following ideas are something that can be correctly expressed with the current brms syntax.
Suppose that I have three ways of measuring the “size” of some experimental unit. All three measurements are different kinds of observation of some latent variable, which we might call “true size”. One variable is the number of leaves (a count) and the other two are continuous measures of size. Of these, one is always smaller than the other – say, the maximum size and the actual size.

I was wondering whether it's possible to model the correlations among all of these, while approximating “true size” as some latent variable that could be used elsewhere. However, my attempts to do this result in models that fit quite badly, so I suspect I’m thinking about it wrong. Here’s a reproducible example:

library(dplyr)
library(brms)
set.seed(4812)
thirty_plants <- data.frame(ID = 1:30) %>% 
  mutate(true_size = rlnorm(30, meanlog = 6, sdlog = 1),
         leaves = rpois(30, lambda = log(true_size) * 3.6),
         max_size = exp(log(true_size) + rnorm(30, 0, 0.4)),
         act_size = max_size - rlnorm(30, 2, 0.5))
thirty_plants

fit1 <- brm(
  cbind(log(max_size), log(act_size)) ~ (1|b|ID),
  data = thirty_plants, chains = 2, cores = 2
)

ranef(fit1)

In my imagination these random intercepts (one for every plant) capture the information we have about “true size”.
I tried this also for two variables with different distributions, with still worse results:

bf_max <- bf(log(max_size) ~ (1|b|ID), family = "gaussian")
bf_lvs <- bf(leaves ~ (1|b|ID), family = "poisson")

fit2 <- brm(bf_max + bf_lvs, data = thirty_plants, chains = 2, cores = 2)

A follow-up question is, assuming that this is possible, how to handle missing values. Suppose half of max_size was NA … could we still fit the model, and even get posterior predictions for its probable values?

paul.buerkner · June 1, 2018, 7:15am

Can you be more specific about what you mean by “quite badly”? See vignette("brms_missings") for details on missing value imputation in brms.

aammd · June 4, 2018, 4:18am

Thank you for the reply! Sorry, I didn’t know how much detail to include about the model fit. When I tried to fit the model I’ve called fit2 above, I got the following:

1: There were 327 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
3: Examine the pairs() plot to diagnose sampling problems

My main question is if this is the right way to model an observation process in brms. That is, in this case we have an unobservable variable (size) which we then measure in two ways (max_size and leaves).

paul.buerkner · June 5, 2018, 4:15pm

The problem is that ID has as many levels as observations in the data. When modeling a group-level effect for it, this is actually the same as a residual standard deviation, which we already have in the gaussian model in terms of sigma. Accordingly the gaussian part of the model is not identified.
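
The identification issue can be illustrated without fitting anything. The following is a minimal base R sketch, assuming we look at the gaussian part on its own with one observation per ID; the names marginal_loglik, sd_id and sigma are chosen here for illustration and are not brms output. Once the per-ID intercept is integrated out, only the sum of the two variances enters the likelihood, so different splits of the same total variance are indistinguishable.

```r
# With one observation per ID, integrating out the per-ID intercept gives
# y_i ~ Normal(mu, sqrt(sd_id^2 + sigma^2)).
set.seed(1)
y <- rnorm(30, mean = 6, sd = 0.5)

# Marginal log-likelihood of the data for a given split of the variance
marginal_loglik <- function(y, mu, sd_id, sigma) {
  sum(dnorm(y, mean = mu, sd = sqrt(sd_id^2 + sigma^2), log = TRUE))
}

# Three different splits of the same total variance (0.25)
ll_a <- marginal_loglik(y, mu = 6, sd_id = 0.3, sigma = 0.4)
ll_b <- marginal_loglik(y, mu = 6, sd_id = 0.4, sigma = 0.3)
ll_c <- marginal_loglik(y, mu = 6, sd_id = 0.0, sigma = 0.5)

# The three values agree up to floating-point error
print(c(ll_a, ll_b, ll_c))
```

Because the data cannot tell these splits apart, the sampler has to explore a ridge in the posterior, which is one plausible source of the divergent transitions reported above.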

paul.buerkner · June 5, 2018, 4:15pm

If you want to learn more about missing value imputation in brms, take a look at vignette("brms_missings").
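
As a rough sketch of what that vignette describes, applied to the simulated data from the first post: the response with missing values gets `| mi()`, and it enters another formula as a predictor via `mi()`. This is only an illustration under assumptions, not a recommended model for these data. The column logmax, the object plants_na and the choice of which 15 values to set to NA are made up here, and the log link in the poisson part does not match how leaves was simulated.

```r
library(dplyr)
library(brms)

# Recreate the simulated data from the first post
set.seed(4812)
thirty_plants <- data.frame(ID = 1:30) %>%
  mutate(true_size = rlnorm(30, meanlog = 6, sdlog = 1),
         leaves = rpois(30, lambda = log(true_size) * 3.6),
         max_size = exp(log(true_size) + rnorm(30, 0, 0.4)))

# Assumption for illustration: half of the max_size values are missing.
# The log is precomputed so the response is a plain column name.
plants_na <- thirty_plants %>%
  mutate(logmax = log(max_size))
plants_na$logmax[sample(30, 15)] <- NA

# logmax is imputed during fitting (| mi()) and used as a predictor (mi())
bf_max <- bf(logmax | mi() ~ 1, family = gaussian())
bf_lvs <- bf(leaves ~ mi(logmax), family = poisson())

fit_mi <- brm(bf_max + bf_lvs + set_rescor(FALSE),
              data = plants_na, chains = 2, cores = 2)

summary(fit_mi)
```

Note that this version has no group-level term for ID, so it avoids the identification problem described earlier in the thread.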
