Is it possible to model triplicates and sextuplicates together?


#1
  • Operating System: Windows 7
  • brms Version: 2.4.0 for R 3.5.1

In each rat we measure phenotype data using 6 replicates, then sacrificed the rat and measured molecular data using 3 replicates.
Question #1: Is it possible to model “phenotype data ~ molecular data” in brms without losing information on uncertainty by averaging replicates? My real model will also include extra factors (say case/control) and other molecular data (expression levels of several genes, also in triplicates).

Here is my simulated dataset:

  require(tidyverse)
  require(brms)

  simulate.df <- function(n.grp = 20, n.mol = 3, n.pheno = 6) {
  set.seed(20180913)
  
  df.grp <- tibble(ID=LETTERS[1:n.grp], eff.ID=rnorm(n.grp, sd=2))
  
  df.molecular <<- tibble(ID=rep(LETTERS[1:n.grp], each=n.mol), 
                          err.mol = rnorm(n.grp*n.mol)) %>%
    left_join(df.grp) %>% mutate(mol = eff.ID + err.mol)
  
  df.phenotype <<- tibble(ID=rep(LETTERS[1:n.grp], each=n.pheno), 
                          err.pheno = rnorm(n.grp*n.pheno)) %>%
    left_join(df.grp) %>% mutate(pheno = 2*eff.ID + err.pheno)
  
  df.all.means <<- left_join( group_by(df.molecular, ID) %>% 
                                summarise(mol = mean(mol)),
                              group_by(df.phenotype, ID) %>% 
                                summarise(pheno = mean(pheno)) )
  
  df.se <<- left_join( group_by(df.molecular, ID),
                       group_by(df.phenotype, ID) %>% 
                         summarise(pheno.se = sd(pheno, na.rm=TRUE) /
                                     sqrt(sum(!is.na(pheno))), 
                                   pheno = mean(pheno)) )
  }

  simulate.df()

By design pheno = 2*mol if the errors (err.mol and err.pheno) would have been negligible. And modelling with averaged dataset nearly catches it, with u-95% CI = 1.88 =almost= 2

m1 <- brm(pheno ~ mol, data=df.all.means, seed = 20180913)
summary(m1)
## ...
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.41      0.17    -0.74    -0.07       3249 1.00
## mol           1.66      0.11     1.46     1.88       3272 1.00

I tried to keep a little more information by saving mean & SE for phenotype and keeping 3 molecular replicates “as is”, but now the 95% CI = [-0.07, 0.11] is totally “off the grid” from the 2 it should be:

m2 <- brm(pheno | se(pheno.se) ~ mol + (1|ID), data=df.se, seed = 20180913)
summary(m2)
## ...
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -1.24      0.72    -2.83     0.08        310 1.01
## mol           0.02      0.04    -0.07     0.11       1715 1.00

Question #2: What went wrong here?

Question #3: I thought Bayesian framework can “reallocate credibility” even when the size of dataset is much smaller that the number of model parameters. Just out of curiosity, is there a simple reason why Stan/brms is exploding with warnings like this:

simulate.df(n.grp=3, n.mol=2, n.pheno=3)
m3 <- brm(pheno ~ mol, data=df.all.means, seed = 20180913,
          control=list(max_treedepth=13, adapt_delta=0.999))
## Warning messages:
## 1: There were 33 divergent transitions after warmup. Increasing adapt_delta above 0.999 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
## 2: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low 
## 4: Examine the pairs() plot to diagnose sampling problems

Thank you again for your wonderful and useful brms package.