- Operating System: Windows 7
- brms Version: 2.4.0 for R 3.5.1
In each rat we measure phenotype data using 6 replicates, then sacrificed the rat and measured molecular data using 3 replicates.
Question #1: Is it possible to model “phenotype data ~ molecular data” in brms without losing information on uncertainty by averaging replicates? My real model will also include extra factors (say case/control) and other molecular data (expression levels of several genes, also in triplicates).
Here is my simulated dataset:
require(tidyverse)
require(brms)
simulate.df <- function(n.grp = 20, n.mol = 3, n.pheno = 6) {
set.seed(20180913)
df.grp <- tibble(ID=LETTERS[1:n.grp], eff.ID=rnorm(n.grp, sd=2))
df.molecular <<- tibble(ID=rep(LETTERS[1:n.grp], each=n.mol),
err.mol = rnorm(n.grp*n.mol)) %>%
left_join(df.grp) %>% mutate(mol = eff.ID + err.mol)
df.phenotype <<- tibble(ID=rep(LETTERS[1:n.grp], each=n.pheno),
err.pheno = rnorm(n.grp*n.pheno)) %>%
left_join(df.grp) %>% mutate(pheno = 2*eff.ID + err.pheno)
df.all.means <<- left_join( group_by(df.molecular, ID) %>%
summarise(mol = mean(mol)),
group_by(df.phenotype, ID) %>%
summarise(pheno = mean(pheno)) )
df.se <<- left_join( group_by(df.molecular, ID),
group_by(df.phenotype, ID) %>%
summarise(pheno.se = sd(pheno, na.rm=TRUE) /
sqrt(sum(!is.na(pheno))),
pheno = mean(pheno)) )
}
simulate.df()
By design pheno = 2*mol if the errors (err.mol and err.pheno) would have been negligible. And modelling with averaged dataset nearly catches it, with u-95% CI = 1.88 =almost= 2
m1 <- brm(pheno ~ mol, data=df.all.means, seed = 20180913)
summary(m1)
## ...
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept -0.41 0.17 -0.74 -0.07 3249 1.00
## mol 1.66 0.11 1.46 1.88 3272 1.00
I tried to keep a little more information by saving mean & SE for phenotype and keeping 3 molecular replicates “as is”, but now the 95% CI = [-0.07, 0.11] is totally “off the grid” from the 2 it should be:
m2 <- brm(pheno | se(pheno.se) ~ mol + (1|ID), data=df.se, seed = 20180913)
summary(m2)
## ...
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept -1.24 0.72 -2.83 0.08 310 1.01
## mol 0.02 0.04 -0.07 0.11 1715 1.00
Question #2: What went wrong here?
Question #3: I thought Bayesian framework can “reallocate credibility” even when the size of dataset is much smaller that the number of model parameters. Just out of curiosity, is there a simple reason why Stan/brms is exploding with warnings like this:
simulate.df(n.grp=3, n.mol=2, n.pheno=3)
m3 <- brm(pheno ~ mol, data=df.all.means, seed = 20180913,
control=list(max_treedepth=13, adapt_delta=0.999))
## Warning messages:
## 1: There were 33 divergent transitions after warmup. Increasing adapt_delta above 0.999 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## 2: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## 4: Examine the pairs() plot to diagnose sampling problems
Thank you again for your wonderful and useful brms package.