# Is it possible to model triplicates and sextuplicates together?

• Operating System: Windows 7
• brms Version: 2.4.0 for R 3.5.1

In each rat we measure phenotype data using 6 replicates, then sacrificed the rat and measured molecular data using 3 replicates.
Question #1: Is it possible to model “phenotype data ~ molecular data” in brms without losing information on uncertainty by averaging replicates? My real model will also include extra factors (say case/control) and other molecular data (expression levels of several genes, also in triplicates).

Here is my simulated dataset:

``````  require(tidyverse)
require(brms)

simulate.df <- function(n.grp = 20, n.mol = 3, n.pheno = 6) {
set.seed(20180913)

df.grp <- tibble(ID=LETTERS[1:n.grp], eff.ID=rnorm(n.grp, sd=2))

df.molecular <<- tibble(ID=rep(LETTERS[1:n.grp], each=n.mol),
err.mol = rnorm(n.grp*n.mol)) %>%
left_join(df.grp) %>% mutate(mol = eff.ID + err.mol)

df.phenotype <<- tibble(ID=rep(LETTERS[1:n.grp], each=n.pheno),
err.pheno = rnorm(n.grp*n.pheno)) %>%
left_join(df.grp) %>% mutate(pheno = 2*eff.ID + err.pheno)

df.all.means <<- left_join( group_by(df.molecular, ID) %>%
summarise(mol = mean(mol)),
group_by(df.phenotype, ID) %>%
summarise(pheno = mean(pheno)) )

df.se <<- left_join( group_by(df.molecular, ID),
group_by(df.phenotype, ID) %>%
summarise(pheno.se = sd(pheno, na.rm=TRUE) /
sqrt(sum(!is.na(pheno))),
pheno = mean(pheno)) )
}

simulate.df()
``````

By design pheno = 2*mol if the errors (err.mol and err.pheno) would have been negligible. And modelling with averaged dataset nearly catches it, with u-95% CI = 1.88 =almost= 2

``````m1 <- brm(pheno ~ mol, data=df.all.means, seed = 20180913)
summary(m1)
## ...
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.41      0.17    -0.74    -0.07       3249 1.00
## mol           1.66      0.11     1.46     1.88       3272 1.00
``````

I tried to keep a little more information by saving mean & SE for phenotype and keeping 3 molecular replicates “as is”, but now the 95% CI = [-0.07, 0.11] is totally “off the grid” from the 2 it should be:

``````m2 <- brm(pheno | se(pheno.se) ~ mol + (1|ID), data=df.se, seed = 20180913)
summary(m2)
## ...
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -1.24      0.72    -2.83     0.08        310 1.01
## mol           0.02      0.04    -0.07     0.11       1715 1.00
``````

Question #2: What went wrong here?

Question #3: I thought Bayesian framework can “reallocate credibility” even when the size of dataset is much smaller that the number of model parameters. Just out of curiosity, is there a simple reason why Stan/brms is exploding with warnings like this:

``````simulate.df(n.grp=3, n.mol=2, n.pheno=3)
m3 <- brm(pheno ~ mol, data=df.all.means, seed = 20180913,