Standard operating procedures for ANOVA

I use R 3.4.4 on Ubuntu 16.04.4 LTS and ‘brms’ package version 2.2.0

brms is wonderful! It automatically took care of that difficult prior business and, after some friendly pushes, converged for my model that broke lme4::bootMer(). So for about one month I am trying to convert from lme4 to Bayesian world and to make sense of brmsfit I got. Oh boy, these contrasts were so easy to investigate with emmeans package and, unfortunately, require so much HARD and MANUAL work even in brms.

After some exploration of “8 steps to become Bayesian” I bought Kruschke 2015 book and got through first 20 chapters of it. Yes, its Stan examples are quite obsolete now, but at least the language of brms vignettes and posts of Google group started to decode for me. I have a general question: Does brms documentation have a worked out non-toy example of hierarchical Bayesian ANOVA with complicated interaction (say between 2 categorical and 1 nominal predictors), heterogeneous variances, random effects and with some automation for constructing contrasts, sampling them to plot their median and HDI, and converting these contrasts to strings for hypothesis()?

Right now I dig through http://www.flutterbys.com.au/stats/tut/tut9.2b.html, but the most interesting parts (i.e. contrasts) do not quite match. Here is the simple reproducible example with my question:

require("brms")
require("tidyverse")
require("tidybayes")

set.seed(1)
fit <- brm(count ~ log_Age_c + log_Base4_c * Trt + (1 | patient),
           data = epilepsy, family = poisson(), 
           cores=4, chains=4, sample_prior=TRUE ) 

me <- marginal_effects(fit, effects = "Trt:log_Base4_c", robust=TRUE, re_formula=NA,
                       conditions = make_conditions(fit, "log_Age_c"), 
                       int_conditions = make_conditions(fit, "log_Base4_c"))
head(me$`Trt:log_Base4_c`, 2)
#   Trt log_Base4_c log_Age_c            cond__    count patient estimate__ ...
# 1   0       -0.75 -0.222457 log_Age_c = -0.22 8.254237      NA   2.784081 ...
# 2   0           0 -0.222457 log_Age_c = -0.22 8.254237      NA   5.371503 ...

newdata = me$`Trt:log_Base4_c` %>% 
  mutate( log_Base4_c = as.numeric(as.character(log_Base4_c)) ) 
me.fs <- fitted(fit, newdata=newdata, summary=TRUE, robust=TRUE, re_formula=NA )
head(me.fs, 2)
#      Estimate ...
# [1,] 2.775040 ... # discrepancy in 3rd digit: 2.7???
# [2,] 5.371503 ... # exactly as in marginal_effects()

Just like Paul explained in Goggle group, marginal_effects() exactly match fitted() when the effects of log_Base4_c and log_Age_c do not exist. But why all other groups show the discrepancy?

h.Trt01 <- hypothesis(fit, hypothesis=c('Trt1vs0'=c("Trt1 = 0")))
h.Trt01$hypothesis
#   Hypothesis   Estimate Est.Error   CI.Lower    CI.Upper Evid.Ratio Star
# 1    Trt1vs0 -0.3364897 0.1647266 -0.6633257 -0.01424099         NA    *

Does this this mean one could safely report that the Treatment on average reduced seizures between visits by by -.34? But why when we look at this contrast directly we get

newdata[c(8,11),]
# Trt log_Base4_c    log_Age_c    ...
# 8    0           0 1.731103e-16 ...
# 11   1           0 1.731103e-16 ...
s.Trt01 <- fitted(fit, newdata=newdata[c(8,11),], summary=FALSE, robust=TRUE, re_formula=NA )
s.Trt01 <- s.Trt01 %>% as_tibble %>% rename(Trt0=V1, Trt1=V2) %>% mutate(Trt1vs0=Trt1-Trt0)
s.Trt01$Trt1vs0 %>% mean_qi
#          y      ymin        ymax .prob
# 1 -1.71745 -3.439241 -0.07358314  0.95 

This “direct look” gives the “average Treatment effect” which is much more on the scale of difference of medians between the groups being compared. Am I correct that by sampling we accounted for all correlations, so we could not blame them for this discrepancy?

I thought that I can sample my contrast in terms of group means using fitted() to validate tedious manual “contrast forging from raw coefficients”, currently required for hypothesis()… But if hypothesis() disagree with sampled contrast on purpose, could I request that some helper (say make_contrasts, in the spirit of make_conditions) be added to simplify/automate hypothesis testing for complicated models? It will be so helpful if one could also formulate hypotheses in terms of group means. Or, perhaps, it will be possible to implement sampling and hypotheses for conditional random variables like “Response if we know that Predictor == LevelA or Predictor == LevelB” ?

I am not sure what the emmeans package does and I am not sure how to do what you want to do with brms, although it is probably possible. But one of the emmeans vignettes claims to support rstanarm
https://cran.r-project.org/web/packages/emmeans/vignettes/models.html
so you could try to use that or else dig through how the emmeans package does what it does and then try to apply that to the brms output.

If my model would have homogeneous variance I would have used rstanarm + emmeans, even though I love the idea of moving away from p-values. But brms + LOO (as well as inspection of observed data) insist that heterogeneous model fits my data better, so I strive to analyze it the best I can with brms.
What I miss is usability of emmeans, when you say emmeans(fit, pairwise ~ Trt) and it will automatically parse given fitted model, understanding that mean values for log_Base4_c and log_Age_c should be used in the contrasts to be constructed. In case of several categorical predictors the ability to specify contrasts as ~ Trt | Confounder is so very convenient. brms::marginal_effects with newdata already offer a way to do it for group means, it is strange that NO similar functionality is implemented for contrasts (forcing us to a quick fix of constructing “our own copy” of model matrix, praying brms will never update its default way).
rjags + coda scripts for Kruschke (2015) offer function smryMCMC(codaSamples, …, x1contrasts, x2contrasts, x1x2contrasts,…) that can contrast group1 vs group2 and even “average of some groups” vs “average of some other groups”, so in his opinion specifying contrasts in terms of groups is perfectly Bayesian.

If Kruschke has code that does what you want, then you can probably call his code on some columns of as.matrix(fit). If it really has to be in coda format, there is an As.mcmc.list function in the rstan package that will convert a stanfit object to a mcmc.list, but you probably have to call it on the $stanfit element of the brmsfit object.

You can also call as.mcmc to a brmsfit object directly.

Sure, for simple models, where contrasts are simple to get even it terms of the regression coefficients,

require(broom)
tidyMCMC(as.matrix(fit, pars="^b_"), estimate.method = "median", 
         conf.int = TRUE, conf.method = "HPDinterval")

looks like a reasonable solution. The problem is that we are forced to “think low level” in unwieldy language of regression coefficients, link functions, etc. instead of “middle level language” of group means or, ideally, “high level language” of formulas like “~ (Treatment + Confounder) | Condition”.

Sure, when the model is of average complexity, say with homogeneous group variances, rstanarm + emmeans conveniently allow us to speak the high level language:

require(rstanarm)
require(emmeans)
fit.rst <- rstanarm::stan_glmer(
  count ~ log_Age_c + log_Base4_c * Trt + (1 | patient),
  data = epilepsy, family = poisson(), cores=4, chains=4 )
rg.rst <- emmeans::ref_grid(fit.rst)
emm <- emmeans(rg.rst, ~ Trt) # log_Age_c and log_Base4_c are handled silently

( emm <- emmeans(rg.rst, ~ Trt | log_Age_c + log_Base4_c) )
# This is a frequentist summary. See `?as.mcmc.emmGrid' for more on what you can do.
# log_Age_c = 1.731103e-16, log_Base4_c = 8.09288e-17:
#  Trt   emmean        SE  df asymp.LCL asymp.UCL
#  0   1.787696 0.1166566 Inf  1.559054  2.016339
#  1   1.458956 0.1112507 Inf  1.240908  1.677003
# 
# Results are given on the log (not the response) scale. 
# Confidence level used: 0.95

summary(pairs(emm), infer=c(TRUE,TRUE), type = "response")
# This is a frequentist summary. See `?as.mcmc.emmGrid' for more on what you can do.
# log_Age_c = 1.73110334329919e-16, log_Base4_c = 8.09288010605398e-17:
#  contrast    ratio       SE  df asymp.LCL asymp.UCL z.ratio p.value
#  0 / 1    1.389217 0.217707 Inf  1.021825  1.888703   2.098  0.0359
# 
# Confidence level used: 0.95 
# Intervals are back-transformed from the log scale 
# Tests are performed on the log scale

Ouch, I guess the discrepancy between hypothesis() and “sampling contrasts directly” in my original question was caused by forgetting that hypothesis() do not work on the scale of response … one more hurdle to jump manually in order to produce report :( .

emmeans can even give back the formulas for the group means or contrasts, to allow for easy Bayesian median+HDI or hypothesis() - like investigation:

pairs(emm)@linfct
# (Intercept) log_Age_c log_Base4_c Trt1 log_Base4_c:Trt1
# [1,]           0         0           0   -1     -8.09288e-17
apply(pairs(emm)@linfct, 1, function(x) 
  paste0(paste(paste(x,names(x), sep="*"), collapse=" + "), " = 0") )
[1] "0*(Intercept) + 0*log_Age_c + 0*log_Base4_c + -1*Trt1 + -8.09288010605398e-17*log_Base4_c:Trt1 = 0"

apply(emm@linfct, 1, function(x) 
  paste0(paste(paste(x,names(x), sep="*"), collapse=" + "), " = 0") )
# [1] "1*(Intercept) + 1.73110334329919e-16*log_Age_c + 8.09288010605398e-17*log_Base4_c + 0*Trt1 + 0*log_Base4_c:Trt1 = 0"                   
# [2] "1*(Intercept) + 1.73110334329919e-16*log_Age_c + 8.09288010605398e-17*log_Base4_c + 1*Trt1 + 8.09288010605398e-17*log_Base4_c:Trt1 = 0"

Kruschke has hard-coded his convenience function smryMCMC() individually for each example of his book, so it is hard to port them for new models. I gave it only as an example, that contrasts (in addition to brms::hypothesis() ) are also “approved Bayesian” way.

Unfortunately, it is exactly the more complicated models (handled by brms, but not by rstanarm) where only “low level language” for contrasts/hypotheses and “middle level language” for group means are provided. May I ask for new brms feature which will embrace and extend the emmean-style formulas for group means and contrasts? Ease of use really matters…

You may want to ask the emmeans maintainers if they want to feature brms as well. This currently sounds like the most reasonable solution to me. Or is there any reason you think such functionality would be better put in brms directly?

I guess since you have extended brmsformula beyond compatibility with standard R formula you “own and develop” this standard in brms. Perhaps “exporting” in as.mcmc() - style pieces of brmsfit, which can be described by a single standard R formula, will be the most painless way to interface brms and emmeans. They even specified a minimal set of methods one has to code to use all their functionality in https://cran.r-project.org/web/packages/emmeans/vignettes/extending.pdf

I hope it will be a win-win trade, like the one where you saved loads of work by using stats::model_matrix()

I see thanks. I am hesitant to write all the emmeans code myself at the moment.

Is it really that hard to do what you want with the current brms methods? If I understand it correctly, you have to set up a new data frame containing the conditions you want to get fitted values for an pass it to fitted(., newdata = <your conditions>, re_formula = NA). Is my understanding correct?

I know that ease of use matters, but I am hesitant to implement all the necessary stuff if all that needs to be done from the user side is to set up a small data.frame.

Like @paul.buerkner said, I think what makes the most sense is for brms and rstanarm to fit the model and offer functionality for conveniently extracting estimates, but we are going to need to rely on the help of the rest of the community in order for their to be super convenient ways of doing every possible task after fitting a model. So I think there is definitely room for packages that take the output from rstanarm and brms and provide helpful functionality for particular tasks. We’re starting to see more of that but it will still take some time.

Dear @paul.buerkner and @jonah, I really really appreciate your efforts to fit the greatest possible variety of models with brms. The ease of specifying complex models and the ability to automatically pick sensible priors is exactly the competitive edge that sets brms apart for me. I understand that the sheer variety of supported models makes it harder to design a uniform and easy method to generate reports from those marvelously fitted models.

So I swallowed my urge to cry “Yes! It is hard to generate a report with contrasts from brms! This is why less experienced ones (like me) cry for help and more experienced ones start their own packages, like tidybayes.”

Hopefully even this semi-automatic code will allow other users to handle interactions of unlimited complexity in a very narrow case of non-hierarchical ANOVA without links. Maybe you could advise me how to fool-proof this quick-fix? Or maybe it could convince you that there is no need to “parallel code” all emmeans functionality in brms, when just extending emm_basics() will do the trick.

require(emmeans)
fake <- lm(count ~ log_Age_c + log_Base4_c * Trt, data = model.frame(fit))
emm <- emmeans(fake, pairwise ~ Trt | log_Base4_c + log_Age_c,
               at = list(log_Age_c = c(-0.3, 0, 0.3),
                         log_Base4_c = c(-2, 1, 0)) )

… that was the dangerous & manual part

I copy&pasted the fixed effects of formula(fit) into lm(), just to get all attr(*,…) needed for emm_basics() to work. Pro: all the convenient language of emmeans become available. Contra: if the order of regression coefficients do not match it all falls apart. A manual “sanity check” is needed:

tibble( brms.names = colnames( as.matrix(fit, pars="^b_") ),
        emm.names  = colnames( (emm$emmeans)@linfct ),
        emm.c.names  = colnames( (emm$contrasts)@linfct ) )
# # A tibble: 5 x 3
#   brms.names         emm.names        emm.c.names     
#   <chr>              <chr>            <chr>           
# 1 b_Intercept        (Intercept)      (Intercept)     
# 2 b_log_Age_c        log_Age_c        log_Age_c       
# 3 b_log_Base4_c      log_Base4_c      log_Base4_c     
# 4 b_Trt1             Trt1             Trt1            
# 5 b_log_Base4_c:Trt1 log_Base4_c:Trt1 log_Base4_c:Trt1

Then we paste() convenient names for group means and contrasts

emm.gm.names <- emm$emmeans %>% as_tibble %>% mutate(
  tidyname = paste0("gm ", Trt,",", log_Base4_c,",", log_Age_c ))
emm.c.names <- emm$contrasts %>% as_tibble %>% mutate(
  tidyname = paste0(contrast, " | ", log_Base4_c,",", log_Age_c) )

Finally we use the regex for brmsfit coefficients from the above “sanity check” and enjoy all group means and contrasts in a single gather_samples() -like structure of tibybayes, free to rearrange / filter / summarize / plot them as we want.

spr.b <- fit %>% spread_samples(`^b_.*`, regex=TRUE) 

spr.gm <- as.matrix(spr.b %>% select(-starts_with("."))) %*% t((emm$emmeans)@linfct)
colnames(spr.gm) <- emm.gm.names$tidyname

spr.c <- as.matrix(spr.b %>% select(-starts_with("."))) %*% t((emm$contrasts)@linfct)
colnames(spr.c) <- emm.c.names$tidyname

gthr <- cbind(spr.b, spr.gm, spr.c) %>% as_tibble %>% 
  gather(key="term", value="estimate", -starts_with(".")) %>% group_by(term)

gthr %>% filter(!grepl("^b_",term)) %>% mean_qi
# # A tibble: 27 x 5
# # Groups:   term [27]
#   term            estimate conf.low conf.high .prob
#   <chr>              <dbl>    <dbl>     <dbl> <dbl>
# 1 0 - 1 | 0,0      0.336     0.0232     0.642 0.950
# 2 0 - 1 | 0,-0.3   0.336     0.0232     0.642 0.950
# 3 0 - 1 | 0,0.3    0.336     0.0232     0.642 0.950
# 4 0 - 1 | 1,0     -0.00899  -0.531      0.526 0.950
# 5 0 - 1 | 1,-0.3  -0.00899  -0.531      0.526 0.950
# 6 0 - 1 | 1,0.3   -0.00899  -0.531      0.526 0.950
# 7 0 - 1 | -2,0     1.03      0.0895     1.97  0.950
# 8 0 - 1 | -2,-0.3  1.03      0.0895     1.97  0.950
# 9 0 - 1 | -2,0.3   1.03      0.0895     1.97  0.950
# 10 gm 0,0,0         1.79      1.56       2.01  0.950
# # ... with 17 more rows

We could even auto-generate these “as.character” expressions for hypothesis(). Note: printing emm.hyp shows how tedious it becomes to specify hypotheses in the “language of raw coefficients” for all cases, except the simplest toy models…

emm.hyp <- apply((emm$contrasts)@linfct, 1, function(r) {
  r <- r[r != 0] # be sparse for viewing convenience
  hyp <- paste(r, names(r), sep="*") %>% paste(collapse=" + ") %>% paste0(" = 0")
  gsub("(Intercept)", "Intercept", hyp, fixed=TRUE)
} )
names(emm.hyp) <- paste0(emm.c.names$tidyname, " = 0 ")

hyp <- hypothesis(fit, emm.hyp)
hyp$hypothesis
#             Hypothesis    Estimate Est.Error    CI.Lower  CI.Upper Evid.Ratio Star
# 1 0 - 1 | -2,-0.3 = 0   1.02669257 0.4786458  0.08947609 1.9737549         NA    *
# 2  0 - 1 | 1,-0.3 = 0  -0.00899184 0.2680102 -0.53107649 0.5256432         NA     
# 3  0 - 1 | 0,-0.3 = 0   0.33623630 0.1586176  0.02324857 0.6420955         NA    *
# 4    0 - 1 | -2,0 = 0   1.02669257 0.4786458  0.08947609 1.9737549         NA    *
# 5     0 - 1 | 1,0 = 0  -0.00899184 0.2680102 -0.53107649 0.5256432         NA     
# 6     0 - 1 | 0,0 = 0   0.33623630 0.1586176  0.02324857 0.6420955         NA    *
# 7  0 - 1 | -2,0.3 = 0   1.02669257 0.4786458  0.08947609 1.9737549         NA    *
# 8   0 - 1 | 1,0.3 = 0  -0.00899184 0.2680102 -0.53107649 0.5256432         NA     
# 9   0 - 1 | 0,0.3 = 0   0.33623630 0.1586176  0.02324857 0.6420955         NA    *

My boss is already “going slightly mad” at the long time it takes me to generate a report for tons of our physiologically important contrasts. I guess I will have to sacrifice the LOO improvement shown by placing our many groups in “hierarchical” model :(. But if anyone think that such “analysis automation” is the right way to go, I will later return to extend “SOP ANOVA” to models with links and, if I will figure out how to do it, to hierarchical ANOVAs.

Please keep doing you wonderful work of developing brms!

Thanks for your code and the work you put on this! We really appreciate this! I will be looking at your code in more detail later on today.

I have been reading the documentation of extending emmeans. The problem I see is that we can only support a very minor class of brms models with emmeans, which implies a lot of special case coding and checking.

Also, most of emmeans summaries are frequentist and I am hesitant to support this. There is some mcmc functionality implemented, but more as a side track than as a main feature I feel.

What would be the most convincing evidence (for me personally) to irreversibly discard p-values? Well, if for the models I encounter in my field I could simultaneously look at p-values and Bayesian factors / HDI … and I consistently see (not just read in the book, but see in my simulations) that p-values are less informative or misleading… then in a few months I will fully “reallocate my credibility” to Bayesian approach.

Here is a blog entry of A. Gelman, where he is absolutely not shy to compare frequentist and Bayesian performance.

As for emmeans being able (so far) to support only a minor class of brms models: maybe you can provide a way to “slice of” a manageable piece of brms (say with resp="…" argument to select a single DV from multilevel model) … until emmeans’ developers will generalize their package :)

Let me think of the “slicing” idea a bit more :-)

I gave it some though and decided not to contribute a brmsfit method to emmeans myself. The reason is, as explined above, that emmeans primarily does frequentist analysis (even for Bayesian models) and I don’t want to actively support that kind of analysis. I am of course fine with someone else writing a method for it (most of it can be copied from the corresponding stanreg methods).

The code you provided above may be very helpful to other users and maybe we can get something similar implemented in brms or in one of our other packages. I do absolutely agree with you that ease of use matters in particular for complex factorial designs.

For what it is worth, there is now primitive brmsfit support in the latest github push of emmeans (rvlenth/emmeans). In addition, the default summary method shows the posterior medians and HPD intervals when a Bayesian model is involved.

To illustrate, using the fit1 example in the “overview” vignette:

> emmeans(fit1, ~ disease | sex)
sex = male:
 disease   emmean lower.HPD upper.HPD
 other   3.368600  2.461241  4.277283
 GN      2.970052  2.143481  3.817951
 AN      2.864909  1.925061  3.730315
 PKD     3.978328  2.792842  5.115858

sex = female:
 disease   emmean lower.HPD upper.HPD
 other   4.828601  4.146872  5.569267
 GN      4.453527  3.702148  5.200195
 AN      4.334489  3.680452  4.967279
 PKD     5.447884  4.285862  6.683391

HPD interval probability: 0.95

However, in this particular example, it looks like the predictions are actually on the log scale even though the link function is displayed as “identity”. Here’s a fix-up:

> rg = ref_grid(fit1)
> rg = update(rg, tran = "log")
> summary(emmeans(rg, ~ disease | sex), type = "response")
sex = male:
 disease    emmean lower.HPD upper.HPD
 other    29.03784  8.472659  63.17685
 GN       19.49293  6.108617  39.85288
 AN       17.54745  5.617701  38.55814
 PKD      53.42763  9.002946 142.72457

sex = female:
 disease    emmean lower.HPD upper.HPD
 other   125.03593 50.260528 233.99721
 GN       85.92948 34.432546 165.56097
 AN       76.28595 37.209810 139.01810
 PKD     232.26621 40.224000 653.56205

Results are back-transformed from the log scale 
HPD interval probability: 0.95

That’s amazing, thanks! Is the code publically available somewhere? Maybe I can help making it more general.

Yes. It’s on the github site rvlenth/emmeans. The code of interest is in the file brms-support.R.

For comparison, here’s the result for the running example in this discussion:

> emmeans(fit, pairwise ~ Trt)
NOTE: Results may be misleading due to involvement in interactions
$`emmeans`
 Trt   emmean lower.HPD upper.HPD
 0   1.795622  1.569535  2.022216
 1   1.450573  1.232437  1.678726

Results are given on the log (not the response) scale. 
HPD interval probability: 0.95 

$contrasts
 contrast estimate  lower.HPD upper.HPD
 0 - 1    0.349521 0.03612754 0.6760148

Results are given on the log (not the response) scale. 
HPD interval probability: 0.95

> emmeans(fit, pairwise ~ Trt, type = "response")
NOTE: Results may be misleading due to involvement in interactions
$`emmeans`
 Trt   emmean lower.HPD upper.HPD
 0   6.023218  4.677796  7.376974
 1   4.265560  3.381893  5.303215

Results are back-transformed from the log scale 
HPD interval probability: 0.95 

$contrasts
 contrast estimate lower.HPD upper.HPD
 0 - 1    1.418388 0.9989445  1.914442

Results are back-transformed from the log scale 
HPD interval probability: 0.95 

> emmip(fit, Trt ~ log_Base4_c, CIs = TRUE, cov.reduce = range)

image

The intervals shown are 95% HPD intervals

Hi Russ,

On small comment: Could you by default show less than 7 digits? It’s likely that by default number of draws you have maximum of 2 digits accuracy, and the rest of the digits are distraction.