Inference for a Bayesian ANOVA and Residual Errors



I’m working on a Bayesian ANOVA, where we are analyzing grass production over time at different sites. I’m starting with a basic, two factor ANOVA with no interactions. The model is specified as

y = a + X_t B_t + X_s B_s

where X_t is the design matrix for time (there are four years, so three columns), and X_s is the design matrix for site (six sites, so five columns). I’m calculating the finite population SDs for the residuals, time, and site, and trying to infer which predictors are important. Here is the model:

    anova_model = """
    	int<lower=1> N;
    	int<lower=1> K1;
    	int<lower=1> K2;
    	int<lower=1> K3;
    	matrix[N,K1] X_T;
    	matrix[N,K2] X_S;
    	vector[N] y;
    	real alpha;
    	vector[K1] B_T;
    	vector[K2] B_S;
    	vector[K3] B_I;
    	real<lower=0> sd_y;
    	real<lower=0> sd_bt;
    	real<lower=0> sd_bs;
    transformed parameters{
    	vector[N] yhat;
    	yhat = alpha + X_S*B_S + X_T*B_T;	
    	y ~ normal(yhat, sd_y);
    	B_T ~ normal(0, sd_bt);
    	B_S ~ normal(0, sd_bs);
    	sd_y ~ cauchy(0, 2.5);
    	sd_bt ~ cauchy(0, 2.5);
    	sd_bs ~ cauchy(0, 2.5);
    generated quantities{
    	vector[N] e_y;
    	real s_y;	
    	real s_t;	
    	real s_s;	

    	e_y = y - yhat;
    	s_y = sd(e_y);	
    	s_t = sd(B_T);	
    	s_s = sd(B_S);	

And here are the estimates of the finite population SDs (note that they do not change if I remove the intercept and estimate a fully parameterized model):

SD Time 0.17 (0.05 - 0.31
SD Site 0.57 (0.43 - 0.69)
SD Y 0.82 (0.81 - 0.84)

Notice that the residual SD is much higher than the SDs for either time or site. I’m confused as to how this compares to the ANOVA table, which shows strong significant effects of both site and time:

                  df      sum_sq    mean_sq          F        PR(>F)
Site             5.0   59.431515  11.886303  17.553918  1.038023e-14
C(Treat_Year)    3.0   22.505475   7.501825  11.078838  8.090096e-07
Residual       229.0  155.063010   0.677131        NaN           NaN

Is the residual SD higher in the Bayesian model because sd(e_y) is using a different df. for residual variance? It’s using N-1 by default, but the ANOVA table would be using N - 9 (1 df. for the mean, 8 for time and site)?

I noticed that in Gelman (2005) - Ann. of Appl. Stat. and in Gelman and Hill (2007), the ANOVA figures don’t display the finite SD for ‘residual error’. I know for sure that, if we submit a figure showing no residual variance, we will be asked that question. How can I infer that effects are important if the residual error is so much higher than the error among sites or years? I guess the question is mostly about inference and how to infer effects are important and why there’s a disconnect between the Bayesian results and the ANOVA table.


This sounds like one for a real statistician! @bgoodri or @avehtari will know.

I think the key here is what you mean by “important”. Highly significant, small magnitude effects are not usually considered “important” by anyone other than statisticians (or scientists, editors being advised by them).


Thanks! I re-read Gelman (2005) and noticed two things: First, residual variance is shown in Fig. 5, just labeled as the five-way interaction. And like my issue, many of the effects denoted ‘significant’ have p < 0.0001 but SD that is less than the residual variance (so significant effects can have small finite variances). And in Dr. Gelman’s discussion of inference, he seems to explicitly avoid the notion of testing whether or not effects are ‘significant’ and instead discusses magnitude (which I find more informative, anyway).

So I came to the same conclusion as you. Effects can be significant but small, so should be OK to say something like 'Variation in years was present, but extremely small compared to large differences among sites (not to mention the data are very noisy, hence the high residual variation)". Or something along those lines. Whether or not editors or reviewers are actually OK with this is another issue…


Maybe I’m putting the pieces together incorrectly, without the data I’m less certain. Relative to this question,

I believe you should be comparing your SD Y = 0.82 to the square root of the Residaul mean_sq, \sqrt(0.677) = 0.82.

As a strategy for inferring importance, you could argue along the lines of your credible intervals excluding zero.


This is not an uncommon strategy, but I wouldn’t quite say it infers importance. It’s entirely possible (in fact not uncommon) for the interval to exclude zero and for the estimate to be too small to be important (in whatever context).


I think this gets at the heart of the matter. Gelman (2005) demonstrates little/no relationship between the p-value (significance, CI including 0) and the magnitude (SD of effects). This approach makes more sense to me. Year is highly significant by p-value, but explains very little variation (about 4%), so the significance based on p-values is overstated.


One caveat to this perspective is that when you’re doing basic science that is trying to develop a full understanding of a phenomenon/system, variables that have a small-but-credibly-non-zero effect may nonetheless signal an important feature of the system for theory to accommodate. This contrasts with the scenario I think the previous posts have considered where you’re trying to identify variables that will yield biggest bang-for-buck in future manipulations.


@mike-lawrence yeah, I just meant that excluding zero is not directly connected to importance without considering the context. Certainly true what you say, though.


For collinear predictors this doesn’t work. See, e.g.