Overall test for effect of factor similar to anova() in r

Hi all,

This is perhaps a stupid question, so forgive me as my experience with Bayesian models is less than frequentist models.

I have a dataset that has roughly 10 species and X number of populations for each species. Each individual population has a fire frequency (proportional) estimate. I am interested in knowing if species designation is significant. I am currently not concerned about individual species estimates. The dataset looks like this.

  Pop         Species        FF1    FF2         FF3 
1  AMGE009A    AMGE    0.3103448 0.02643678 0.00000000  
2  AMGE009B    AMGE    0.1724138 0.02298851 0.00000000  
3  AMGE009C    AMGE    0.1724138 0.13793103 0.06896552  
4  AMGE009D    AMGE    0.2068966 0.06896552 0.00000000  
5  AMGE009E    AMGE    0.2068966 0.10344828 0.00000000  
6  AMGE009F    AMGE    0.2068966 0.03448276 0.00000000  

Where FF1, FF2, and FF3 are three different model outputs.

Here is some code that comes from here and here that is a test example.

First the frequent approach

ngroups <- 5  #number of populations
nsample <- 10  #number of reps in each
pop.means <- c(40, 45, 55, 40, 30)  #population mean length
sigma <- 3  #residual standard deviation
n <- ngroups * nsample  #total sample size
eps <- rnorm(n, 0, sigma)  #residuals
x <- gl(ngroups, nsample, n, lab = LETTERS[1:5])  #factor
means <- rep(pop.means, rep(nsample, ngroups))
X <- model.matrix(~x - 1)  #create a design matrix
y <- as.numeric(X %*% pop.means + eps)
data <- data.frame(y, x)

data.means.lm <- lm(y ~ -1 + x, data)

The anova() function provides the overall test for the factor effect. How can I do this test in a Bayesian Framework? I have seen some people look at the individual parameter estimates and state that, since the CIs do not overlap the overall mean, there are differences. However, I am wondering if there is a way to mimic the anova() function, or is this even advisable?

Here is the Bayesian code to do the above, minus the F ratio test.


    ## MCMC settings
    ni <- 2000
    nt <- 1
    nb <- 1000
    nc <- 4
    modelString = "
      data {
      int<lower=1> n;
      int<lower=1> nX;
      vector [n] y;
      matrix [n,nX] X;
      parameters {
      vector[nX] beta;
      real<lower=0> sigma;
      transformed parameters {
      vector[n] mu;

      mu = X*beta;
      model {
      // Likelihood
      // Priors
      beta ~ normal(0,1000);
      generated quantities {
      vector[n] log_lik;
      for (i in 1:n) {
      log_lik[i] = normal_lpdf(y[i] | mu[i], sigma); 
    Xmat <- model.matrix(~ -1 + x, data)
    data.list <- with(data, list(y = y, X = Xmat, nX = ncol(Xmat), n = nrow(data)))

    data.rstan <- stan(data = data.list, model_code = modelString, chains = nc,
                       iter = ni, warmup = nb, thin = nt)

    print(data.rstan, par = c("beta", "sigma"))
1 Like

I would just look at the 50% uncertainty intervals but that’s coming from a Bayesian perspective. Also the beta prior is pretty wide and should be narrowed down.

1 Like

So when you think about testing it raises a bunch of questions. These are pulled from @martinmodrak and @betanalpha 's discussion here: Hypothesis testing, model selection, model comparison - some thoughts

Short summary:
What is the relevance in testing?
What questions are you asking?
What decisions are you going to be making?
What inference are you looking for?


Thanks @Ara_Winter

You are right that when you talk about testing it raises a bunch of questions. What I am trying to test is not quite straightforward. I actually have 26 species, so in a frequentist world, if one of the 26 is different from the mean, then the factor itself is significant. What I am trying to test is whether the species designation results in different estimates, or rather a different range of estimates, but it is kind of apparent that it does based on my data. In other words, I have a model with a CI of 0.29-0.03, while the other models are more like 0.31-0.29 and 0.12-0.03. So naturally, any sample is going to have a greater variance.

1 Like

So when you plot your data, what does that look like? If you can share it here.

Also a good reference for this kind of work: https://lindeloev.github.io/tests-as-linear/

That’s not mine but it’s how I teach stats these days in workshops and mini-classes.

I do have a graph that I have attached. In the graph, there are three estimates of fire frequency (BA model, and Recorded). What I can see in the graph is the the Recorded estimates are all around .3, and the BA estimates are relatively small (compared to model and Recorded). The “model” fire frequency results show more variation. And this is what I am interested in “testing”. I want to know if there is more variation in the model results relative to the BA and the recorded values. Does this make sense? Thanks again for your help.


Yeah, that makes sense. So when you run you Stan code on the simulated data and on the real data what do you see?

I’d recommend loading in tidybayes and bayesplot to help with the visuals.

thanks, I will check out tidybayes and bayesplot. The figure above is the raw data. Interestingly, a simple lm() is able to capture the parameter estimates for the 26 species, but I am not getting so great of results with Stan. The trace plots look okay, but the estimates are all over the place. Maybe I need longer chains since I have 26 levels?

The beta prior needs to be reined in. And what is your model call for the real data?

Reigned in, I like it. You are right. I was sloppy and used some old code.

Oh also a student_t prior for the levels? I think that’s what all the cool folks are using these days. I’d need to check on that.

So the parameter estimates using Bayes are still not quite as accurate as the lm() results. I have changed the prior to (0,1), and the estimates are in the ball park, but not quite right.

Here are the results. Mean is the mean from the raw data, lm_mean is the mean based on the lm() function in r, and Bayes_mean is what I calculated using the code above, with a prior of ```
beta ~ normal(0,1).

      Species       mean    lm_mean Bayes_mean
    1    AMGE 0.07333456 0.07333456       0.07
    2    ASMI 0.17831642 0.17831642       0.10
    3    NEUM 0.06381715 0.06381715      -0.01
    4    PHSI 0.15278022 0.15278022       0.08
    5    PYBR 0.18095952 0.18095952       0.11
    6    RHMI 0.16036247 0.16036247       0.09
    7    SCAM 0.27946801 0.27946801       0.21
    8    STPI 0.16007100 0.16007100       0.09
9    TRCA 0.14790092 0.14790092       0.07

I will check out the student_t prior. This question has strayed a bit from the original topic, but I appreciate the help. I think one of the ways that I could test whether the models are capturing the difference between species is by comparing the upper X number of species with the lower X number of species. I just need to get the parameter estimates correct in the Bayes model I am using.

1 Like

Once you’ve settled on a model we can look at the 50% uncertainty intervals. Like you said above we check graphically and with the summaries to see if they are overlapping.

For a full discussion of hypothesis testing as decision making see https://arxiv.org/abs/1803.08393.

1 Like

I am wondering if section “15.6 Analysis of variance and the batching of coefficients” in Bayesian Data Analysis, Third Edition (2014) by Andrew Gelman John B. Carlin Hal S. Stern David B. Dunson Aki Vehtari Donald B. Rubin might be of help?
Not sure, but maybe.