Hi all,
This is perhaps a stupid question, so forgive me as my experience with Bayesian models is less than frequentist models.
I have a dataset that has roughly 10 species and X number of populations for each species. Each individual population has a fire frequency (proportional) estimate. I am interested in knowing if species designation is significant. I am currently not concerned about individual species estimates. The dataset looks like this.
Pop Species FF1 FF2 FF3
1 AMGE009A AMGE 0.3103448 0.02643678 0.00000000
2 AMGE009B AMGE 0.1724138 0.02298851 0.00000000
3 AMGE009C AMGE 0.1724138 0.13793103 0.06896552
4 AMGE009D AMGE 0.2068966 0.06896552 0.00000000
5 AMGE009E AMGE 0.2068966 0.10344828 0.00000000
6 AMGE009F AMGE 0.2068966 0.03448276 0.00000000
Where FF1, FF2, and FF3 are three different model outputs.
Here is some code that comes from here and here that is a test example.
First the frequent approach
set.seed(1)
ngroups <- 5 #number of populations
nsample <- 10 #number of reps in each
pop.means <- c(40, 45, 55, 40, 30) #population mean length
sigma <- 3 #residual standard deviation
n <- ngroups * nsample #total sample size
eps <- rnorm(n, 0, sigma) #residuals
x <- gl(ngroups, nsample, n, lab = LETTERS[1:5]) #factor
means <- rep(pop.means, rep(nsample, ngroups))
X <- model.matrix(~x - 1) #create a design matrix
y <- as.numeric(X %*% pop.means + eps)
data <- data.frame(y, x)
data.means.lm <- lm(y ~ -1 + x, data)
anova(data.means.lm)
The anova() function provides the overall test for the factor effect. How can I do this test in a Bayesian Framework? I have seen some people look at the individual parameter estimates and state that, since the CIs do not overlap the overall mean, there are differences. However, I am wondering if there is a way to mimic the anova() function, or is this even advisable?
Here is the Bayesian code to do the above, minus the F ratio test.
library(rstan)
## MCMC settings
ni <- 2000
nt <- 1
nb <- 1000
nc <- 4
modelString = "
data {
int<lower=1> n;
int<lower=1> nX;
vector [n] y;
matrix [n,nX] X;
}
parameters {
vector[nX] beta;
real<lower=0> sigma;
}
transformed parameters {
vector[n] mu;
mu = X*beta;
}
model {
// Likelihood
y~normal(mu,sigma);
// Priors
beta ~ normal(0,1000);
sigma~cauchy(0,5);
}
generated quantities {
vector[n] log_lik;
for (i in 1:n) {
log_lik[i] = normal_lpdf(y[i] | mu[i], sigma);
}
}
"
Xmat <- model.matrix(~ -1 + x, data)
data.list <- with(data, list(y = y, X = Xmat, nX = ncol(Xmat), n = nrow(data)))
data.rstan <- stan(data = data.list, model_code = modelString, chains = nc,
iter = ni, warmup = nb, thin = nt)
print(data.rstan, par = c("beta", "sigma"))