Confirmation of multilevel, multivariate analysis


#1

Hello all,

Firstly, a big thanks to the developers of brms which seems to be an excellent package.

I have collected data according to the following experimental design: 3 experimental blocks, within which are 3 sites that each have a different plant treatment imposed (low/med/high). I have measured the plant cover in 25 quadrats per site, and a single measure of fish abundance at each site (I have measured many other variables but exclude them in this example for simplicity). I view this as a multilevel, multivariate problem because I want to estimate the plant cover from the 25 quadrats, and its effect on fish at the site-level.

I am hoping to use brms to estimate these effects simultaneously. I attach some code simulating data representing my problem and output suggesting it works (see below code). My concern is that to use brms I have had to replicate the site-level fish data 25 times (i.e. one measure for each quadrat).

Is there any reason why this is not the best approach?

If it is a suitable approach, is there any reason why it can’t be generalised to a case where the site-level variable is unbalanced? i.e. twenty measures on one site and two measures on another site, each of which are associated with 25 measures of plant.

Many thanks for any help with this query.

Best,

Jess Marsh

  • Operating System: Windows 10
  • brms Version: 2.1.0

# libraries

library(brms)
library(lubridate)
library(ggplot2)
th <- theme(axis.text = element_text(size = 16),
                    axis.title = element_text(size = 16),
                    legend.text = element_text(size = 16),
                    legend.title = element_text(size = 16),
                    complete = FALSE)

# simulate data ---------------------------------------------------------------

# experimental design

n.blocks <- 3
n.treats <- 3
n.quads <- 25
n.fishing <- n.blocks * n.treats
n.samples <- n.blocks * n.treats * n.quads

# create categorical variables

blocks <- factor(1:n.blocks, labels = as.character(c('lower', 'middle', 'upper')))
treats <- factor(1:n.treats, labels = as.character(c('low', 'med', 'high')))
quadrats <- factor(1:n.quads, labels = as.character(1:n.quads))

# choose plant effects

plant_alpha <- c('alpha' = 0.1) # Intercept
plant_Treat <- matrix(c(0, 0.3, 0.75), ncol = 1, dimnames = list(treats))[, 1]
plant_Block <- matrix(c(0, 0.05, 0.07), ncol = 1, dimnames = list(blocks))[, 1]
plant_TreatBlock <- matrix(c(0, 0, 0, 0, -0.05, -0.1, 0, -0.25, -0.20), 
                            nrow = n.treats, ncol = n.blocks, byrow = TRUE,
                            dimnames = list(treats, blocks))
plant_sigma <- c('sigma' = 0.2)
plant_sigma2 <- c('sigma2' = plant_sigma^2)
plant_tau <- c('tau' = (1 / plant_sigma)^2)
plant_epsilon <- rnorm(n = n.samples, mean = 0, sd = plant_sigma2)

# choose fish effects

fish_alpha <- c('alpha' = 0.15) # Intercept
fish_Block <- matrix(c(0, -0.13,-0.11), ncol = 1, dimnames = list(blocks))[, 1]
fish_plant <- c('plant' = 0.04)
fish_sigma <- c('sigma' = 0.06)
fish_sigma2 <- c('sigma2' = fish_sigma^2)
fish_tau <- c('tau' = (1 / fish_sigma)^2)
fish_epsilon <- rep(rnorm(n = n.fishing, mean = 0, sd = fish_sigma2), each = n.quads)

# create raw data and reorder

dat <- expand.grid('block' = blocks, 'treat' = treats, 'quadrat' = quadrats)
dat <- dat[with(dat, order(block, treat, quadrat)), ]

# create plant explanatory variable and add to dat

plant_pred <- vector('numeric', length = n.samples)
for(i in 1:n.samples){
  plant_pred[i] <- 
    plant_alpha + 
    plant_Treat[dat$treat[i]] + 
    plant_Block[dat$block[i]] + 
    plant_TreatBlock[dat$treat[i], dat$block[i]] +
    plant_epsilon[i]
}
dat$plant <- plant_pred

# set negative plant values to 0

dat$plant[which(dat$plant < 0)] <- 0

# create response variable and add to dat

foo <- unique(dat[, c('block', 'treat')])
fish_pred <- vector('numeric', length = n.fishing)
plant_mu <- vector('numeric', length = n.fishing)
for(j in 1:n.fishing){
  fii <- with(foo[j, ], 
              dat[which(dat$block == block & dat$treat== treat), ])
  plant_mu[j] <- mean(fii$plant)
  fish_pred[j] <- 
    fish_alpha + 
    fish_Block[foo$block[j]] + 
    (fish_plant * plant_mu[j]) + 
    fish_epsilon[i]
}
dat$fish <- rep(fish_pred, each = n.quads)

# plots ---------------------------------------------------------------

# plot plant

p_plant <- ggplot(dat, aes(x = treat, y = plant)) + 
  geom_boxplot() + 
  facet_wrap(~ block, ncol = n.blocks) + 
  xlab('treatment') + ylab('plant') + 
  th
print(p_plant)

# plot fish

p_fish <- ggplot(dat, aes(x = treat, y = fish)) + 
  geom_point() + 
  facet_wrap(~ block, ncol = n.blocks) + 
  xlab('treatment') + ylab('fish') + 
  th
print(p_fish)

# brms --------------------------------------------------------------------

# model

brf <-
  bf(fish ~ 1 + block + plant) +
  bf(plant ~ 1 + treat + block + treat:block)

# fit

brms_fit <- brm(brf + set_rescor(FALSE), data = dat, chains = 3, iter = 20000, thin = 100)

# print

print(brms_fit)


#2

There is too much code and your first question is too vague to be answered reasonably, at least for me. In the area of statistical modeling of actual data, there is no such thing as the (generally) “best” approach.

I don’t immediately see a reason, why this couldn’t be generalized to unbalanced site-level data.