Hello,

I am trying to analyze a dataset with predictors collected at two hierarchical levels. The higher level is determined by a plot geographical location and the lower level is determined by environmental heterogeneity within a plot so that the response variable is measured for each unique environmental condition within a plot. In most cases, plots were homogeneous and thus are represented by a single data point, whereas about 30% of plots contained 2-4 conditions and thus are represented by 2-4 data points.

I’ve been trying to fit a binomial regression with these data but I run into a problem that posterior distributions of several predictors are highly correlated. I suspect that the problem might be a small number of observations per group (i.e. number conditions per plot), though I am not completely sure. I get the same problem with simulated data and I can’t recover the parameters:

a <- rnorm(1000, 1, 3)

b <- rnorm(1000, 7, 5)

c <- rnorm( 1000, 0, 2)

d <- rnorm( 1300, 0, 1)

group <- sample(1:1000, 1300, replace = TRUE)

data <- data.frame(a = a, b = b, c = c, group = 1:1000) %>% right_join(data.frame(d = d, group = group))

x1 <- 7

x2 <- -2

data$y <- 5 + x1*data$a + x2*data$b + data$c + data$d

data$pr <- round( exp(data$y) / (1 + exp(data$y)), 4)

data$y <- rbinom(1300, 1, data$pr)

data$obs <- 1:1300

data[, c(“a”, “b”)] <- data[, c(“a”, “b”)] %>% mutate_all(funs(as.numeric(scale(.))))

m.sim <- brm(y ~ a + b + (1|group) + (1|obs), data = data, family = bernoulli(),

prior = c(prior(“normal(0,5)”, class = “b”),

prior(“normal(0,5)”, class = “sd”)),

chains = 1, cores = 1, thin = 20, iter = 10000, warmup = 2000,

control = list(adapt_delta = .9, max_treedepth = 12))

pairs(m.sim)

Pairs.pdf (172.7 KB)

print(m.sim)

Family: bernoulli

Links: mu = logit

Formula: y ~ a + b + (1 | group) + (1 | obs)

Data: data (Number of observations: 1300)

Samples: 1 chains, each with iter = 10000; warmup = 2000; thin = 20;

total post-warmup samples = 400

Group-Level Effects:

~group (Number of levels: 719)

Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat

sd(Intercept) 0.53 0.37 0.05 1.33 223 1.00

~obs (Number of levels: 1300)

Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat

sd(Intercept) 0.64 0.45 0.02 1.65 400 1.00

Population-Level Effects:

Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat

Intercept -1.99 0.33 -2.63 -1.39 400 1.00

a 14.92 1.83 11.78 18.77 400 1.00

b -7.03 0.88 -8.99 -5.49 400 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample

is a crude measure of effective sample size, and Rhat is the potential

scale reduction factor on split chains (at convergence, Rhat = 1).

I’d really appreciate any advice on how to proceed with this problem.

Marina