I have been working on a joint model, the first portion of which is a simple intercept only model. While I have resolved many of the issues I encountered with the model, as discussed in my two previous Stan discourse posts here and here, the intercept only model still struggles to estimates both effects.
Focusing on only the trait model, I consistently find that the model struggles to estimate either the species or study level effect. In the test data I am using a balanced design and initially had 20 species and studies, leading to slightly offset study level estimates. This is illustrated below in the plot comparing the species and study level simulated values (x-axis) versus the values for each estimated by our model (y-axis):
In increasing the number of species and studies to 50, I still found issues in our estimates, but now in the species level effects:
The model with the higher number of studies and species produce poorly mixing chains, but overall model estimates compared to the parameters used in the test data are slightly off, but within the 95% UI:
Parameter param esti X2.5 X97.5
1 mu_grand 15 16.173729 13.853826 17.994575
2 sigma_sp 5 5.587338 4.533229 7.033461
3 sigma_study 5 4.895984 4.033120 5.953887
4 sigmaTrait_y 2 1.989304 1.971891 2.007478
It is unclear to me why such a seemingly simple model would not be able to estimate the two intercepts and produce such poorly mixing chains. Any insights into what might be causing this, or if possible, how to correct it would be much appreciated.
The code I used to create the test data is here:
Nrep <- 10 # rep per trait
Nstudy <- 50 # number of studies w/ traits
Nspp <- 50 # number of species
Ntrt <- Nspp * Nstudy * Nrep
mu.grand <- 15
sigma.species <- 5
sigma.study <- 5
trt.dat <- data.frame(matrix(NA, Ntrt, 1))
names(trt.dat) <- c("rep")
trt.dat$rep <- c(1:Nrep)
trt.dat$study <- rep(rep(c(1:Nstudy), each = Nspp), each = Nrep)
trt.dat$species <- rep(rep(c(1:Nspp), times = Nstudy), each = Nrep)
alphaTraitSp <- rnorm(Nspp, 0, sigma.species)
trt.dat$alphaTraitSp <- rep(rep(alphaTraitSp, times = Nstudy), each = Nrep)
muStudy <- rnorm(Nstudy, 0, sigma.study)
#trt.dat$muStudy <- rep(rep(muStudy, times = Nspp), each = Nstudy)
trt.dat$muStudy <- rep(muStudy, each = Nrep*Nspp)
sigmaTraity <- 2
trt.dat$sigmaTraity <- rnorm(Ntrt, 0, sigmaTraity)
for (i in 1:Ntrt){
trt.dat$mu_grand_sp[i] <- trt.dat$alphaTraitSp[i] + mu.grand
}
for (i in 1:Ntrt){
trt.dat$yTraiti[i] <- trt.dat$alphaTraitSp[i] + trt.dat$muStudy[i] + mu.grand + trt.dat$sigmaTraity[i]
}
sim.data <- list(yTraiti = trt.dat$yTraiti,
N = Ntrt,
n_spec = Nspp,
trait_species = as.numeric(as.factor(trt.dat$species)),
n_study = Nstudy,
study = as.numeric(as.factor(trt.dat$study ))
)
The Stan code for the simple intercept only model is here:
data {
int<lower = 1> n_spec; // number of random effect levels
// Traits
int<lower = 1> N; // Sample size
int<lower = 1, upper = n_spec> trait_species[N]; // id of random effect (species)
int<lower = 1> n_study; // number of random effect levels (study)
int<lower = 1, upper = n_study> study[N]; // id of random effect (study)
vector[N] yTraiti; // Observed trait
}
parameters{
// Traits
real mu_grand; // grand mean for trait value
vector[n_spec] muSp; // species offsets
vector[n_study] muStudy; // study offsets
real<lower = 0> sigma_traity; // sd general
real<lower = 0> sigma_sp; // sd species
real<lower = 0> sigma_study; // sd study
}
transformed parameters{
vector[N] y_hat;
vector[n_spec] mu_grand_sp;
for(i in 1:n_spec){
mu_grand_sp[i] = mu_grand + muSp[i];
}
for (i in 1:N){
y_hat[i] = mu_grand + muSp[trait_species[i]] + muStudy[study[i]];
}
}
model{
yTraiti ~ normal(y_hat, sigma_traity);
muSp ~ normal(0, sigma_sp);
muStudy ~ normal(0, sigma_study);
//// priors
mu_grand ~ normal(20, 10);
sigma_sp ~ normal(4, 5);
sigma_study ~ normal(2, 5);
sigma_traity ~ normal(3, 5);
}
Thanks!