Multilevel logistic regression problem

I’m trying to fit a three level logistic regression, it’s a problem where I have individual observations of a species and then the hierarchy is made upp of all the observation for that species, and then what family the species belongs to. But I can’t seem to figure out how to code that up in a way that doesn’t make Stan run into problems with maximum tree depth and low BFMI.

First here is some (ugly) code to generate data like the one I’m trying to fit to:

set.seed(10)
h_mu_fam <- rnorm(1)
h_s_fam <- rexp(1, rate = 3)

mu_fam <- rnorm(10, mean = h_mu_fam, h_s_fam)
s_fam <- rexp(1, rate = 3)
s_sp <- rexp(1, rate = 3)

test_df <- data_frame(sp_int = character(),
                  fam_int = numeric(),
                  theta = numeric(),
                  mu_sp = numeric(),
                  mu_fam = numeric())

for (i in 1:length(mu_fam)){
    mu_sp <- rnorm(10, mu_fam[i], s_fam)
    for( j in 1:length(mu_sp)){
        theta <- rnorm(10, mu_sp[j], s_sp)
        tmp <- data_frame(sp_int = i*10 + j,
                  fam_int = i,
                  theta = theta,
                  mu_sp = mu_sp[j],
                  mu_fam = mu_fam[i])
        test_df <- rbind(test_df, tmp)
    }
}
test_df <- test_df %>% mutate(sp_int = as.numeric(as.factor(sp_int)),
                          p = logistic(theta))

test_df$y <- rbernoulli(length(test_df$p), test_df$p)

I want to keep it as bernoulli distributed and not collapse the observations to binomial, because for the final model I will want to include predictors that are specific to each observation.

This is my best attempt at coding this model in Stan:

data {
    int N;
    int N_sp;
    int N_fam;
    int sp[N];  // Index vector for species
    int fam[N]; // Index vector for families
    int y[N];
}
parameters {
    real a; // Global intercept

    vector[N_sp] a_sp;
    real<lower=0> s_sp;

    vector[N_fam] a_fam;
    real<lower=0> s_fam;

}
model {
     y ~ bernoulli_logit(a + a_sp[sp]);
     a ~ normal(0, 10);

    a_sp[sp] ~ normal(a_fam[fam], s_sp);
    s_sp ~ cauchy(0, 1);

    a_fam ~ normal(0, s_fam);
    s_fam ~ cauchy(0, 1);
}

But as I said it doesn’t work, I would be most grateful if someone could spot why it doesn’t work…

In Stan the order of the Statement is important.

 y ~ bernoulli_logit(a + a_sp[sp]);
 a ~ normal(0, 10);

a_sp[sp] ~ normal(a_fam[fam], s_sp);
s_sp ~ cauchy(0, 1);

a_fam ~ normal(0, s_fam);
s_fam ~ cauchy(0, 1);

Do:

 a ~ normal(0, 10);    
 s_sp ~ cauchy(0, 1);
 s_fam ~ cauchy(0, 1);

 y ~ bernoulli_logit(a + a_sp[sp]);
 a_fam ~ normal(0, s_fam);
 a_sp[sp] ~ normal(a_fam[fam], s_sp);

In this particular case the order is not important because none of those LHS things are being assigned to.