Multilevel logistic regression problem

jonsjoberg · November 5, 2017, 4:17pm

I’m trying to fit a three level logistic regression, it’s a problem where I have individual observations of a species and then the hierarchy is made upp of all the observation for that species, and then what family the species belongs to. But I can’t seem to figure out how to code that up in a way that doesn’t make Stan run into problems with maximum tree depth and low BFMI.

First here is some (ugly) code to generate data like the one I’m trying to fit to:

set.seed(10)
h_mu_fam <- rnorm(1)
h_s_fam <- rexp(1, rate = 3)

mu_fam <- rnorm(10, mean = h_mu_fam, h_s_fam)
s_fam <- rexp(1, rate = 3)
s_sp <- rexp(1, rate = 3)

test_df <- data_frame(sp_int = character(),
                  fam_int = numeric(),
                  theta = numeric(),
                  mu_sp = numeric(),
                  mu_fam = numeric())

for (i in 1:length(mu_fam)){
    mu_sp <- rnorm(10, mu_fam[i], s_fam)
    for( j in 1:length(mu_sp)){
        theta <- rnorm(10, mu_sp[j], s_sp)
        tmp <- data_frame(sp_int = i*10 + j,
                  fam_int = i,
                  theta = theta,
                  mu_sp = mu_sp[j],
                  mu_fam = mu_fam[i])
        test_df <- rbind(test_df, tmp)
    }
}
test_df <- test_df %>% mutate(sp_int = as.numeric(as.factor(sp_int)),
                          p = logistic(theta))

test_df$y <- rbernoulli(length(test_df$p), test_df$p)

I want to keep it as bernoulli distributed and not collapse the observations to binomial, because for the final model I will want to include predictors that are specific to each observation.

This is my best attempt at coding this model in Stan:

data {
    int N;
    int N_sp;
    int N_fam;
    int sp[N];  // Index vector for species
    int fam[N]; // Index vector for families
    int y[N];
}
parameters {
    real a; // Global intercept

    vector[N_sp] a_sp;
    real<lower=0> s_sp;

    vector[N_fam] a_fam;
    real<lower=0> s_fam;

}
model {
     y ~ bernoulli_logit(a + a_sp[sp]);
     a ~ normal(0, 10);

    a_sp[sp] ~ normal(a_fam[fam], s_sp);
    s_sp ~ cauchy(0, 1);

    a_fam ~ normal(0, s_fam);
    s_fam ~ cauchy(0, 1);
}

But as I said it doesn’t work, I would be most grateful if someone could spot why it doesn’t work…

Andre_Pfeuffer · November 5, 2017, 5:23pm

In Stan the order of the Statement is important.

 y ~ bernoulli_logit(a + a_sp[sp]);
 a ~ normal(0, 10);

a_sp[sp] ~ normal(a_fam[fam], s_sp);
s_sp ~ cauchy(0, 1);

a_fam ~ normal(0, s_fam);
s_fam ~ cauchy(0, 1);

Do:

 a ~ normal(0, 10);    
 s_sp ~ cauchy(0, 1);
 s_fam ~ cauchy(0, 1);

 y ~ bernoulli_logit(a + a_sp[sp]);
 a_fam ~ normal(0, s_fam);
 a_sp[sp] ~ normal(a_fam[fam], s_sp);

sakrejda · November 5, 2017, 9:06pm

In this particular case the order is not important because none of those LHS things are being assigned to.

Topic		Replies	Views
Question about multilevel logistic model (mixed intercept logistic model) in Stan code Modeling rstan	4	499	June 7, 2021
Posterior predictive checks of a Multilevel regression model Modeling	1	708	November 27, 2017
Multilevel hierarchical linear regression model with nested predictors Modeling	1	1571	January 9, 2018
Issues with hierarchical linear regression stan model Modeling rstan , fitting-issues	2	473	January 26, 2021
Trying to fit my costumised logistic function Modeling specification	30	2425	January 28, 2020

Multilevel logistic regression problem

Related topics