# Non-Centered Hierarchical Model with Multiple Nestings

I am attempting to estimate a non-centered model with multiple levels of hierarchy, i.e. I have collected observations of companies over time, companies are nested within states, states are related to each other. My outcomes is binomial and can be thought of as the number of sales out of how many possible clients received a quote. I am trying for a model with an intercept and a coefficient on a (linear) time term. When I use my (partially) centered model below, I get divergences. How do I get a fully non-centered parameterization with multiple layers in my structure?

To try to ensure that this model works as I believe it should, I am using simulated data (code below to create the data).

Thanks!

``````set.seed(1234)

(state.a <- rnorm(3, 0, 1)) # state-level intercepts
(state.b <- rnorm(3, 0, 1)) # state-level coefficients on time

means <- c(1000, 1000, 1000, 1000, 1000, 1000)
# on average how many quotes are there in each state-company
state.means <- cbind.data.frame(a=rep(state.a, each=2)
, b=rep(state.b, each=2)
, states = rep(1:3, each=2))
data <- data.frame()

for (i in 1:6){  # for each state-company combo

a <- rnorm(1, state.means[i,1], 0.05) # a company-specific intercept
b <- rnorm(1, state.means[i,2], 0.05) # a company-specific coefficient on time
n <- round(rnorm(length(times), means[i], 5)) # the number of quotes in each time period
out <- rep(NA, length(times))

for (t in 1:length(times)){

out[t] <- rbinom(1, n[t], plogis(a + times[t] * b))
# simulate the probability of success at a given point in time, given the company-level intercept and slope

}

out <- cbind.data.frame(y = out
, np = n
, a = a
, state_a = state.means[i,1]
, b = b
, state_b = state.means[i,2]
, id = i
, states = state.means\$states[i]
, times = times)

data <- rbind(data, out)

}

stan.data <- list(N = nrow(data)
, L = length(unique(data\$id))
, y = data\$y
, ll = data\$id
, S = 3
, ss = state.means\$states
, K = data\$np
, times = data\$times)

fit <- stan(file = "stan forum.stan"
, data = stan.data
, iter = 2000
, chains = 3
, control = list(max_treedepth = 25
``````

``````data {
int<lower=0> N;           // number of observations
int<lower=0> K[N];        // number of trials within a state-company group
int<lower=0> y[N];        // number of "successes" within a state-company group
int<lower=1> L;              // number of state-company combinations
int<lower=1, upper=L> ll[N]; // mapping the state-companies to individual observations
int<lower=1> S;              // number of states included
int<lower=1, upper=S> ss[L]; // mapping the states to companies
real<lower=0, upper=1> times[N]; // a variable representing time
}
parameters {
real<lower=0> sigma;           // population sd of p(success)
vector[L] alpha_std;           // used in centering

vector[S] intercept_state;               // state-level intercepts
real<lower=0> intercept_sigma_state; // state-level sd of intercepts
vector[S] beta_t_state;                   // state-level coefficient on time
real<lower=0> beta_t_sigma_state; // state-level sd of coefficient on time

vector[L] intercept; // the resulting company-level intercept
vector[L] beta_t;    // the resulting company-level coefficient on time
}
transformed parameters {
vector[N] mu;
// mean of success for every observation
//(impacted by state- and company-level factors)

for (n in 1:N)
mu[n] = intercept[ll[n]] + beta_t[ll[n]] * times[n];

}
model {
intercept_state ~ normal(0, 1);         // hyperprior
beta_t_state ~ normal(0, 1);

sigma ~ normal(0.25, 0.1);                // hyperprior
alpha_std ~ normal(0, 1);                 // prior (hierarchical)

for (l in 1:L){
intercept[l] ~ normal(intercept_state[ss[l]], intercept_sigma_state);
beta_t[l] ~ normal(beta_t_state[ss[l]], beta_t_sigma_state);
}

{
vector[N] alpha_std_ll;

for (n in 1:N)
alpha_std_ll[n] = alpha_std[ll[n]];

y ~ binomial_logit(K, mu + sigma * alpha_std_ll);  // likelihood
}
}
``````

I had some trouble with a similar model, but did eventually get it to sample, and that was with missing data thrown in as well.

Looking over your model, you might want to non-center `intercept[l]` and `beta_t[l]`? For instance,

``````intercept[l] = intercept_state[ss[l]] + intercept_sigma_state * intercept_tilde[l];
``````

with `intercept_tilde[l]` being standard normal.

Update: I realized I should have put the information you suggested in the transformed parameters block and now the model will run again. Unfortunately I still have divergences. I have been playing with different step sizes and adapt_delta to try to get around this. Any other ideas from anyone?

``````// Pulling in adaptations from here https://jrnold.github.io/bayesian_notes/binomial-models.html

data {
int<lower=0> N;           // number of observations
int<lower=0> K[N];        // number of trials within a state-company group
int<lower=0> y[N];        // number of "successes" within a state-company group
int<lower=1> L;              // number of state-company combinations
int<lower=1, upper=L> ll[N]; // mapping the state-companies to individual observations
int<lower=1> S;              // number of states included
int<lower=1, upper=S> ss[L]; // mapping the states to companies
real<lower=0, upper=1> times[N]; // a variable representing time
}
parameters {
real<lower=0> sigma;           // population sd of p(success)
vector[L] alpha_std;           // used in centering
vector[L] intercept_tilde;
vector[L] beta_tilde;

vector[S] intercept_state;               // state-level intercepts
real<lower=0> intercept_sigma_state; // state-level sd of intercepts
vector[S] beta_t_state;                   // state-level coefficient on time
real<lower=0> beta_t_sigma_state; // state-level sd of coefficient on time

}
transformed parameters {

vector[L] intercept; // the resulting company-level intercept
vector[L] beta_t;    // the resulting company-level coefficient on time

vector[N] mu;
// mean of success for every observation
//(impacted by state- and company-level factors)

for (l in 1:L){
intercept[l] = intercept_state[ss[l]] + intercept_sigma_state * intercept_tilde[l];
beta_t[l] = beta_t_state[ss[l]] + beta_t_sigma_state * beta_tilde[l];
}

for (n in 1:N)
mu[n] = intercept[ll[n]] + beta_t[ll[n]] * times[n];

}
model {
intercept_state ~ normal(0, 1);         // hyperprior
beta_t_state ~ normal(0, 1);

sigma ~ normal(0.25, 0.1);                // hyperprior
alpha_std ~ normal(0, 1);                 // prior (hierarchical)
intercept_tilde ~ normal(0, 1);
beta_tilde ~ normal(0, 1);

{
vector[N] alpha_std_ll;

for (n in 1:N)
alpha_std_ll[n] = alpha_std[ll[n]];

y ~ binomial_logit(K, mu + sigma * alpha_std_ll);  // likelihood
}
}
``````

Counting seems to indicate you are missing priors on some variables - I believe the two `sigma_state` variables could do with a prior, especially since they control scales.
If that does not work you might need to tweak your priors and/or your `adapt_delta`. If you get divergences again, report the error message here.