# Specifying varying slopes across different levels of hierarchical model

Hi all,

I’m having trouble coding varying slopes across different levels in a hierarchical model. I would like to vary slopes at a multiple levels of analysis based on how observations at one level are grouped.

I would appreciate input on 1: how to write the model down correctly, and 2: how to code it in STAN.

As an example of what I’m trying to do: consider the radon example and imagine individuals are part of another group besides counties. In that case, I would let the coefficients on county-level variables vary according to individual-level membership in that other group. Being in the group changes the impact of factors at a different level of analysis.

Specifically, I’m interested in how participation in international alliances alters the military spending of member states. I expect particular alliance characteristics will impact military spending, and each alliance treaty will have a different effect on military spending.

States are members of multiple alliances, and membership is not nested. There are 280 alliances in the data. In the model, I have both state and alliance-level variables, along with state and year varying intercepts.

The mean of the outcome is:
\mu = \alpha + \alpha^{st} + \alpha^{yr} +\textbf{W}_{n \times k} \gamma_{k \times 1} + \textbf{Z}_{n \times a} \lambda_{a \times 1}

Then the mean of each alliance-specific parameter \lambda is a function of alliance level variables, where
\lambda \sim N(X \beta, \sigma_all)

Z is a matrix of state membership in alliances- this is how I capture membership in multiple treaties. Columns are alliances, and rows are state-year observations. If a state is not in an alliance, the matrix element is zero. If a state is in an alliance, the corresponding element is the rescaled value of allied military spending.

States can be divided into J groups. I’ve check this by estimating the above model on J samples, but would prefer to let at least the \lambda and \beta vary by group. Getting \gamma to vary by a state-year-level group is straightforward- I did that for practice.

I think the model can be re-written like this:

\mu = \alpha + \alpha^{st} + \alpha^{yr} +\textbf{W} \gamma_{j} + \textbf{Z} \lambda_{j}

\lambda_j \sim MVN(X \beta_j, \Sigma_{all}) \mbox{for } j \in 1:J

\beta_j \sim MVN(\mu_{\beta}, \Sigma_{\beta}) \mbox{for } j \in 1:J

But the indexing across levels is a little weird- j is a group at the state-year level. The unit of analysis for the \lambda regression is alliances. Moreover, alliance membership is uneven across groups- some treaties involve only members of one group.

The code I have so far is below- it uses Cholesky factorization for the varying slopes of \lambda and \beta. Right now, this code generates an error: Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity.

I think the error is because I have not specified the model correctly. I’ve built things piece by piece to this point- fitting the model without varying slopes, or with varying \gamma parameters has no problems.

data {
int<lower = 1> N; // Number of observations
int<lower = 1> S; // number of states
int<lower = 1> T; // number of years
int<lower = 1> J; // number of state capability groups
int<lower = 1> A; // number of alliances
int<lower = 1> L; // number of alliance-level variables
int<lower = 1> M; // number of state-level variables
int<lower = 1, upper = S> state[N]; // state indicator
int<lower = 1, upper = T> year[N]; // year indicator
int<lower = 1, upper = J> cap[N]; // major-power indicator
matrix[A, L] X; // matrix of alliance-level variables
matrix[N, M] W; // matrix of state-level variables
matrix[N, A] Z; // matrix of state membership in alliances
vector[N] y; // outcome
}

parameters {
real alpha; // overall intercept
real<lower = 0> sigma; // variance of outcome
vector[S] alpha_state_std; // better behaved distribution of state intercepts
vector[T] alpha_year_std; // better behaved distribution of year intercepts
real<lower = 0> sigma_state; // variance hyperparameter of the state intercepts
real<lower = 0> sigma_year; // variance hyperparameter of the year intercepts
real<lower = 0> sigma_all; // variance hyperparameter of the alliances
vector[M] gamma; // vector of state-level variables (slopes do not vary by group)
matrix[J, L] mu_beta; // mean of alliance-level coefficients
vector<lower = 0>[L] tau_beta; // mean of theta par in multivariate distribution
matrix[L, J] z_beta; // for non-centered Cholesky factorization
cholesky_factor_corr[L] L_Omega_beta; // for non-centered Cholesky factorization
vector<lower = 0>[A] tau_lambda; // mean of theta par in multivariate distribution
matrix[A, J] z_lambda; // for non-centered Cholesky factorization
cholesky_factor_corr[A] L_Omega_lambda; // for non-centered Cholesky factorization
real<lower = 3> nu; // degrees of freedom in t-distribution of outcome

}

transformed parameters {
vector[S] alpha_state; // state intercepts
vector[T] alpha_year; // year intercepts
matrix[J, L] beta; // mean of alliance parameters- transposed in lambdas
matrix[J, A] lambda; // alliance parameters
vector[N] y_hat; // linear prediction of the outcome mean

alpha_state = 0 + sigma_state * alpha_state_std; // non-centered parameterization, where alpha_state ~ N(0, sigma_state)

alpha_year = 0 + sigma_year * alpha_year_std; // non-centered parameterization, where alpha_state ~ N(0, sigma_state)

// varying slopes in alliance-level regression parameters beta
beta = mu_beta + (diag_pre_multiply(tau_beta, L_Omega_beta) * z_beta)';

// Varying slopes in alliance parameters lambda

// Multivariate distribution of lambdas by capability expressed w/ Cholesky factorization
lambda = (X * beta')' + (diag_pre_multiply(tau_lambda, L_Omega_lambda) * z_lambda)';

// Linear prediction of the state-year spending.
y_hat = alpha + alpha_state[state] + alpha_year[year] + W * gamma + rows_dot_product(lambda[cap], Z) ;

}

model {

alpha ~ normal(0, 1);
sigma ~ normal(0, 1);
alpha_year_std ~ normal(0, 1);
alpha_state_std ~ normal(0, 1);
sigma_state ~ normal(0, 1);
sigma_year ~ normal(0, 1);
sigma_all ~ normal(0, 1);
nu ~ gamma(2, 0.1); // Prior for degrees of freedom in t-dist
gamma ~ normal(0, 1);
to_vector(z_beta) ~ normal(0, 1);
L_Omega_beta ~ lkj_corr_cholesky(2);
tau_beta ~ normal(0, 1);
to_vector(mu_beta) ~ normal(0, 1);
to_vector(z_lambda) ~ normal(0, 1);
L_Omega_lambda ~ lkj_corr_cholesky(2);
tau_lambda ~ normal(0, 1);

y ~ student_t(nu, y_hat, sigma);
}

generated quantities {
vector[N] y_pred; //  posterior predictive distribution

for(i in 1:N)
y_pred[i] = student_t_rng(nu, y_hat[i], sigma);

}

Thanks in advance for any help!

If anyone is curious- here is the associated Github repo.

2 Likes

Realized why the log probability was evaluating to log(0).

Because some alliances only have members from one group, some \lambda_j parameters have no variation in membership- all the relevant elements of the membership matrix Z are 0.

Group 1 has at least one participant in 192/284 treaties. Group 2 has at least one participant in 146/292. There’s some overlap between the groups, but also many treaties that only have members from one group.

Dear @jkalley,

thanks for posting the interesting multi-level model.

I have a similar model, however, I would like to have a ‘non-centered’ multivariate student-t distribution (or any analogous heavier-tailed distribution) at the highest level of my model (at line: β_j∼MVN(μ_β,Σ_β) for j∈1:J).

Do you by any chance know how to implement that?

Best

Not off the top of my head. The tricky thing is dealing with \nu, of course.

See this earlier thread, which has a good starting point. Unfortunately the Google groups link is broken. If you can’t sort it out from the information there, I’d suggest creating a new topic- your request is pretty different from what I asked about here.

How to deal with \nu is exactly where I got stuck.

Your are right, my question is completely different from yours so I will create a new topic.

Thanks!