Specifying varying slopes across different levels of hierarchical model

Hi all,

I’m having trouble coding varying slopes across different levels in a hierarchical model. I would like to vary slopes at a multiple levels of analysis based on how observations at one level are grouped.

I would appreciate input on 1: how to write the model down correctly, and 2: how to code it in STAN.

As an example of what I’m trying to do: consider the radon example and imagine individuals are part of another group besides counties. In that case, I would let the coefficients on county-level variables vary according to individual-level membership in that other group. Being in the group changes the impact of factors at a different level of analysis.

Specifically, I’m interested in how participation in international alliances alters the military spending of member states. I expect particular alliance characteristics will impact military spending, and each alliance treaty will have a different effect on military spending.

States are members of multiple alliances, and membership is not nested. There are 280 alliances in the data. In the model, I have both state and alliance-level variables, along with state and year varying intercepts.

The mean of the outcome is:
\mu = \alpha + \alpha^{st} + \alpha^{yr} +\textbf{W}_{n \times k} \gamma_{k \times 1} + \textbf{Z}_{n \times a} \lambda_{a \times 1}

Then the mean of each alliance-specific parameter \lambda is a function of alliance level variables, where
\lambda \sim N(X \beta, \sigma_all)

Z is a matrix of state membership in alliances- this is how I capture membership in multiple treaties. Columns are alliances, and rows are state-year observations. If a state is not in an alliance, the matrix element is zero. If a state is in an alliance, the corresponding element is the rescaled value of allied military spending.

States can be divided into J groups. I’ve check this by estimating the above model on J samples, but would prefer to let at least the \lambda and \beta vary by group. Getting \gamma to vary by a state-year-level group is straightforward- I did that for practice.

I think the model can be re-written like this:

\mu = \alpha + \alpha^{st} + \alpha^{yr} +\textbf{W} \gamma_{j} + \textbf{Z} \lambda_{j}

\lambda_j \sim MVN(X \beta_j, \Sigma_{all}) \mbox{for } j \in 1:J

\beta_j \sim MVN(\mu_{\beta}, \Sigma_{\beta}) \mbox{for } j \in 1:J

But the indexing across levels is a little weird- j is a group at the state-year level. The unit of analysis for the \lambda regression is alliances. Moreover, alliance membership is uneven across groups- some treaties involve only members of one group.

The code I have so far is below- it uses Cholesky factorization for the varying slopes of \lambda and \beta. Right now, this code generates an error: Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity.

I think the error is because I have not specified the model correctly. I’ve built things piece by piece to this point- fitting the model without varying slopes, or with varying \gamma parameters has no problems.

data {
  int<lower = 1> N; // Number of observations
  int<lower = 1> S; // number of states
  int<lower = 1> T; // number of years
  int<lower = 1> J; // number of state capability groups
  int<lower = 1> A; // number of alliances
  int<lower = 1> L; // number of alliance-level variables
  int<lower = 1> M; // number of state-level variables
  int<lower = 1, upper = S> state[N]; // state indicator
  int<lower = 1, upper = T> year[N]; // year indicator
  int<lower = 1, upper = J> cap[N]; // major-power indicator
  matrix[A, L] X; // matrix of alliance-level variables
  matrix[N, M] W; // matrix of state-level variables
  matrix[N, A] Z; // matrix of state membership in alliances
  vector[N] y; // outcome 

parameters {
  real alpha; // overall intercept
  real<lower = 0> sigma; // variance of outcome
  vector[S] alpha_state_std; // better behaved distribution of state intercepts
  vector[T] alpha_year_std; // better behaved distribution of year intercepts
  real<lower = 0> sigma_state; // variance hyperparameter of the state intercepts
  real<lower = 0> sigma_year; // variance hyperparameter of the year intercepts
  real<lower = 0> sigma_all; // variance hyperparameter of the alliances
  vector[M] gamma; // vector of state-level variables (slopes do not vary by group)
  matrix[J, L] mu_beta; // mean of alliance-level coefficients
  vector<lower = 0>[L] tau_beta; // mean of theta par in multivariate distribution 
  matrix[L, J] z_beta; // for non-centered Cholesky factorization 
  cholesky_factor_corr[L] L_Omega_beta; // for non-centered Cholesky factorization 
  vector<lower = 0>[A] tau_lambda; // mean of theta par in multivariate distribution 
  matrix[A, J] z_lambda; // for non-centered Cholesky factorization 
  cholesky_factor_corr[A] L_Omega_lambda; // for non-centered Cholesky factorization 
  real<lower = 3> nu; // degrees of freedom in t-distribution of outcome


transformed parameters {
  vector[S] alpha_state; // state intercepts
  vector[T] alpha_year; // year intercepts
  matrix[J, L] beta; // mean of alliance parameters- transposed in lambdas
  matrix[J, A] lambda; // alliance parameters
  vector[N] y_hat; // linear prediction of the outcome mean

 alpha_state = 0 + sigma_state * alpha_state_std; // non-centered parameterization, where alpha_state ~ N(0, sigma_state)

alpha_year = 0 + sigma_year * alpha_year_std; // non-centered parameterization, where alpha_state ~ N(0, sigma_state)

// varying slopes in alliance-level regression parameters beta 
  beta = mu_beta + (diag_pre_multiply(tau_beta, L_Omega_beta) * z_beta)';

// Varying slopes in alliance parameters lambda 

// Multivariate distribution of lambdas by capability expressed w/ Cholesky factorization 
  lambda = (X * beta')' + (diag_pre_multiply(tau_lambda, L_Omega_lambda) * z_lambda)';

// Linear prediction of the state-year spending. 
    y_hat = alpha + alpha_state[state] + alpha_year[year] + W * gamma + rows_dot_product(lambda[cap], Z) ;

model {
  alpha ~ normal(0, 1);
  sigma ~ normal(0, 1);
  alpha_year_std ~ normal(0, 1);
  alpha_state_std ~ normal(0, 1); 
  sigma_state ~ normal(0, 1);
  sigma_year ~ normal(0, 1); 
  sigma_all ~ normal(0, 1); 
  nu ~ gamma(2, 0.1); // Prior for degrees of freedom in t-dist
  gamma ~ normal(0, 1); 
  to_vector(z_beta) ~ normal(0, 1);
  L_Omega_beta ~ lkj_corr_cholesky(2);
  tau_beta ~ normal(0, 1); 
  to_vector(mu_beta) ~ normal(0, 1);
  to_vector(z_lambda) ~ normal(0, 1);
  L_Omega_lambda ~ lkj_corr_cholesky(2);
  tau_lambda ~ normal(0, 1); 
  y ~ student_t(nu, y_hat, sigma);

generated quantities {
 vector[N] y_pred; //  posterior predictive distribution

 for(i in 1:N)
 y_pred[i] = student_t_rng(nu, y_hat[i], sigma);


Thanks in advance for any help!

If anyone is curious- here is the associated Github repo.


Realized why the log probability was evaluating to log(0).

Because some alliances only have members from one group, some \lambda_j parameters have no variation in membership- all the relevant elements of the membership matrix Z are 0.

Group 1 has at least one participant in 192/284 treaties. Group 2 has at least one participant in 146/292. There’s some overlap between the groups, but also many treaties that only have members from one group.

Dear @jkalley,

thanks for posting the interesting multi-level model.

I have a similar model, however, I would like to have a ‘non-centered’ multivariate student-t distribution (or any analogous heavier-tailed distribution) at the highest level of my model (at line: β_j∼MVN(μ_β,Σ_β) for j∈1:J).

Do you by any chance know how to implement that?


Not off the top of my head. The tricky thing is dealing with \nu, of course.

See this earlier thread, which has a good starting point. Unfortunately the Google groups link is broken. If you can’t sort it out from the information there, I’d suggest creating a new topic- your request is pretty different from what I asked about here.

How to deal with \nu is exactly where I got stuck.

Your are right, my question is completely different from yours so I will create a new topic.