Specifying varying slopes across different levels of hierarchical model


Hi all,

I’m having trouble coding varying slopes across different levels in a hierarchical model. I would like to vary slopes at a multiple levels of analysis based on how observations at one level are grouped.

I would appreciate input on 1: how to write the model down correctly, and 2: how to code it in STAN.

As an example of what I’m trying to do: consider the radon example and imagine individuals are part of another group besides counties. In that case, I would let the coefficients on county-level variables vary according to individual-level membership in that other group. Being in the group changes the impact of factors at a different level of analysis.

Specifically, I’m interested in how participation in international alliances alters the military spending of member states. I expect particular alliance characteristics will impact military spending, and each alliance treaty will have a different effect on military spending.

States are members of multiple alliances, and membership is not nested. There are 280 alliances in the data. In the model, I have both state and alliance-level variables, along with state and year varying intercepts.

The mean of the outcome is:
\mu = \alpha + \alpha^{st} + \alpha^{yr} +\textbf{W}_{n \times k} \gamma_{k \times 1} + \textbf{Z}_{n \times a} \lambda_{a \times 1}

Then the mean of each alliance-specific parameter \lambda is a function of alliance level variables, where
\lambda \sim N(X \beta, \sigma_all)

Z is a matrix of state membership in alliances- this is how I capture membership in multiple treaties. Columns are alliances, and rows are state-year observations. If a state is not in an alliance, the matrix element is zero. If a state is in an alliance, the corresponding element is the rescaled value of allied military spending.

States can be divided into J groups. I’ve check this by estimating the above model on J samples, but would prefer to let at least the \lambda and \beta vary by group. Getting \gamma to vary by a state-year-level group is straightforward- I did that for practice.

I think the model can be re-written like this:

\mu = \alpha + \alpha^{st} + \alpha^{yr} +\textbf{W} \gamma_{j} + \textbf{Z} \lambda_{j}

\lambda_j \sim MVN(X \beta_j, \Sigma_{all}) \mbox{for } j \in 1:J

\beta_j \sim MVN(\mu_{\beta}, \Sigma_{\beta}) \mbox{for } j \in 1:J

But the indexing across levels is a little weird- j is a group at the state-year level. The unit of analysis for the \lambda regression is alliances. Moreover, alliance membership is uneven across groups- some treaties involve only members of one group.

The code I have so far is below- it uses Cholesky factorization for the varying slopes of \lambda and \beta. Right now, this code generates an error: Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity.

I think the error is because I have not specified the model correctly. I’ve built things piece by piece to this point- fitting the model without varying slopes, or with varying \gamma parameters has no problems.

data {
  int<lower = 1> N; // Number of observations
  int<lower = 1> S; // number of states
  int<lower = 1> T; // number of years
  int<lower = 1> J; // number of state capability groups
  int<lower = 1> A; // number of alliances
  int<lower = 1> L; // number of alliance-level variables
  int<lower = 1> M; // number of state-level variables
  int<lower = 1, upper = S> state[N]; // state indicator
  int<lower = 1, upper = T> year[N]; // year indicator
  int<lower = 1, upper = J> cap[N]; // major-power indicator
  matrix[A, L] X; // matrix of alliance-level variables
  matrix[N, M] W; // matrix of state-level variables
  matrix[N, A] Z; // matrix of state membership in alliances
  vector[N] y; // outcome 

parameters {
  real alpha; // overall intercept
  real<lower = 0> sigma; // variance of outcome
  vector[S] alpha_state_std; // better behaved distribution of state intercepts
  vector[T] alpha_year_std; // better behaved distribution of year intercepts
  real<lower = 0> sigma_state; // variance hyperparameter of the state intercepts
  real<lower = 0> sigma_year; // variance hyperparameter of the year intercepts
  real<lower = 0> sigma_all; // variance hyperparameter of the alliances
  vector[M] gamma; // vector of state-level variables (slopes do not vary by group)
  matrix[J, L] mu_beta; // mean of alliance-level coefficients
  vector<lower = 0>[L] tau_beta; // mean of theta par in multivariate distribution 
  matrix[L, J] z_beta; // for non-centered Cholesky factorization 
  cholesky_factor_corr[L] L_Omega_beta; // for non-centered Cholesky factorization 
  vector<lower = 0>[A] tau_lambda; // mean of theta par in multivariate distribution 
  matrix[A, J] z_lambda; // for non-centered Cholesky factorization 
  cholesky_factor_corr[A] L_Omega_lambda; // for non-centered Cholesky factorization 
  real<lower = 3> nu; // degrees of freedom in t-distribution of outcome


transformed parameters {
  vector[S] alpha_state; // state intercepts
  vector[T] alpha_year; // year intercepts
  matrix[J, L] beta; // mean of alliance parameters- transposed in lambdas
  matrix[J, A] lambda; // alliance parameters
  vector[N] y_hat; // linear prediction of the outcome mean

 alpha_state = 0 + sigma_state * alpha_state_std; // non-centered parameterization, where alpha_state ~ N(0, sigma_state)

alpha_year = 0 + sigma_year * alpha_year_std; // non-centered parameterization, where alpha_state ~ N(0, sigma_state)

// varying slopes in alliance-level regression parameters beta 
  beta = mu_beta + (diag_pre_multiply(tau_beta, L_Omega_beta) * z_beta)';

// Varying slopes in alliance parameters lambda 

// Multivariate distribution of lambdas by capability expressed w/ Cholesky factorization 
  lambda = (X * beta')' + (diag_pre_multiply(tau_lambda, L_Omega_lambda) * z_lambda)';

// Linear prediction of the state-year spending. 
    y_hat = alpha + alpha_state[state] + alpha_year[year] + W * gamma + rows_dot_product(lambda[cap], Z) ;

model {
  alpha ~ normal(0, 1);
  sigma ~ normal(0, 1);
  alpha_year_std ~ normal(0, 1);
  alpha_state_std ~ normal(0, 1); 
  sigma_state ~ normal(0, 1);
  sigma_year ~ normal(0, 1); 
  sigma_all ~ normal(0, 1); 
  nu ~ gamma(2, 0.1); // Prior for degrees of freedom in t-dist
  gamma ~ normal(0, 1); 
  to_vector(z_beta) ~ normal(0, 1);
  L_Omega_beta ~ lkj_corr_cholesky(2);
  tau_beta ~ normal(0, 1); 
  to_vector(mu_beta) ~ normal(0, 1);
  to_vector(z_lambda) ~ normal(0, 1);
  L_Omega_lambda ~ lkj_corr_cholesky(2);
  tau_lambda ~ normal(0, 1); 
  y ~ student_t(nu, y_hat, sigma);

generated quantities {
 vector[N] y_pred; //  posterior predictive distribution

 for(i in 1:N)
 y_pred[i] = student_t_rng(nu, y_hat[i], sigma);


Thanks in advance for any help!

If anyone is curious- here is the associated Github repo.

1 Like


Realized why the log probability was evaluating to log(0).

Because some alliances only have members from one group, some \lambda_j parameters have no variation in membership- all the relevant elements of the membership matrix Z are 0.

Group 1 has at least one participant in 192/284 treaties. Group 2 has at least one participant in 146/292. There’s some overlap between the groups, but also many treaties that only have members from one group.