Hi all,
I’m having trouble coding varying slopes across different levels in a hierarchical model. I would like to vary slopes at a multiple levels of analysis based on how observations at one level are grouped.
I would appreciate input on 1: how to write the model down correctly, and 2: how to code it in STAN.
As an example of what I’m trying to do: consider the radon example and imagine individuals are part of another group besides counties. In that case, I would let the coefficients on county-level variables vary according to individual-level membership in that other group. Being in the group changes the impact of factors at a different level of analysis.
Specifically, I’m interested in how participation in international alliances alters the military spending of member states. I expect particular alliance characteristics will impact military spending, and each alliance treaty will have a different effect on military spending.
States are members of multiple alliances, and membership is not nested. There are 280 alliances in the data. In the model, I have both state and alliance-level variables, along with state and year varying intercepts.
The mean of the outcome is:
\mu = \alpha + \alpha^{st} + \alpha^{yr} +\textbf{W}_{n \times k} \gamma_{k \times 1} + \textbf{Z}_{n \times a} \lambda_{a \times 1}
Then the mean of each alliance-specific parameter \lambda is a function of alliance level variables, where
\lambda \sim N(X \beta, \sigma_all)
Z is a matrix of state membership in alliances- this is how I capture membership in multiple treaties. Columns are alliances, and rows are state-year observations. If a state is not in an alliance, the matrix element is zero. If a state is in an alliance, the corresponding element is the rescaled value of allied military spending.
States can be divided into J groups. I’ve check this by estimating the above model on J samples, but would prefer to let at least the \lambda and \beta vary by group. Getting \gamma to vary by a state-year-level group is straightforward- I did that for practice.
I think the model can be re-written like this:
\mu = \alpha + \alpha^{st} + \alpha^{yr} +\textbf{W} \gamma_{j} + \textbf{Z} \lambda_{j}
\lambda_j \sim MVN(X \beta_j, \Sigma_{all}) \mbox{for } j \in 1:J
\beta_j \sim MVN(\mu_{\beta}, \Sigma_{\beta}) \mbox{for } j \in 1:J
But the indexing across levels is a little weird- j is a group at the state-year level. The unit of analysis for the \lambda regression is alliances. Moreover, alliance membership is uneven across groups- some treaties involve only members of one group.
The code I have so far is below- it uses Cholesky factorization for the varying slopes of \lambda and \beta. Right now, this code generates an error: Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity.
I think the error is because I have not specified the model correctly. I’ve built things piece by piece to this point- fitting the model without varying slopes, or with varying \gamma parameters has no problems.
data {
int<lower = 1> N; // Number of observations
int<lower = 1> S; // number of states
int<lower = 1> T; // number of years
int<lower = 1> J; // number of state capability groups
int<lower = 1> A; // number of alliances
int<lower = 1> L; // number of alliance-level variables
int<lower = 1> M; // number of state-level variables
int<lower = 1, upper = S> state[N]; // state indicator
int<lower = 1, upper = T> year[N]; // year indicator
int<lower = 1, upper = J> cap[N]; // major-power indicator
matrix[A, L] X; // matrix of alliance-level variables
matrix[N, M] W; // matrix of state-level variables
matrix[N, A] Z; // matrix of state membership in alliances
vector[N] y; // outcome
}
parameters {
real alpha; // overall intercept
real<lower = 0> sigma; // variance of outcome
vector[S] alpha_state_std; // better behaved distribution of state intercepts
vector[T] alpha_year_std; // better behaved distribution of year intercepts
real<lower = 0> sigma_state; // variance hyperparameter of the state intercepts
real<lower = 0> sigma_year; // variance hyperparameter of the year intercepts
real<lower = 0> sigma_all; // variance hyperparameter of the alliances
vector[M] gamma; // vector of state-level variables (slopes do not vary by group)
matrix[J, L] mu_beta; // mean of alliance-level coefficients
vector<lower = 0>[L] tau_beta; // mean of theta par in multivariate distribution
matrix[L, J] z_beta; // for non-centered Cholesky factorization
cholesky_factor_corr[L] L_Omega_beta; // for non-centered Cholesky factorization
vector<lower = 0>[A] tau_lambda; // mean of theta par in multivariate distribution
matrix[A, J] z_lambda; // for non-centered Cholesky factorization
cholesky_factor_corr[A] L_Omega_lambda; // for non-centered Cholesky factorization
real<lower = 3> nu; // degrees of freedom in t-distribution of outcome
}
transformed parameters {
vector[S] alpha_state; // state intercepts
vector[T] alpha_year; // year intercepts
matrix[J, L] beta; // mean of alliance parameters- transposed in lambdas
matrix[J, A] lambda; // alliance parameters
vector[N] y_hat; // linear prediction of the outcome mean
alpha_state = 0 + sigma_state * alpha_state_std; // non-centered parameterization, where alpha_state ~ N(0, sigma_state)
alpha_year = 0 + sigma_year * alpha_year_std; // non-centered parameterization, where alpha_state ~ N(0, sigma_state)
// varying slopes in alliance-level regression parameters beta
beta = mu_beta + (diag_pre_multiply(tau_beta, L_Omega_beta) * z_beta)';
// Varying slopes in alliance parameters lambda
// Multivariate distribution of lambdas by capability expressed w/ Cholesky factorization
lambda = (X * beta')' + (diag_pre_multiply(tau_lambda, L_Omega_lambda) * z_lambda)';
// Linear prediction of the state-year spending.
y_hat = alpha + alpha_state[state] + alpha_year[year] + W * gamma + rows_dot_product(lambda[cap], Z) ;
}
model {
alpha ~ normal(0, 1);
sigma ~ normal(0, 1);
alpha_year_std ~ normal(0, 1);
alpha_state_std ~ normal(0, 1);
sigma_state ~ normal(0, 1);
sigma_year ~ normal(0, 1);
sigma_all ~ normal(0, 1);
nu ~ gamma(2, 0.1); // Prior for degrees of freedom in t-dist
gamma ~ normal(0, 1);
to_vector(z_beta) ~ normal(0, 1);
L_Omega_beta ~ lkj_corr_cholesky(2);
tau_beta ~ normal(0, 1);
to_vector(mu_beta) ~ normal(0, 1);
to_vector(z_lambda) ~ normal(0, 1);
L_Omega_lambda ~ lkj_corr_cholesky(2);
tau_lambda ~ normal(0, 1);
y ~ student_t(nu, y_hat, sigma);
}
generated quantities {
vector[N] y_pred; // posterior predictive distribution
for(i in 1:N)
y_pred[i] = student_t_rng(nu, y_hat[i], sigma);
}
Thanks in advance for any help!
If anyone is curious- here is the associated Github repo.