Hey!
Thanks for sending the data! There are a few points I wanted to comment on. Please take everything with a pinch of salt, since I don’t know the specifics of your data or the model you’re trying to fit.
- The scales of the data are huge! Since all variables are strictly positive, I ended up taking logs of all of them. This is common practice in my field (Econ); I’m not sure if it makes sense in your application – it’s a different functional form, but it’s much easier to work on a log scale with such dispersed data.
- The predictor variables are fairly highly correlated with each other. This can make the sampling process much harder. The QR parameterization can help with that (check out here and here).
- 4 out of 6 explanatory variables are actually group-level variables. You typically don’t want those to have group-varying coefficients. The Gelman/Hill textbook has a sub-chapter on varying-intercept and varying-slope models with group-level predictors.
- In Stan, multilevel models are usually parameterized via the (marginal) standard deviations and the Cholesky factor of the correlation matrix. This is what @maxbiostat pointed out above. He also rightly pointed out that tighter priors are often much better for fitting such models.
I’d propose a model like this:
\log(cc_i) \sim \text{Normal}(Q_i\tilde\beta_{m[i]}, \sigma),
where \tilde\beta is an array of size M (for counties 1:M) of vectors of length 3 (intercept, coefficient of log(c_ly), and coefficient of log(c_lm)), indexed by the county m[i] of observation i, and Q_i is the i-th row of the N \times 3 matrix Q, which is computed like this (a Stan sketch follows the list):
1. compute log(c_ly) and log(c_lm) and put them in a matrix with dimension [N,2]
2. subtract the column mean of each column (centering)
3. use QR decomposition and get Q: qr_thin_Q() function in Stan
4. multiply Q by sqrt(N - 1)
5. append a column of 1s to that matrix
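For concreteness, here is a minimal Stan sketch of those five steps in a `transformed data` block. I’m assuming the raw variables arrive as positive vectors `c_ly` and `c_lm` of length N in the `data` block (placeholder names for however your data are actually called):

```stan
transformed data {
  matrix[N, 2] X;  // step 1: logged individual-level predictors
  matrix[N, 3] Q;  // final predictor matrix, column of 1s first
  X = append_col(log(c_ly), log(c_lm));
  for (k in 1:2) {
    // step 2: center column k
    X[, k] = X[, k] - mean(X[, k]);
  }
  // steps 3-5: thin QR, rescale by sqrt(N - 1), prepend the intercept column
  Q = append_col(rep_vector(1, N), qr_thin_Q(X) * sqrt(N - 1));
}
```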
The distribution of \tilde\beta_m is multivariate normal, written here with a non-centered parameterization:
\begin{aligned}
\tilde\beta_m &= \mu_{\tilde\beta,m}' + \text{diag}(\sigma_{\tilde\beta}) L^\Omega_{\tilde\beta} Z_m \\
Z_m &\sim \text{MVN}(0, I),
\end{aligned}
where \text{diag}(\sigma_{\tilde\beta}) is the diagonal matrix of the three standard deviations of \tilde\beta and L^\Omega_{\tilde\beta} is the Cholesky factor of their correlation matrix. The Z_m's are just vectors of independent standard normal variables. The group-level predictors enter through \mu_{\tilde\beta,m}:
\mu_{\tilde\beta} = Q^{\text{M}}\tilde\gamma,
where Q^{\text{M}} is an M \times 5 predictor matrix with the group-level variables and an intercept, and \tilde\gamma is a 5 \times 3 matrix of coefficients, so that \mu_{\tilde\beta} has dimension M \times 3 (that’s why it’s indexed and transposed above). The matrix Q^{\text{M}} is built like this (again, a sketch follows the list):
1. compute log(m), log(p), log(nw), log(gp) and put them in a matrix with dimension [M,4]
2. subtract the column mean of each column (centering)
3. use QR decomposition and get Q^M: qr_thin_Q() function in Stan
4. multiply Q^M by sqrt(M - 1)
5. append a column of 1s to that matrix
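Same idea for the group-level matrix; again just a sketch, assuming the group-level variables come in as positive vectors `m`, `p`, `nw`, `gp` of length M:

```stan
transformed data {
  matrix[M, 4] X_grp;  // step 1: logged group-level predictors
  matrix[M, 5] Q_M;    // group-level predictor matrix, column of 1s first
  X_grp = append_col(append_col(log(m), log(p)), append_col(log(nw), log(gp)));
  for (k in 1:4) {
    // step 2: center column k
    X_grp[, k] = X_grp[, k] - mean(X_grp[, k]);
  }
  // steps 3-5: thin QR, rescale by sqrt(M - 1), prepend the intercept column
  Q_M = append_col(rep_vector(1, M), qr_thin_Q(X_grp) * sqrt(M - 1));
}
```

(In practice both matrices would of course live in the same `transformed data` block.)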
This implies (this is the centered parameterization, which is probably less computationally efficient):
\begin{bmatrix}
\tilde\beta_{1,m} \\
\tilde\beta_{2,m} \\
\tilde\beta_{3,m} \\
\end{bmatrix}
\sim
\text{MVN} \left((Q^{\text{M}}_m\tilde\gamma)', \Sigma_{\tilde\beta}\right),
where \Sigma_{\tilde\beta} = \text{diag}(\sigma_{\tilde\beta})\Omega_{\tilde\beta}\text{diag}(\sigma_{\tilde\beta}), so loosely speaking, the group-level effects are “projected” onto the varying slopes and intercepts of the “individual” level.
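In Stan terms, the non-centered version might look roughly like this. It’s only a sketch with placeholder names: it assumes Q and Q_M are built as above in `transformed data`, that log_cc = log(cc) is precomputed there too, that `county` is an integer array mapping each of the N observations to its county, and the priors are left as comments:

```stan
parameters {
  matrix[5, 3] gamma_tilde;         // group-level coefficients (on the Q^M scale)
  vector<lower=0>[3] sigma_beta;    // marginal standard deviations of beta_tilde
  cholesky_factor_corr[3] L_Omega;  // Cholesky factor of the correlation matrix
  matrix[3, M] Z;                   // standard normal "raw" county effects
  real<lower=0> sigma;              // residual standard deviation
}
transformed parameters {
  matrix[M, 3] mu_beta = Q_M * gamma_tilde;  // group-level "projection"
  matrix[M, 3] beta_tilde = mu_beta + (diag_pre_multiply(sigma_beta, L_Omega) * Z)';
}
model {
  to_vector(Z) ~ std_normal();
  // ... (tight-ish) priors on gamma_tilde, sigma_beta, sigma,
  //     and e.g. lkj_corr_cholesky on L_Omega ...
  log_cc ~ normal(rows_dot_product(Q, beta_tilde[county]), sigma);
}
```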
Okay, so this was very detailed… haha. But I didn’t know how to put it more simply. In my trials it was really the combination of QR decomposition, group-level effects as “projection”/interaction, and non-centered parameterization that worked.
I can post the (mostly) working Stan code tomorrow.
Cheers!
Max
Edit: The tildes on top of \tilde\beta and \tilde\gamma denote that you are probably not interested in them per se, but rather in the “real” coefficients \beta and \gamma. You can get those back by doing the steps for creating Q or Q^M, but replacing the last three points with these:
...
3. use QR decomposition and get R: qr_thin_R() function in Stan
4. divide R by sqrt(N - 1) (or sqrt(M - 1) for the group-level matrix)
5. compute the inverse of R --> R_inv = inverse(R) in Stan
6. compute beta = R_inv * beta_tilde[2:K] (a vector of length K - 1)
(above assumes that the intercept column of 1s is in first position)
This is also described in the QR resources I linked to above. Note that this will not undo the centering of the variables, i.e. your intercepts still have the “as if variables are centered” interpretation, which is also important if you want to calculate predictions or other quantities of interest.
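As a final sketch, the back-transformation in `generated quantities` could look like this, assuming R_ast_inv = inverse(qr_thin_R(X) / sqrt(N - 1)) and R_M_ast_inv = inverse(qr_thin_R(X_grp) / sqrt(M - 1)) were also computed in `transformed data`:

```stan
generated quantities {
  matrix[M, 2] beta;   // per-county coefficients of the centered log(c_ly), log(c_lm)
  matrix[4, 3] gamma;  // coefficients of the centered group-level log predictors
  for (j in 1:M) {
    // drop the intercept (column 1) and map back through R^-1
    beta[j] = (R_ast_inv * beta_tilde[j, 2:3]')';
  }
  gamma = R_M_ast_inv * gamma_tilde[2:5];
}
```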