Hey!

Thanks for sending the data! There are a few points I wanted to comment on. Please take everything with a pinch of salt, since I don't know the specifics of your data or the model you're trying to fit.

- The scales of the data are huge! Since all variables are strictly positive, I ended up taking logs of all of them. This is common practice in my field (Econ), not sure if it makes sense in your application. It's a different functional form, but it is much easier to work on a log scale with such dispersed data.
- The predictor variables are fairly highly correlated with each other. This can make the sampling process much harder. The QR parameterization can help with that (check out here and here).
- 4 out of 6 explanatory variables are actually group-level variables. You typically don't want those to have group-varying coefficients. The Gelman/Hill textbook has a sub-chapter on varying-intercept, varying-slope models with group-level predictors.
- In Stan, multilevel models are usually parameterized via the (marginal) standard deviations and the Cholesky factor of the correlation matrix. This is what @maxbiostat pointed out above. He also rightly pointed out that tighter priors are often much better for fitting such models.

I'd propose a model like this:

\log(cc) \sim \text{Normal}(Q\tilde\beta_{m}, \sigma),

where \tilde\beta is an array of size M (for counties 1:M) of vectors of length 3 (intercept, coefficient of log(c_ly), and coefficient of log(c_lm)), indexed at the appropriate m. Q is an N \times 3 matrix computed like this:

```
1. compute log(c_ly) and log(c_lm) and put them in a matrix with dimension [N,2]
2. subtract the column mean of each column (centering)
3. use QR decomposition and get Q: qr_thin_Q() function in Stan
4. multiply Q by sqrt(N - 1)
5. append a column of 1s to that matrix
```
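The steps above can be sketched in numpy (the data here is simulated just to stand in for your `c_ly` and `c_lm` columns, and `np.linalg.qr` plays the role of Stan's `qr_thin_Q()`, possibly up to column signs):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
# hypothetical positive predictors standing in for c_ly and c_lm
c_ly = rng.lognormal(mean=0.0, sigma=1.0, size=N)
c_lm = rng.lognormal(mean=0.0, sigma=1.0, size=N)

X = np.column_stack([np.log(c_ly), np.log(c_lm)])  # step 1: logs, shape [N, 2]
X_c = X - X.mean(axis=0)                           # step 2: center each column
Q_thin, R_thin = np.linalg.qr(X_c)                 # step 3: thin QR decomposition
Q_scaled = Q_thin * np.sqrt(N - 1)                 # step 4: rescale
Q = np.column_stack([np.ones(N), Q_scaled])        # step 5: intercept column of 1s

# after rescaling, the predictor columns are orthogonal with squared norm (N - 1):
# Q_scaled.T @ Q_scaled is (N - 1) times the identity
```

The payoff of steps 2-4 is that the non-intercept columns of Q are uncorrelated and on a common scale, which is exactly what makes the sampler's job easier with correlated predictors.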

The distribution on \tilde\beta_m is Multivariate Normal, where a non-centered parameterization is used:

\begin{aligned}
\tilde\beta_m &= \mu_{\tilde\beta,m}' + \text{diag}(\sigma_{\tilde\beta})L^{\Omega}_{\tilde\beta} Z \\
Z &\sim \text{MVN}(0, I),
\end{aligned}

where \text{diag}(\sigma_{\tilde\beta}) is the diagonal matrix of the three standard deviations of \tilde\beta and L^{\Omega}_{\tilde\beta} is the Cholesky factor of their correlation matrix. The Z's are just a bunch of standard normal variables. The group-level predictors enter in \mu_{\tilde\beta,m}:

\mu_{\tilde\beta} = Q^{\text{M}}\tilde\gamma,

where Q^{\text{M}} is an M \times 5 predictor matrix with group-level variables and an intercept, and \tilde\gamma is a 5 \times 3 matrix of coefficients, so that \mu_{\tilde\beta} has dimension M \times 3, which is why it's indexed and transposed above. The matrix Q^{\text{M}} is built like this:

```
1. compute log(m), log(p), log(nw), log(gp) and put them in a matrix with dimension [M,4]
2. subtract the column mean of each column (centering)
3. use QR decomposition and get Q^M: qr_thin_Q() function in Stan
4. multiply Q^M by sqrt(M - 1)
5. append a column of 1s to that matrix
```
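Putting the group-level projection and the non-centered parameterization together, here is a numpy sketch of how the pieces combine (all numbers, dimensions, and the correlation matrix are made up for illustration; in Stan, gamma_tilde, sigma_beta, L_Omega, and Z would be parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 20  # number of counties

# stand-in for the [M, 5] group-level matrix Q^M built by the steps above
Q_M = np.column_stack([np.ones(M), rng.normal(size=(M, 4))])

gamma_tilde = rng.normal(size=(5, 3))   # group-level coefficients
mu_beta = Q_M @ gamma_tilde             # [M, 3] means of the varying coefficients

sigma_beta = np.array([0.5, 0.3, 0.2])  # marginal standard deviations
Omega = np.array([[1.0, 0.3, 0.1],      # a valid 3x3 correlation matrix
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
L_Omega = np.linalg.cholesky(Omega)     # its (lower) Cholesky factor

Z = rng.normal(size=(3, M))             # standard normal "raw" parameters
# non-centered construction: beta_tilde[m] = mu_beta[m] + diag(sigma) L Z[:, m]
beta_tilde = mu_beta + (np.diag(sigma_beta) @ L_Omega @ Z).T  # [M, 3]
```

Because Z is standard normal, scaling by diag(sigma_beta) @ L_Omega gives the varying coefficients exactly the covariance implied by the standard deviations and correlation matrix.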

This implies the following (the centered parameterization, which is probably less computationally efficient):

\begin{bmatrix}
\tilde\beta_{1,m} \\
\tilde\beta_{2,m} \\
\tilde\beta_{3,m} \\
\end{bmatrix}
\sim
\text{MVN} \left(Q^{\text{M}}_m\tilde\gamma, \Sigma_{\tilde\beta}\right),

where \Sigma_{\tilde\beta} = \text{diag}(\sigma_{\tilde\beta})\Omega_{\tilde\beta}\text{diag}(\sigma_{\tilde\beta}), so loosely speaking, the group-level effects are "projected" onto the varying slopes and intercepts of the "individual" level.
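A quick numerical check that the centered and non-centered forms imply the same covariance (the standard deviations and correlation matrix are made up):

```python
import numpy as np

sigma = np.array([0.5, 0.3, 0.2])   # hypothetical marginal standard deviations
Omega = np.array([[1.0, 0.3, 0.1],  # hypothetical correlation matrix
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
L = np.linalg.cholesky(Omega)       # Cholesky factor used in the non-centered form
D = np.diag(sigma)

Sigma = D @ Omega @ D               # covariance in the centered form
# the non-centered scale matrix D @ L is a Cholesky-type factor of Sigma:
assert np.allclose((D @ L) @ (D @ L).T, Sigma)
```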

Okay, so this was very detailed… haha. But I didn't know how to put it more simply. In my trials it was really the combination of QR decomposition, group-level effects as "projection"/interaction, and non-centered parameterization that worked.

I can post the (mostly) working Stan code tomorrow.

Cheers!

Max

**Edit:** The tildes on top of \tilde\beta and \tilde\gamma denote the fact that you are probably not interested in them per se, but rather the "real" coefficients \beta and \gamma. You can get those back by following the steps for creating Q or Q^M, but changing the last steps as follows:

```
...
3. use QR decomposition and get R: qr_thin_R() function in Stan
4. divide R by sqrt(M - 1)
5. compute the inverse of R --> R_inv = inverse(R) in Stan
6. compute beta = R_inv * beta_tilde[2:K], where beta has length K-1
(above assumes that the intercept column of 1s is in first position)
```
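In numpy, the recovery looks like this (a sketch with simulated data; note that R must be rescaled by the same sqrt(N - 1) factor as Q, just divided instead of multiplied):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
X = rng.normal(size=(N, 2))        # stand-in for the log-predictors
X_c = X - X.mean(axis=0)           # centered, as when building Q

Q_thin, R_thin = np.linalg.qr(X_c)
Q = Q_thin * np.sqrt(N - 1)
R = R_thin / np.sqrt(N - 1)        # step 4: divide R by sqrt(N - 1)
R_inv = np.linalg.inv(R)           # step 5

beta_true = np.array([1.5, -0.7])  # hypothetical slopes on the original scale
beta_tilde = R @ beta_true         # slopes on the Q scale: X_c @ b = Q @ (R @ b)
beta = R_inv @ beta_tilde          # step 6: map back to the original scale
```

The key identity is X_c = Q R after the rescaling, so the linear predictor is unchanged whether you use (X_c, beta) or (Q, beta_tilde); only the geometry the sampler sees differs.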

This is also described in the QR resources I linked to above. Note that this will not undo the centering of the variables, i.e. your intercepts still have the "as if variables are centered" interpretation, which is also important if you want to calculate predictions or other quantities of interest.