Hey! :)
Don’t be! People picking up Stan is great news for us! :)
Sure! Let’s first look at the univariate case with observations i and groups j and a single varying intercept, i.e. y_i \sim \text{Normal}(\alpha_{j[i]} + X_i\beta,\sigma_y) with \alpha_j \sim \text{Normal}(\mu_\text{group}, \sigma_\text{group}), where j[i] is the group that i belongs to. This is the centered parameterization (CP). Often in Stan it is better to code the model with a non-centered parameterization (NCP). To do this, we reformulate \alpha_j = \mu_\text{group} + \sigma_\text{group}\cdot z_j, where z_j \sim \text{Normal}(0,1), so that we sample from a standard normal variable z and then scale it (by \sigma_\text{group}) and shift it (by \mu_\text{group}). We often set \mu_\text{group} to zero and let it be absorbed by an overall model intercept, i.e. y_i \sim \text{Normal}(\alpha + \alpha_{j[i]} + X_i\beta,\sigma_y), with \alpha_j = \sigma_\text{group}\cdot z_j and z_j \sim \text{Normal}(0,1). Note that we can only do this because of the properties (scaling and shifting) of the normal distribution!
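As a quick numerical sketch of that shift-and-scale step (plain Python, with made-up values for \mu_\text{group} and \sigma_\text{group}): drawing z \sim \text{Normal}(0,1) and computing \mu_\text{group} + \sigma_\text{group}\cdot z gives draws whose mean and standard deviation match the centered formulation:

```python
import math
import random

random.seed(42)

mu_group = 1.5     # hypothetical group-level mean
sigma_group = 2.0  # hypothetical group-level sd
n = 200_000

# Non-centered: sample standard normals z, then scale and shift them.
alphas = [mu_group + sigma_group * random.gauss(0.0, 1.0) for _ in range(n)]

mean = sum(alphas) / n
sd = math.sqrt(sum((a - mean) ** 2 for a in alphas) / n)

# The sample mean and sd should be close to mu_group and sigma_group.
print(mean, sd)
```

This is only a check of the distributional identity, of course; in Stan the point of the NCP is that the sampler explores the standard-normal z's instead of the funnel-shaped \alpha_j's.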
Mathematically CP and NCP are equivalent, but the latter usually has a nicer geometry for Stan to work with.
Now in the multivariate case we might have a model with varying intercept and varying slope y_i \sim \text{Normal}(\alpha_{j[i]} + X_i\beta_{j[i]},\sigma_y), where
\begin{bmatrix}
\alpha_j \\ \beta_j
\end{bmatrix} \sim \text{MultiNormal}\left(
\begin{bmatrix}
\mu_{\alpha} \\ \mu_\beta
\end{bmatrix}, \Sigma
\right)
with covariance matrix \Sigma. This is the CP formulation. To get to the NCP formulation of the model we again shift and scale, but we also now have to take care of the correlation. To do that we rotate the new parameter by the Cholesky factor of the correlation matrix, which acts like a square root of the correlation matrix. We thus have,
\begin{bmatrix}
\alpha_j \\ \beta_j
\end{bmatrix} =
\begin{bmatrix}
\mu_{\alpha} \\ \mu_\beta
\end{bmatrix} +
\begin{bmatrix}
\sigma_{\alpha} & 0 \\ 0 & \sigma_\beta
\end{bmatrix} \Omega^L Z_j
where \Omega^L is the lower-triangular Cholesky factor of the correlation matrix \Omega and, for every j, Z_j \sim \text{MultiNormal}(\mathbf{0}, \mathbf{I}_{2\times2}) is a length-two vector of independent standard normal variables. To compress things a little more, you can collect the Z_j into a 2 \times J matrix Z and say that Z_{k,j} \sim \text{Normal}(0,1) for k=1,2. Note that this is again possible due to the properties of the (multivariate) normal distribution: we can shift it, scale it and rotate it. So that’s where this
is coming from.
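To see the rotation step concretely, here is a small plain-Python sketch (with made-up values for \sigma_\alpha, \sigma_\beta and the correlation \rho): for a 2\times2 correlation matrix the lower-triangular Cholesky factor has the closed form [[1, 0], [\rho, \sqrt{1-\rho^2}]], and pre-multiplying it by \text{diag}(\sigma_\alpha, \sigma_\beta) gives a matrix L with L L^\top = \Sigma, i.e. exactly the "square root" of the covariance that scales and rotates Z_j:

```python
import math

rho = 0.6                    # hypothetical intercept-slope correlation
sigma_a, sigma_b = 0.8, 0.3  # hypothetical group-level sds

# Lower-triangular Cholesky factor of the 2x2 correlation matrix Omega.
omega_L = [[1.0, 0.0],
           [rho, math.sqrt(1.0 - rho ** 2)]]

# Scale the rows: L = diag(sigma_a, sigma_b) @ Omega^L.
L = [[sigma_a * omega_L[0][0], sigma_a * omega_L[0][1]],
     [sigma_b * omega_L[1][0], sigma_b * omega_L[1][1]]]

# L @ L^T recovers the covariance matrix Sigma.
Sigma = [[sum(L[i][k] * L[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]

print(Sigma)
# Diagonal entries: sigma_a^2 and sigma_b^2;
# off-diagonal entries: rho * sigma_a * sigma_b.
```

So multiplying \text{diag}(\sigma)\,\Omega^L onto a vector of independent standard normals produces correlated draws with exactly the covariance \Sigma of the CP model.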
I hope this was not too convoluted. It’s a long way of saying that you did everything correctly.
Did you try taking away the upper bound on sigma_u
and giving it a Half-Normal or Half-t prior instead? If that doesn’t help, then some other part of your model may be causing the slow sampling. That being said, the hard reality is that sometimes MCMC is just slow (and by using Stan you are probably going about as fast as it gets).
Cheers,
Max