How to set up the priors for the random effects?

Can anyone help with the Cholesky decomposition and correlation among random effects? I have a joint model with an assumed random intercept and random slope. When I tried to use the Cholesky factorization in my Stan code, the prior on the random-effects matrix slowed the whole computation down, taking up to an hour.

For instance, `normal_lpdf(to_vector(z_u) | 0, 1);` is much slower than `uniform_lpdf(to_vector(z_u) | -2, 2);`. The latter runs fast, but the effective sample size is very low. Is there a better way to speed things up without losing accuracy?

```stan
parameters {
  vector[2] mu;
  real<lower=0> sigma;
  //cov_matrix[2] prec_Mat;

  real lambda;
  real beta;
  real alpha;

  // random effects standard deviations
  vector<lower=0>[2] sigma_u;
  // declare L_u to be the Cholesky factor of a 2x2 correlation matrix
  cholesky_factor_corr[2] L_u;
  // random effect matrix
  matrix[2, n_sub] z_u;
}

transformed parameters {
  matrix[2, n_sub] c_bs;
  c_bs = diag_pre_multiply(sigma_u, L_u) * z_u;
}

model {
  target += uniform_lpdf(sigma_u | 0.01, 10);
  // LKJ prior for the (Cholesky factor of the) correlation matrix
  target += lkj_corr_cholesky_lpdf(L_u | 1.0);
  target += normal_lpdf(to_vector(z_u) | 0, 1);
}
```

Hey there! Welcome to the Stan community! :)

I edited your post a bit to get proper code highlighting (using backticks and `stan` code fences).

Having a uniform prior on `z_u` is mathematically/statistically wrong: you have to put a standard normal prior on `z_u` to induce a multivariate normal prior on the correlated random effects. I'd also advise putting a more informative prior on `sigma_u`, maybe something like a (positive) Half-Normal(0, 2.5). That might help with your speed issues as well.
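In your model block that could look something like this (just a sketch):

```stan
// half-normal prior on the random-effect SDs (implied by the <lower=0> declaration)
target += normal_lpdf(sigma_u | 0, 2.5);
// standard normal prior on the raw effects (required for the non-centered parameterization)
target += normal_lpdf(to_vector(z_u) | 0, 1);
```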

Cheers,
Max


Hi Max,

Thank you for editing and replying to my post. Could you explain "Having a uniform prior on `z_u` is mathematically/statistically wrong" in more detail? I think `z_u` has to be non-negative, and my uniform prior could also restrict it to the same range, so I don't understand why it is wrong here. Sorry, I am new to Stan.

Hey! :)

Don't be! People picking up Stan is great news for us! :)

Sure! Let's first look at the univariate case with observations $i$, groups $j$, and a single varying intercept, i.e. $y_i \sim \text{Normal}(\alpha_{j[i]} + X_i\beta, \sigma_y)$ with $\alpha_j \sim \text{Normal}(\mu_\text{group}, \sigma_\text{group})$, where $j[i]$ is the group that observation $i$ belongs to. This is the centered parameterization (CP).

Often in Stan it is better to code the model with a non-centered parameterization (NCP). To do this, we reformulate $\alpha_j = \mu_\text{group} + \sigma_\text{group} \cdot z_j$ with $z_j \sim \text{Normal}(0,1)$, so that we sample a standard normal variable $z_j$ and then scale it (by $\sigma_\text{group}$) and shift it (by $\mu_\text{group}$). We often set $\mu_\text{group}$ to zero and let it be absorbed by an overall model intercept, i.e. $y_i \sim \text{Normal}(\alpha + \alpha_{j[i]} + X_i\beta, \sigma_y)$ with $\alpha_j = \sigma_\text{group} \cdot z_j$ and $z_j \sim \text{Normal}(0,1)$. Note that we can only do this because of the (scaling and shifting) properties of the normal distribution!
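In Stan, the NCP version of this univariate model could look something like the sketch below. The data names (`N`, `J`, `group`, `x`, `y`) are made up for illustration, this is not your joint model:

```stan
data {
  int<lower=1> N;                     // number of observations
  int<lower=1> J;                     // number of groups
  int<lower=1, upper=J> group[N];     // j[i]: group of observation i
  vector[N] x;                        // a single covariate
  vector[N] y;
}
parameters {
  real alpha;                         // overall intercept (absorbs mu_group)
  real beta;
  real<lower=0> sigma_group;          // between-group standard deviation
  real<lower=0> sigma_y;
  vector[J] z;                        // standard normal "raw" intercepts
}
transformed parameters {
  vector[J] alpha_group = sigma_group * z;  // alpha_j = sigma_group * z_j
}
model {
  target += normal_lpdf(z | 0, 1);              // z_j ~ Normal(0, 1)
  target += normal_lpdf(sigma_group | 0, 2.5);  // half-normal via <lower=0>
  target += normal_lpdf(sigma_y | 0, 2.5);
  target += normal_lpdf(alpha | 0, 5);
  target += normal_lpdf(beta | 0, 5);
  target += normal_lpdf(y | alpha + alpha_group[group] + beta * x, sigma_y);
}
```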

Mathematically CP and NCP are equivalent, but the latter usually has a nicer geometry for Stan to work with.

Now in the multivariate case we might have a model with a varying intercept and a varying slope, $y_i \sim \text{Normal}(\alpha_{j[i]} + X_i\beta_{j[i]}, \sigma_y)$, where

$$
\begin{bmatrix} \alpha_j \\ \beta_j \end{bmatrix} \sim \text{MultiNormal}\left( \begin{bmatrix} \mu_{\alpha} \\ \mu_\beta \end{bmatrix}, \Sigma \right)
$$

with covariance matrix $\Sigma$. This is the CP formulation. To get to the NCP formulation of the model we again shift and scale, but now we also have to take care of the correlation. To do that we rotate the new parameter by the Cholesky factor of the correlation matrix, which acts like a square root of the correlation matrix. We thus have

$$
\begin{bmatrix} \alpha_j \\ \beta_j \end{bmatrix} = \begin{bmatrix} \mu_{\alpha} \\ \mu_\beta \end{bmatrix} + \begin{bmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_\beta \end{bmatrix} \Omega^L Z_j
$$

where $\Omega^L$ is the lower-triangular Cholesky factor of the correlation matrix $\Omega$ and, for every $j$, $Z_j \sim \text{MultiNormal}(\mathbf{0}, \mathbf{I}_{2\times2})$ is a size-two vector of standard normal variables. Compressing everything a little more, you can write $Z$ as a $2 \times J$ matrix and then say that for $k = 1, 2$ we have $Z_{k,j} \sim \text{Normal}(0,1)$. Note that this is again possible due to the properties of the (multivariate) normal distribution: we can shift it, scale it, and rotate it. So that's where the `target += normal_lpdf(to_vector(z_u) | 0, 1);` statement in your model is coming from.

I hope this was not too convoluted. It’s a long way of saying that you did everything correctly.

Did you try taking away the upper bound on `sigma_u` and giving it a Half-Normal or Half-t prior? If that doesn't help, then maybe some other part of your model leads to slow sampling. That being said, the hard reality is that sometimes MCMC is just slow (and by using Stan you're probably going as fast as it can get).
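For example, something like this (just a sketch; the scale 2.5 and 3 degrees of freedom are guesses you'd want to adapt to your application):

```stan
parameters {
  vector<lower=0>[2] sigma_u;  // lower bound only, no upper bound
}
model {
  // half-Student-t prior, implied by the <lower=0> constraint
  target += student_t_lpdf(sigma_u | 3, 0, 2.5);
}
```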

Cheers,
Max


Hi Max,

Thank you for your detailed explanations. It makes sense to have a half-normal or half-t prior for the `z_u` to induce a normal distribution on the posteriors. However, I have no idea why, when I tried them, the posterior for the covariance matrix converged to very small values compared to the "true" values in my simulations. Therefore, I switched to a non-informative prior, and after revising the other priors it worked well for my joint model. The inverse-Wishart prior also worked well, but it took too long.

Did you ever encounter this kind of convergence issue? That is, the converged values for the covariance matrix were far away from your "true" values when using half-normal or half-t priors?

Yan

Hey Yan!

Note that the Half-Normal/Half-Student-t priors are for `sigma_u`, the marginal standard deviation of each random effect. The `z_u` have to have standard normal (unconstrained) priors.

For example:

```stan
...
transformed parameters {
  matrix[2, n_sub] c_bs;
  c_bs = diag_pre_multiply(sigma_u, L_u) * z_u;
}

model {
  target += normal_lpdf(sigma_u | 0, 5);        // half-normal is implied by the lower bound
  target += lkj_corr_cholesky_lpdf(L_u | 1.0);  // LKJ prior; I'd also try a tighter one here, maybe eta = 4
  target += normal_lpdf(to_vector(z_u) | 0, 1);
}
```

Did you try this?
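By the way, if you want to compare the covariance matrix to the "truth" from your simulations, you can recover it from the Cholesky factor in generated quantities. A sketch, reusing the names from above:

```stan
generated quantities {
  // correlation matrix implied by the Cholesky factor: Omega = L_u * L_u'
  corr_matrix[2] Omega = multiply_lower_tri_self_transpose(L_u);
  // covariance matrix: Sigma = diag(sigma_u) * Omega * diag(sigma_u)
  cov_matrix[2] Sigma = quad_form_diag(Omega, sigma_u);
}
```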

As for your question about convergence: honestly, that usually only happens to me when I have a bug in my code. Sometimes you have to switch to the CP, but even then the posterior quantities are usually still close to the "truth".

Cheers,
Max

Hi Max,

Thank you for your comments and feedback. I tried that exact prior in my previous code; however, it converged to very small numbers. I guess it also depends on the data and the other parameters. Thank you again for your feedback and time.

Yan

Hey Yan! Sorry to hear that you are still experiencing problems.

Can you post your full model and maybe some (toy) data to run it with? Then I can check.

Cheers,
Max


Hi Max,

Thank you for following up on this. I am revising my code now and will let you know if the problem still exists. Thank you again for your time.

Yan