How to set up the priors for the random effects?

Can anyone help with the Cholesky decomposition and correlation among random effects? I have a joint model with an assumed random intercept and random slope. When I tried to use the Cholesky factorization in my Stan code, the prior on the random-effects matrix slowed the whole computation down, taking up to an hour.

For instance, `normal_lpdf(to_vector(z_u) | 0, 1);` is much slower than `uniform_lpdf(to_vector(z_u) | -2, 2);`. The latter runs fast, but the effective sample size is very low. Is there a better way to speed things up without losing accuracy?

```stan
parameters {
  vector[2] mu;
  real<lower=0> sigma;
  //cov_matrix[2] prec_Mat;

  real lambda;
  real beta;
  real alpha;

  // random effects standard deviations
  vector<lower=0>[2] sigma_u;
  // declare L_u to be the Cholesky factor of a 2x2 correlation matrix
  cholesky_factor_corr[2] L_u;
  // random effect matrix
  matrix[2, n_sub] z_u;
}

transformed parameters {
  matrix[2, n_sub] c_bs;
  c_bs = diag_pre_multiply(sigma_u, L_u) * z_u;
}

model {
  target += uniform_lpdf(sigma_u | 0.01, 10);
  // LKJ prior for the (Cholesky factor of the) correlation matrix
  target += lkj_corr_cholesky_lpdf(L_u | 1.0);
  target += normal_lpdf(to_vector(z_u) | 0, 1);
}
```

Hey there! Welcome to the Stan community! :)

I edited your post a bit to get proper code highlighting (using backticks and `stan` code fences).

Having a uniform prior on `z_u` is mathematically/statistically wrong: you have to put a standard normal prior on `z_u` to induce a multivariate normal prior on the correlated random effects. I'd also advise putting a more informative prior on `sigma_u`, maybe something like a (positive) Half-Normal(0, 2.5). That might help with your speed issues as well.
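In your model block that could look something like this (just a sketch):

```stan
// half-normal prior on the random-effect SDs (implied by the <lower=0> declaration)
target += normal_lpdf(sigma_u | 0, 2.5);
// standard normal prior on the raw effects (required for the non-centered parameterization)
target += normal_lpdf(to_vector(z_u) | 0, 1);
```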

Cheers,
Max


Hi Max,

Thank you for editing and replying to my post. Could you explain "Having a uniform prior on `z_u` is mathematically/statistically wrong" in more detail? I think `z_u` has to be non-negative, and my uniform prior could also restrict it to the same range, so I don't understand why it is wrong here. Sorry, I am new to Stan.

Hey! :)

Don't be! People picking up Stan is great news for us! :)

Sure! Let's first look at the univariate case with observations $i$, groups $j$, and a single varying intercept, i.e. $y_i \sim \text{Normal}(\alpha_{j[i]} + X_i\beta, \sigma_y)$ with $\alpha_j \sim \text{Normal}(\mu_\text{group}, \sigma_\text{group})$, where $j[i]$ is the group that observation $i$ belongs to. This is the centered parameterization (CP).

Often in Stan it is better to code the model with a non-centered parameterization (NCP). To do this, we reformulate $\alpha_j = \mu_\text{group} + \sigma_\text{group} \cdot z_j$ with $z_j \sim \text{Normal}(0,1)$, so that we sample a standard normal variable $z_j$ and then scale it (by $\sigma_\text{group}$) and shift it (by $\mu_\text{group}$). We often set $\mu_\text{group}$ to zero and let it be absorbed by an overall model intercept, i.e. $y_i \sim \text{Normal}(\alpha + \alpha_{j[i]} + X_i\beta, \sigma_y)$ with $\alpha_j = \sigma_\text{group} \cdot z_j$ and $z_j \sim \text{Normal}(0,1)$. Note that we can only do this because of the (scaling and shifting) properties of the normal distribution!
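In Stan, the NCP version of this univariate model could look something like the sketch below. The data names (`N`, `J`, `group`, `x`, `y`) are made up for illustration, this is not your joint model:

```stan
data {
  int<lower=1> N;                     // number of observations
  int<lower=1> J;                     // number of groups
  int<lower=1, upper=J> group[N];     // j[i]: group of observation i
  vector[N] x;                        // a single covariate
  vector[N] y;
}
parameters {
  real alpha;                         // overall intercept (absorbs mu_group)
  real beta;
  real<lower=0> sigma_group;          // between-group standard deviation
  real<lower=0> sigma_y;
  vector[J] z;                        // standard normal "raw" intercepts
}
transformed parameters {
  vector[J] alpha_group = sigma_group * z;  // alpha_j = sigma_group * z_j
}
model {
  target += normal_lpdf(z | 0, 1);              // z_j ~ Normal(0, 1)
  target += normal_lpdf(sigma_group | 0, 2.5);  // half-normal via <lower=0>
  target += normal_lpdf(sigma_y | 0, 2.5);
  target += normal_lpdf(alpha | 0, 5);
  target += normal_lpdf(beta | 0, 5);
  target += normal_lpdf(y | alpha + alpha_group[group] + beta * x, sigma_y);
}
```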

Mathematically CP and NCP are equivalent, but the latter usually has a nicer geometry for Stan to work with.

Now in the multivariate case we might have a model with a varying intercept and a varying slope, $y_i \sim \text{Normal}(\alpha_{j[i]} + X_i\beta_{j[i]}, \sigma_y)$, where

$$
\begin{bmatrix} \alpha_j \\ \beta_j \end{bmatrix} \sim \text{MultiNormal}\left( \begin{bmatrix} \mu_{\alpha} \\ \mu_\beta \end{bmatrix}, \Sigma \right)
$$

with covariance matrix $\Sigma$. This is the CP formulation. To get to the NCP formulation of the model we again shift and scale, but now we also have to take care of the correlation. To do that we rotate the new parameter by the Cholesky factor of the correlation matrix, which acts like a square root of the correlation matrix. We thus have

$$
\begin{bmatrix} \alpha_j \\ \beta_j \end{bmatrix} = \begin{bmatrix} \mu_{\alpha} \\ \mu_\beta \end{bmatrix} + \begin{bmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_\beta \end{bmatrix} \Omega^L Z_j
$$

where $\Omega^L$ is the lower-triangular Cholesky factor of the correlation matrix $\Omega$ and, for every $j$, $Z_j \sim \text{MultiNormal}(\mathbf{0}, \mathbf{I}_{2\times2})$ is a size-two vector of standard normal variables. Compressing everything a little more, you can write $Z$ as a $2 \times J$ matrix and then say that for $k = 1, 2$ we have $Z_{k,j} \sim \text{Normal}(0,1)$. Note that this is again possible due to the properties of the (multivariate) normal distribution: we can shift it, scale it, and rotate it. So that's where the `target += normal_lpdf(to_vector(z_u) | 0, 1);` statement in your model is coming from.

I hope this was not too convoluted. It’s a long way of saying that you did everything correctly.

Did you try taking away the upper bound on `sigma_u` and giving it a Half-Normal or Half-t prior? If that doesn't help, then maybe some other part of your model leads to slow sampling. That being said, the hard reality is that sometimes MCMC is just slow (and by using Stan you're probably going as fast as it can get).
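For example, something like this (just a sketch; the scale 2.5 and 3 degrees of freedom are guesses you'd want to adapt to your application):

```stan
parameters {
  vector<lower=0>[2] sigma_u;  // lower bound only, no upper bound
}
model {
  // half-Student-t prior, implied by the <lower=0> constraint
  target += student_t_lpdf(sigma_u | 3, 0, 2.5);
}
```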

Cheers,
Max


Hi Max,

Thank you for your detailed explanations. It makes sense to have a half-normal or half-t prior for the `z_u` to induce a normal distribution on the posteriors. However, I have no idea why, when I tried them, the posterior for the covariance matrix converged to very small values compared to the "true" values in my simulations. Therefore, I switched to a non-informative prior, and after revising the other priors it worked well for my joint model. The inverse-Wishart prior also worked well, but it took too long.

Did you ever encounter this kind of convergence issue? That is, the converged values for the covariance matrix were far away from your "true" values when using half-normal or half-t priors?

Yan

Hey Yan!

Note that the Half-Normal/Half-Student-t priors are for `sigma_u`, the marginal standard deviation of each random effect. The `z_u` have to have standard normal (unconstrained) priors.

For example:

```stan
...
transformed parameters {
  matrix[2, n_sub] c_bs;
  c_bs = diag_pre_multiply(sigma_u, L_u) * z_u;
}

model {
  target += normal_lpdf(sigma_u | 0, 5);        // half-normal is implied by the lower bound
  target += lkj_corr_cholesky_lpdf(L_u | 1.0);  // LKJ prior; I'd also try a tighter one here, maybe eta = 4
  target += normal_lpdf(to_vector(z_u) | 0, 1);
}
```

Did you try this?
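By the way, if you want to compare the covariance matrix to the "truth" from your simulations, you can recover it from the Cholesky factor in generated quantities. A sketch, reusing the names from above:

```stan
generated quantities {
  // correlation matrix implied by the Cholesky factor: Omega = L_u * L_u'
  corr_matrix[2] Omega = multiply_lower_tri_self_transpose(L_u);
  // covariance matrix: Sigma = diag(sigma_u) * Omega * diag(sigma_u)
  cov_matrix[2] Sigma = quad_form_diag(Omega, sigma_u);
}
```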

As for your question about convergence: honestly, that usually only happens to me when I have a bug in my code. Sometimes you have to switch to the CP, but even then the posterior quantities are usually still close to the "truth".

Cheers,
Max

Hi Max,

Thank you for your comments and feedback. I tried that exact prior in my previous code; however, it converged to very small numbers. I guess it also depends on the data and the other parameters. Thank you again for your feedback and time.

Yan

Hey Yan! Sorry to hear that you are still experiencing problems.

Can you post your full model and maybe some (toy) data to run it with? Then I can check.

Cheers,
Max


Hi Max,

Thank you for following up on this. I am revising my code now and will let you know if the problem still exists. Thank you again for your time.

Yan