Multi-block correlation matrix

I want to estimate a multi-block correlation matrix where the variables within each block are independent, but variables across blocks are correlated, with a different correlation structure for each pair of blocks. Is there any way to model such a correlation matrix in Stan?

A simple example of such a correlation matrix with 2 blocks, where each block has 2 dimensions and the correlation strength across blocks is 0.75:

R_true = diag(rep(1, 4)); R_true[1, 4] = 0.75; R_true[4, 1] = R_true[1, 4]

I think you would have to do a Kronecker product with an identity matrix manually because Stan does not expose a Kronecker product function.
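For reference, a Kronecker product is easy to write as a user-defined Stan function; this is a generic sketch (not code from the thread), with an illustrative function name:

```stan
functions {
  // Manual Kronecker product A (x) B, since Stan has no built-in.
  // Block (i, j) of the result is A[i, j] * B.
  matrix kronecker(matrix A, matrix B) {
    matrix[rows(A) * rows(B), cols(A) * cols(B)] K;
    for (i in 1:rows(A)) {
      for (j in 1:cols(A)) {
        K[((i - 1) * rows(B) + 1):(i * rows(B)),
          ((j - 1) * cols(B) + 1):(j * cols(B))] = A[i, j] * B;
      }
    }
    return K;
  }
}
```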

Thank you! Do you have any suggestions on how to estimate such a multi-block correlation matrix in Stan?

I would think of each of the blocks as separate parameters.
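For the 2-block, 2-dimensional example, that could look something like this (a minimal sketch with illustrative names, not a complete model):

```stan
// Sketch: one correlation matrix per block, assembled into the full matrix.
parameters {
  corr_matrix[2] R1;   // within-block correlation, block 1
  corr_matrix[2] R2;   // within-block correlation, block 2
}
transformed parameters {
  matrix[4, 4] R = rep_matrix(0, 4, 4);
  R[1:2, 1:2] = R1;
  R[3:4, 3:4] = R2;
  // the cross-block entries R[1:2, 3:4] would still need their own
  // parameters, constrained so that the combined R stays positive definite
}
```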

For my multi-block correlation matrix, the within-block correlation is the identity matrix, while the between-block correlation can be an arbitrary matrix. Wouldn’t modeling each between-block correlation separately be too costly? Also, how can I guarantee that the final combined correlation matrix is still positive definite?

I’m not sure I could visualize your matrix structure correctly from your example, but I did something that may have been similar: an MN \times MN matrix with N \times N blocks, where the within-block correlation was given by a kernel with M block-specific parameters (or rather a combination of all possible pairs of parameters for the corresponding block column/row), and the between-block variance parameters were unconstrained – so there were \binom{M}{2} of those plus the diagonal elements.

For M=2, N=2 it would look something like this:

K = \begin{bmatrix} \sigma^2_{11} \begin{bmatrix} k_{11}(x_{11},x_{11}) & k_{11}(x_{11},x_{12}) \\ k_{11}(x_{12},x_{11}) & k_{11}(x_{12},x_{12}) \end{bmatrix} & \sigma^2_{12} \begin{bmatrix} k_{12}(x_{11},x_{21}) & k_{12}(x_{11},x_{22}) \\ k_{12}(x_{12},x_{21}) & k_{12}(x_{12},x_{22}) \end{bmatrix} \\ \sigma^2_{21} \begin{bmatrix} k_{21}(x_{21},x_{11}) & k_{21}(x_{21},x_{12}) \\ k_{21}(x_{22},x_{11}) & k_{21}(x_{22},x_{12}) \end{bmatrix} & \sigma^2_{22} \begin{bmatrix} k_{22}(x_{21},x_{21}) & k_{22}(x_{21},x_{22}) \\ k_{22}(x_{22},x_{21}) & k_{22}(x_{22},x_{22}) \end{bmatrix} \end{bmatrix}
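In code, that block structure could be assembled like this (a sketch only: the kernel, its parameters, and all names here are illustrative, not the actual model from the thread):

```stan
functions {
  // Illustrative squared-exponential kernel; the actual kernel may differ.
  real k(real x1, real x2, real ell) {
    return exp(-square(x1 - x2) / (2 * square(ell)));
  }
}
// ... in transformed parameters, block (p, q) of the M*N x M*N matrix is
// sigma2[p, q] * [ k(x[p, i], x[q, j]) ], matching the formula above:
transformed parameters {
  matrix[M * N, M * N] K;
  for (p in 1:M)
    for (q in 1:M)
      for (i in 1:N)
        for (j in 1:N)
          K[(p - 1) * N + i, (q - 1) * N + j]
            = sigma2[p, q] * k(x[p, i], x[q, j], ell[p, q]);
}
```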

I’d say possibly, but that depends on the size and difficulty of the problem. Unless you can impose a parametric structure (like the identity, or a kernel for the within-blocks) or some other constraint that makes sense, you are left with estimating each parameter separately – that is what I did, although the number of parameters becomes quite large even for a moderate number of blocks.

That’s a great question I could never figure out completely. You need covariance matrices to be positive definite to compute the likelihood, so if the proposed values are not, the likelihood for the proposal is invalid and the proposal gets rejected. For Metropolis–Hastings that would probably just slow things down when the random proposals ended up in that region, but for HMC, where the proposal relies on navigating the geometry of the likelihood surface, the interaction with the positive-definiteness criterion is harder to visualize (to me at least). If it turns out to be a sharp boundary, maybe positive definiteness would just fence off a region of parameter space, and I guess HMC trajectories would navigate within that boundary; but if it happens to be a series of pits with zero likelihood, it could cut short otherwise valid trajectories and make sampling very inefficient.

Maybe others here will have a better intuition for what this kind of constraint entails for the HMC potential and the actual performance. In practice I would just run it and see what the samples look like and go from there, all of that may be trying to solve a problem that is not there.

Thank you, Caesoma! I figured out a way to estimate this special correlation structure, and I was able to implement it in R. Now I want to incorporate this part into the Stan code. I am not sure whether my replacement of runif(0, 1) in the original R code with the “nu” parameter is proper. Also, the estimation of R from the following Stan code is not as good as with my previous R code. Do you know what might have caused the problem?

transformed parameters {
  corr_matrix[sum(K)] R;
  cov_matrix[sum(K)] VCOV;

  {
    // between-block correlations
    matrix[sum(K), sum(K)] B_R;
    matrix[sum(K), sum(K)] R_N;
    matrix[sum(K), sum(K)] R_temp;
    matrix[sum(K), sum(K)] R_temp_C;
    matrix[sum(K), sum(K)] R_temp_N;

    real f_1;
    real f_minus1;
    real f_0;
    real a;
    real b;
    real c;
    real root1;
    real root2;
    real l_q1q2;
    real u_q1q2;
    real r_q1q2_N;
    real l_q1q21;
    real l_q1q22;
    real u_q1q21;
    real u_q1q22;
    real alpha_temp;
    real r_q1q2_C;
    real alpha_q1q2;
    real C_R;
    real ratio_N_C;
    real ratio_q1q2;

    R = rep_matrix(0, sum(K), sum(K));
    for (m in 1:sum(K)) {
      R[m, m] = 1;
    }
    R_N = R;

    B_R = rep_matrix(0, sum(K), sum(K));
    for (i in 1:N) {
      B_R = B_R + inverse(diag_matrix(D)) * alpha[i] * (alpha[i]') * (inverse(diag_matrix(D))');
    }

    for (m in 1:K[1]) {
      for (n in (K[1] + 1):sum(K)) {
        R_temp = R_N;
        R_temp[m, n] = 1;
        R_temp[n, m] = 1;
        f_1 = determinant(R_temp);
        R_temp[m, n] = -1;
        R_temp[n, m] = -1;
        f_minus1 = determinant(R_temp);
        R_temp[m, n] = 0;
        R_temp[n, m] = 0;
        f_0 = determinant(R_temp);
        a = (f_1 + f_minus1 - 2 * f_0) / 2;
        b = (f_1 - f_minus1) / 2;
        c = f_0;

        if (a < 0) {
          root1 = (-b + sqrt(b^2 - 4 * a * c)) / 2 / a;
          root2 = (-b - sqrt(b^2 - 4 * a * c)) / 2 / a;

          if (root1 > -1) l_q1q2 = root1;
          else l_q1q2 = -1;
          l_q1q2 = ceil(l_q1q2 * 100) / 100;

          if (root2 < 1) u_q1q2 = root2;
          else u_q1q2 = 1;
          u_q1q2 = floor(u_q1q2 * 100) / 100;

          r_q1q2_N = l_q1q2 + (u_q1q2 - l_q1q2) * nu; // nu is a random value between 0 and 1
        }

        if (a > 0) {
          if ((b^2 - 4 * a * c) < 0) {
            r_q1q2_N = -1 + 2 * nu; // nu is a random value between 0 and 1
          }
          if ((b^2 - 4 * a * c) >= 0) {
            root1 = (-b - sqrt(b^2 - 4 * a * c)) / 2 / a;
            root2 = (-b + sqrt(b^2 - 4 * a * c)) / 2 / a;
            l_q1q21 = -1;
            if (root1 > -1) u_q1q21 = root1;
            else u_q1q21 = -1;
            u_q1q21 = floor(u_q1q21 * 100) / 100;

            if (root2 < 1) l_q1q22 = root2;
            else l_q1q22 = 1;
            l_q1q22 = ceil(l_q1q22 * 100) / 100;
            u_q1q22 = 1;
            alpha_temp = nu; // nu is a random value between 0 and 1

            if (alpha_temp <= (u_q1q21 - l_q1q21) / (u_q1q21 - l_q1q21 + u_q1q22 - l_q1q22))
              r_q1q2_N = l_q1q21 + (u_q1q21 - l_q1q21 + u_q1q22 - l_q1q22) * alpha_temp;
            if (alpha_temp > (u_q1q21 - l_q1q21) / (u_q1q21 - l_q1q21 + u_q1q22 - l_q1q22))
              // sample from the second interval [l_q1q22, u_q1q22]
              r_q1q2_N = l_q1q22 + (u_q1q21 - l_q1q21 + u_q1q22 - l_q1q22) * alpha_temp
                         - (u_q1q21 - l_q1q21);
          }
        }

        R_temp_C = R_N;
        R_temp_N = R_N;
        R_temp_N[m, n] = r_q1q2_N;
        R_temp_N[n, m] = R_temp_N[m, n];
        r_q1q2_C = R_temp_C[m, n];
        alpha_q1q2 = nu; // nu is a random value between 0 and 1

        C_R = sqrt(1000);
        // note: 0.5 and -N / 2.0, not 1/2 and -N/2 -- integer division truncates in Stan
        ratio_N_C = exp((log(determinant(R_temp_N)) - log(determinant(R_temp_C))) * (-N / 2.0)
                        - 0.5 * sum(diagonal(inverse(R_temp_N) * B_R))
                        + 0.5 * sum(diagonal(inverse(R_temp_C) * B_R))
                        - (r_q1q2_N)^2 / (C_R^2) / 2 + (r_q1q2_C)^2 / (C_R^2) / 2);
        if (ratio_N_C < 1) ratio_q1q2 = ratio_N_C;
        else ratio_q1q2 = 1;

        if (ratio_q1q2 > alpha_q1q2) {
          R_N[m, n] = r_q1q2_N;
          R_N[n, m] = R_N[m, n];
        }
      }
    }

    R = R_N;
  }

  VCOV = quad_form_diag(R, D);
}

model {
  // nu is the random value for the estimation of correlation matrix R,
  // declared real<lower=0, upper=1> in the parameters block (not shown)
  nu ~ uniform(0, 1);
  for (n in 1:N) {
    y[n] ~ multi_normal(zero_k, VCOV);
  }
}

I’m not sure about the whole implementation and what all the parameters are in comparison with your previous R implementation, but if runif(0, 1) is generating a random value to drive the estimation, that is not the same as having nu ~ uniform(0, 1) – the latter estimates a fixed value given a uniform prior. I don’t think generating random values within the model block is possible in Stan.
I can’t be sure that’s what you were doing without seeing how you estimated it in R. Assuming the implementation is correct, it’s likely that HMC performance is better than MLE or Metropolis–Hastings, but you’d have to be more specific about what you think is worse – look at the traces, convergence diagnostics, etc.
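For what it’s worth, the usual Stan idiom is to declare the free cross-block correlations as bounded parameters and let the sampler handle validity, rather than embedding an accept/reject step in transformed parameters. A hedged sketch for the 2-block, 2-dimensional toy example (names like r14 are assumptions, not from the thread):

```stan
data {
  int<lower=1> N;
  vector[4] y[N];               // 2 blocks of 2 dimensions, as in the toy example
}
parameters {
  real<lower=-1, upper=1> r14;  // free cross-block correlation, here R[1,4]
  vector<lower=0>[4] D;         // marginal standard deviations
}
transformed parameters {
  matrix[4, 4] R = diag_matrix(rep_vector(1, 4));
  R[1, 4] = r14;
  R[4, 1] = r14;
}
model {
  // For this toy structure R is positive definite whenever |r14| < 1;
  // with several free cross-block entries, cholesky_decompose throws
  // for non-PD R and Stan rejects that proposal.
  matrix[4, 4] L = cholesky_decompose(quad_form_diag(R, D));
  for (n in 1:N)
    y[n] ~ multi_normal_cholesky(rep_vector(0, 4), L);
}
```

The bounds on r14 give it an implicit uniform(-1, 1) prior, so no explicit sampling statement is needed for it.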