Partial-pooling of correlation (or covariance) matrices?

I’m working on a model with a nested hierarchy and it’s straightforward to apply partial-pooling over things like location and scale parameters, but is there a way to apply the partial-pooling concept to correlation matrices too? I’m guessing it would actually have to be implemented in terms of covariance instead of correlations, but beyond that I have no intuitions.


There has been some success in situations like this if you make the relevant correlation matrix a convex combination of a global correlation matrix and a deviation correlation matrix and estimate the weight on the global correlation matrix.
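For concreteness, here is a minimal sketch of that idea with hypothetical names (K parameters, J groups, both declared in data). Since a convex combination of two correlation matrices has a unit diagonal and stays positive semidefinite, the result is itself a valid correlation matrix:

parameters {
  corr_matrix[K] Omega_global;   // correlation matrix shared across groups
  corr_matrix[K] Omega_dev[J];   // group-specific deviation matrices
  real<lower=0, upper=1> w;      // weight on the global matrix
}
transformed parameters {
  corr_matrix[K] Omega[J];       // correlation matrix actually used for group j
  for (j in 1:J)
    Omega[j] = w * Omega_global + (1 - w) * Omega_dev[j];
}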


Posting here because it seems like my question is essentially the same, and there might have been updates in the year since this topic went cold. Is there an established way to deal with correlation matrices in nested hierarchical models? Alternatively, what are the pros and cons of the different possibilities? I cannot seem to find this explained in any of the Stan manuals, or BDA3 for that matter.

I’m building a 3-level (individual/country/world) nested model describing an implementation of prospect theory. Each individual has 9 parameters describing their choices (think optimism, valuation of money, and loss aversion), and ideally I want to implement a correlation matrix over those parameters at the country level plus some way of pooling the information from those correlations on the world level.
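To make the structure concrete, a minimal sketch of the nesting as I understand it, with all names as placeholders and the country-level correlation matrices still unpooled (this unpooled part is exactly what I am asking about):

data {
  int N;                             // individuals
  int J;                             // countries
  int<lower=1, upper=J> country[N];  // country of each individual
}
parameters {
  vector[9] theta[N];                // 9 prospect-theory parameters per individual
  vector[9] mu[J];                   // country-level means
  vector<lower=0>[9] tau[J];         // country-level scales
  cholesky_factor_corr[9] L[J];      // country-level correlations -- the part to pool
  vector[9] mu_world;                // world-level means
  vector<lower=0>[9] tau_world;      // world-level scales
}
model {
  for (j in 1:J)
    mu[j] ~ normal(mu_world, tau_world);  // world-level pooling of means
  for (n in 1:N)
    theta[n] ~ multi_normal_cholesky(mu[country[n]],
                                     diag_pre_multiply(tau[country[n]], L[country[n]]));
  // ...likelihood for the observed choices given theta goes here
}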

For @bgoodri’s convex-combination approach, does that largely equate to estimating an individual correlation matrix for each country? Is the difference that the weight on the group-specific deviation matrix encourages the deviation to be small? (as you mentioned here)

As for @Charles_Driver’s approach, I came across your discussion with spinkney and Stephen_Martin in Hierarchical Correlation? from earlier this year, but I’m not familiar enough with the Stan functions yet to really understand what’s happening in the code. Is there somewhere other than the article where it’s elaborated a bit more?

My model should be fine-ish with either a single world-level correlation matrix, or separate ones for each country, but both feel like leaving useful information on the table.

The main update from my side: (a) the approach in the paper ‘works’ but turns out to have local-minima issues in higher dimensions; (b) to get around this I rolled a complete hack job that looks horrendous and has no theoretical justification, but produces clean results on all the tests I could imagine in a fairly short timeframe, and still satisfies the requirements of one parameter per correlation and index-independent priors. To get a correlation matrix from this function, pass it the vector of ‘unconstrained’ correlation parameters and the dimension, then tcrossprod the matrix it returns. Writing some of these explorations up properly one day is vaguely on my to-do list, but quite a long way down unfortunately ;)

matrix constraincorsqrt(vector rawcor, int d) { // unconstrained lower-tri vector -> matrix square root of a correlation matrix
  int counter = 0;
  matrix[d, d] o;
  vector[d] ss = rep_vector(0, d); // per-row sum of squares of off-diagonals
  vector[d] s = rep_vector(0, d);  // per-row sum of off-diagonals
  real r;
  real r1;
  real r2;
  real r3;
  real r4;

  // fill the lower triangle from the unconstrained parameter vector
  for (i in 1:d) {
    for (j in 1:d) {
      if (j > i) {
        counter += 1;
        o[j, i] = rawcor[counter]; // alternative: inv_logit(rawcor[counter]) * 2 - 1; divide by i for approx. whole-matrix-equivalent priors
      }
    }
  }

  // accumulate row sums and sums of squares over all off-diagonal entries
  for (i in 1:d) {
    for (j in 1:d) {
      if (j > i) {
        ss[i] += square(o[j, i]);
        s[i] += o[j, i];
      }
      if (j < i) {
        ss[i] += square(o[i, j]);
        s[i] += o[i, j];
      }
    }
    s[i] += 1e-5;
    ss[i] += 1e-5;
  }

  // rescale each row, mirror the lower triangle into the upper, and set the diagonal
  for (i in 1:d) {
    o[i, i] = 0;
    r1 = sqrt(ss[i]);
    r2 = s[i];
    r3 = fabs(r2) / r1 - 1;
    r4 = sqrt(log1p_exp(2 * (fabs(r2) - r2 - 1) - 4));
    r = (r4 * r3 + 1) * r4 + 1;
    r = sqrt(ss[i] + r);
    for (j in 1:d) {
      if (j > i) o[i, j] = o[j, i] / r; // mirror raw lower-tri value into upper tri, scaled
      if (j < i) o[i, j] = o[i, j] / r;
    }
    o[i, i] = sqrt(1 - sum(square(o[i])) + 1e-5); // diagonal completes the row to (approximately) unit norm
  }

  return o;
}
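For reference, a minimal usage sketch of the above, assuming constraincorsqrt sits in the functions block; the names rawcor and d are placeholders, and d would come from data. Because each row of the returned matrix square root has (approximately) unit norm, tcrossprod yields a matrix with unit diagonal:

data {
  int<lower=2> d; // dimension of the correlation matrix
}
parameters {
  vector[d * (d - 1) / 2] rawcor; // one unconstrained parameter per correlation
}
transformed parameters {
  matrix[d, d] R = tcrossprod(constraincorsqrt(rawcor, d)); // R is the correlation matrix
}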

Thanks a lot!

I’ll definitely try and implement that.

I’d speculate that it is usually better to form hierarchical estimates of covariance matrices and calculate correlations only for presentational purposes, because pooling correlations bakes subgroup variance estimates into the process.

Correlations rescale covariances by the square roots of the variances. If in some subgroup there is little evidence of variation in one variable, then its covariance estimates within that subgroup will be small, noisy estimates close to 0, while its correlations will be noisy estimates on at least the same scale as other correlations*. (For example, with a near-zero subgroup standard deviation, cov ≈ 0 carries the right message, but the correlation cov / (sd_x · sd_y) can land almost anywhere in [-1, 1].) The outcome we want is for the population estimate to carry a high weight in this situation, which pooling covariances delivers but pooling correlations does not.

Obviously this argument is strongest when measurements share a natural scale. However, even if they do not, I think I would prefer to make explicit estimates of suitable scales, either for the population or hierarchically, and use those to rescale the covariances before pooling. I cannot think of any situation in which it makes sense to pool co-movement information across levels but not at least as much sense to pool scale information, which is, I think, the situation that would be needed for pooling correlations to come out better.

* For some kinds of lack of evidence, such as missing values, the correlations will tend to be larger.
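If one does pool covariances, correlations for reporting are then a one-liner. A hypothetical helper (the name cov_to_corr is mine), using only built-in Stan functions:

matrix cov_to_corr(matrix S) { // rescale a covariance matrix to a correlation matrix
  vector[rows(S)] inv_sd = inv_sqrt(diagonal(S));
  return quad_form_diag(S, inv_sd); // diag(inv_sd) * S * diag(inv_sd)
}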

Are you not confusing variances and covariances here?

I don’t think I’m making exactly that mistake because I don’t think we can have evidence for covariance without also having evidence for variance. However, I do rather regret making these comments without first running a simulation, and will do so.