Partial-pooling of correlation (or covariance) matrices?

I’m working on a model with a nested hierarchy and it’s straightforward to apply partial-pooling over things like location and scale parameters, but is there a way to apply the partial-pooling concept to correlation matrices too? I’m guessing it would actually have to be implemented in terms of covariance instead of correlations, but beyond that I have no intuitions.


There has been some success in situations like this if you make the relevant correlation matrix a convex combination of a global correlation matrix and a deviation correlation matrix and estimate the weight on the global correlation matrix.
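For concreteness, here is a minimal sketch of that idea with hypothetical names (K parameters, J groups, both declared in data). Since a convex combination of two correlation matrices has a unit diagonal and stays positive semidefinite, the result is itself a valid correlation matrix:

parameters {
  corr_matrix[K] Omega_global;   // correlation matrix shared across groups
  corr_matrix[K] Omega_dev[J];   // group-specific deviation matrices
  real<lower=0, upper=1> w;      // weight on the global matrix
}
transformed parameters {
  corr_matrix[K] Omega[J];       // correlation matrix actually used for group j
  for (j in 1:J)
    Omega[j] = w * Omega_global + (1 - w) * Omega_dev[j];
}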


Posting here because it seems like my question is essentially the same, and there might have been updates in the year since this topic went cold. Is there an established way to deal with correlation matrices in nested hierarchical models? Alternatively, what are the pros and cons of the different possibilities? I cannot seem to find this explained in any of the Stan manuals, or BDA3 for that matter.

I’m building a 3-level (individual/country/world) nested model describing an implementation of prospect theory. Each individual has 9 parameters describing their choices (think optimism, valuation of money, and loss aversion), and ideally I want to implement a correlation matrix over those parameters at the country level plus some way of pooling the information from those correlations on the world level.
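To make the structure concrete, a minimal sketch of the nesting as I understand it, with all names as placeholders and the country-level correlation matrices still unpooled (this unpooled part is exactly what I am asking about):

data {
  int N;                             // individuals
  int J;                             // countries
  int<lower=1, upper=J> country[N];  // country of each individual
}
parameters {
  vector[9] theta[N];                // 9 prospect-theory parameters per individual
  vector[9] mu[J];                   // country-level means
  vector<lower=0>[9] tau[J];         // country-level scales
  cholesky_factor_corr[9] L[J];      // country-level correlations -- the part to pool
  vector[9] mu_world;                // world-level means
  vector<lower=0>[9] tau_world;      // world-level scales
}
model {
  for (j in 1:J)
    mu[j] ~ normal(mu_world, tau_world);  // world-level pooling of means
  for (n in 1:N)
    theta[n] ~ multi_normal_cholesky(mu[country[n]],
                                     diag_pre_multiply(tau[country[n]], L[country[n]]));
  // ...likelihood for the observed choices given theta goes here
}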

For @bgoodri’s convex-combination approach, does that largely equate to estimating an individual correlation matrix for each country? Is the difference that the weight on the group-specific deviation matrix encourages the deviation to be small? (as you mentioned here)

As for @Charles_Driver’s approach, I came across your discussion with spinkney and Stephen_Martin in Hierarchical Correlation? from earlier this year, but I’m not familiar enough with the Stan functions yet to really understand what’s happening in the code. Is there somewhere other than the article where it’s elaborated a bit more?

My model should be fine-ish with either a single world-level correlation matrix, or separate ones for each country, but both feel like leaving useful information on the table.

The main update from my side: (a) the approach in the paper ‘works’ but turns out to have local-minima issues in higher dimensions; (b) to get around this I rolled a complete hack job that looks horrendous and has no theoretical justification, but produces clean results on all the tests I could imagine in a fairly short timeframe, and still satisfies the requirements of one parameter per correlation and index-independent priors. To get a correlation matrix from this function, pass it the vector of ‘unconstrained’ correlation parameters and the dimension, then tcrossprod the matrix it returns. Writing some of these explorations up properly one day is vaguely on my to-do list, but quite a long way down unfortunately ;)

matrix constraincorsqrt(vector rawcor, int d) { // unconstrained lower-tri vector -> matrix square root of a correlation matrix
  int counter = 0;
  matrix[d, d] o;
  vector[d] ss = rep_vector(0, d); // per-row sum of squares of off-diagonals
  vector[d] s = rep_vector(0, d);  // per-row sum of off-diagonals
  real r;
  real r1;
  real r2;
  real r3;
  real r4;

  // fill the lower triangle from the unconstrained parameter vector
  for (i in 1:d) {
    for (j in 1:d) {
      if (j > i) {
        counter += 1;
        o[j, i] = rawcor[counter]; // alternative: inv_logit(rawcor[counter]) * 2 - 1; divide by i for approx. whole-matrix-equivalent priors
      }
    }
  }

  // accumulate row sums and sums of squares over all off-diagonal entries
  for (i in 1:d) {
    for (j in 1:d) {
      if (j > i) {
        ss[i] += square(o[j, i]);
        s[i] += o[j, i];
      }
      if (j < i) {
        ss[i] += square(o[i, j]);
        s[i] += o[i, j];
      }
    }
    s[i] += 1e-5;
    ss[i] += 1e-5;
  }

  // rescale each row, mirror the lower triangle into the upper, and set the diagonal
  for (i in 1:d) {
    o[i, i] = 0;
    r1 = sqrt(ss[i]);
    r2 = s[i];
    r3 = fabs(r2) / r1 - 1;
    r4 = sqrt(log1p_exp(2 * (fabs(r2) - r2 - 1) - 4));
    r = (r4 * r3 + 1) * r4 + 1;
    r = sqrt(ss[i] + r);
    for (j in 1:d) {
      if (j > i) o[i, j] = o[j, i] / r; // mirror raw lower-tri value into upper tri, scaled
      if (j < i) o[i, j] = o[i, j] / r;
    }
    o[i, i] = sqrt(1 - sum(square(o[i])) + 1e-5); // diagonal completes the row to (approximately) unit norm
  }

  return o;
}
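For reference, a minimal usage sketch of the above, assuming constraincorsqrt sits in the functions block; the names rawcor and d are placeholders, and d would come from data. Because each row of the returned matrix square root has (approximately) unit norm, tcrossprod yields a matrix with unit diagonal:

data {
  int<lower=2> d; // dimension of the correlation matrix
}
parameters {
  vector[d * (d - 1) / 2] rawcor; // one unconstrained parameter per correlation
}
transformed parameters {
  matrix[d, d] R = tcrossprod(constraincorsqrt(rawcor, d)); // R is the correlation matrix
}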

Thanks a lot!

I’ll definitely try and implement that.

I’d speculate that it is usually better to form hierarchical estimates of covariance matrices and calculate correlations only for presentational purposes, because pooling correlations bakes subgroup variance estimates into the process.

Correlations rescale covariances by the square roots of the variances. If in some subgroup there is little evidence of variation in one variable, then its covariance estimates within that subgroup will be small, noisy estimates close to 0, while its correlations will be noisy estimates on at least the same scale as other correlations*. (For example, with a near-zero subgroup standard deviation, cov ≈ 0 carries the right message, but the correlation cov / (sd_x · sd_y) can land almost anywhere in [-1, 1].) The outcome we want is for the population estimate to carry a high weight in this situation, which pooling covariances delivers but pooling correlations does not.

Obviously this argument is strongest when measurements share a natural scale. However, even if they do not, I think I would prefer to make explicit estimates of suitable scales, either for the population or hierarchically, and use those to rescale the covariances before pooling. I cannot think of any situation in which it makes sense to pool co-movement information across levels but not at least as much sense to pool scale information, which is, I think, the situation that would be needed for pooling correlations to come out better.

* For some kinds of lack of evidence, such as missing values, the correlations will tend to be larger.
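If one does pool covariances, correlations for reporting are then a one-liner. A hypothetical helper (the name cov_to_corr is mine), using only built-in Stan functions:

matrix cov_to_corr(matrix S) { // rescale a covariance matrix to a correlation matrix
  vector[rows(S)] inv_sd = inv_sqrt(diagonal(S));
  return quad_form_diag(S, inv_sd); // diag(inv_sd) * S * diag(inv_sd)
}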

Are you not confusing variances and covariances here?

I don’t think I’m making exactly that mistake because I don’t think we can have evidence for covariance without also having evidence for variance. However, I do rather regret making these comments without first running a simulation, and will do so.