Computing correlations from the posterior

I have a fairly standard hierarchical model, multiple observations made in each of multiple conditions specified by a set of experimental manipulations made within each of multiple human participants. I explicitly model the correlation matrix describing variability in how the manipulations affect each participant, but I have a lot of manipulations so, as explained to me in this post, I observe posterior distributions for these correlations to all roughly centered very near zero. I have pretty confident expectation for some of the correlations to be non-zero with decent magnitude (up to .5-ish) so I thought I’d check the sample-by-sample correlations amongst the actual values sampled for each participant. That is, for each sample from the posterior, the hierarchical model provides estimates for each participant for the effect of a given manipulation, so for a given pair of manipulations I can compute the correlation for that sample, then repeat for all samples, yielding a posterior distribution on the correlation. When I do this, I obtain posterior distributions on the correlations that very much match my expectations. The question is: is this a valid approach? Or, possibly more precisely, what (if any) inference can be drawn from these computed-from-the-posterior correlations since they seem to diverge from the explicit correlations with which the model was parameterized?

In case it helps, my model is below. I’m referring to a difference in the posterior on the cors parameter compared to when I compute the correlations in the posterior amongst the columns in the Sdevs parameter.

data {
  // nY: num trials
  int<lower=1> nY ;
  // Y: Y outcomes (should be scaled to mean=0,sd=1 for priors to make sense!)
  vector[nY] Y ; 
  // nS: num subjects
  int<lower=1> nS ;
  // S: trial-by-trial subject labels for observations
  int<lower=1,upper=nS> S[nY] ;
  // rowsW: num rows in W
  int<lower=1> rowsW ;
  // nW: num within predictors
  int<lower=1> nW ;
  // W: within predictor matrix (unique entries only)
  matrix[rowsW,nW] W ;
  // indexW: index of each subject in the flattened W
  int indexWSubj[nY] ;
  // nB: num group predictors
  int<lower=1> nB ;
  // B: between predictors
  matrix[nS,nB] B ;
parameters {
  // multi_normal_helper: a helper variable
  matrix[nW,nS] multi_normal_helper ;
  // sds: population-level sds (on the z-scale) for each within-subject predictor
  vector<lower=0>[nW] sds ;
  // cor_helper: population-level correlations (on cholesky factor scale) amongst within-subject predictors
  cholesky_factor_corr[nW] cor_helper ;
  // coef: coefficientsfor between and within subject predictors
  matrix[nB,nW] coef ;
  // noise: measurement noise
  real noise ;
transformed parameters{
  // Sdevs: subject-by-subject deviations
  matrix[nS,nW] Sdevs ;
  // compute
  Sdevs = transpose(diag_pre_multiply(sds,cor_helper) * multi_normal_helper) ;
model {
  // multi_normal_helper must have normal(0,1) prior for multivariate trick
  to_vector(multi_normal_helper) ~ normal(0,1) ;
  // flat prior on correlations
  cor_helper ~ lkj_corr_cholesky(1) ;
  // normal(0,1) priors on all sds
  sds ~ normal(0,1) ;
  // normal(0,1) priors on all coefs
  to_vector(coef) ~ normal(0,1) ;
  // weibull(2,1) prior on noise
  noise ~ weibull(2,1) ;
    // Svals: subject coefficient values
    matrix[nS,nW] Svals ;
    // mu: condition-by-subject values
    matrix[rowsW,nS] mu ;
    Svals = B * coef + Sdevs ;
    for(i in 1:nS){
      for(j in 1:rowsW){
        mu[j,i] = dot_product(Svals[i],W[j]) ;
    Y ~ normal( to_vector(mu)[indexWSubj] , noise ) ;
generated quantities {
  // cor: population-level correlations amongst within-subject predictors
  corr_matrix[nW] cor ;
  cor = multiply_lower_tri_self_transpose(cor_helper) ;

Yes, that’s fine. You’re estimating posterior correlations between parameters using MCMC (the estimate is the sample correlation in draws from the posterior, i.e., the sample correlation of the posterior sample).

Posterior correlations among parameters don’t need to match their priors. For example, I can have a regression coefficient beta1 for income, another beta2 for age, and they’ll be correlated in the posterior even if I give them independent priors.

P.S. Please feel free to ping the list if questions don’t get answered after a week!


@Richard_Erickson (who I notice just found/liked this old thread) and to anyone else that comes across this: note that I actually worked out for myself recently that the approach I described here was not appropriate and yields overconfident predictions when proper calibration checks are run. I’ll maybe include my code demonstrating this in my upcoming stanconect talk.


Just to clarify, the procedure here should yield valid posterior distributions for the sample correlations (i.e. the true correlation coefficients computed over your sample of subjects). But these sample correlations are not the same thing as the population correlations from which the samples arise. Does that sound right to you?

1 Like

Good clarification of terms. Yes, precisely.


Oh wait, no I don’t think that’s correct either. At least, not as I intended to describe what I was doing in the OP. It would certainly be possible to derive a posterior predictive distribution on the sample correlation, but what I intended to convey in my OP was something different. Using the SUG 1.13 model as a terminological reference, it has a parameter beta encoding latent samples of a multivariate-normal, with correlation described by another parameter Omega, and then beta is then used to express the likelihood structure (i.e. the multivariate normal is not the bottom/data-touching structure of the hierarchy). My proposal was to ignore the direct inference supplied by Omega and instead look at (in each sample of the posterior) the empirical correlation matrix computed from looking at beta alone. Anyway, ends up being a bad idea.

1 Like