# Computing correlations from the posterior

I have a fairly standard hierarchical model: multiple observations in each of multiple conditions, specified by a set of experimental manipulations applied within each of multiple human participants. I explicitly model the correlation matrix describing variability in how the manipulations affect each participant, but I have a lot of manipulations, so (as explained to me in this post) the posterior distributions for these correlations all end up roughly centered very near zero.

I have pretty confident expectations that some of the correlations are non-zero with decent magnitude (up to .5-ish), so I thought I’d check the sample-by-sample correlations amongst the actual values sampled for each participant. That is, for each sample from the posterior, the hierarchical model provides an estimate for each participant of the effect of a given manipulation. So for a given pair of manipulations I can compute the correlation of those estimates across participants for that sample, then repeat for all samples, yielding a posterior distribution on the correlation. When I do this, I obtain posterior distributions on the correlations that match my expectations very well.

The question is: is this a valid approach? Or, possibly more precisely, what (if any) inference can be drawn from these computed-from-the-posterior correlations, given that they diverge from the explicit correlation parameters with which the model was parameterized?
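
Concretely, the computation I’m describing is something like this (a NumPy sketch with hypothetical names; `effect_draws` stands in for the posterior draws of the per-participant effects):

```python
import numpy as np

# Hypothetical sketch: effect_draws is assumed to be an array of shape
# (n_draws, n_participants, n_manipulations) holding the per-participant
# effects sampled from the fitted hierarchical model.
def per_draw_correlations(effect_draws):
    # np.corrcoef expects variables in rows, so transpose each draw's
    # participants-by-manipulations matrix before correlating
    return np.stack([np.corrcoef(draw.T) for draw in effect_draws])

# cors[:, i, j] is then a posterior distribution on the correlation
# between manipulations i and j, computed across participants:
# cors = per_draw_correlations(effect_draws)
```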

In case it helps, my model is below. I’m referring to the difference between the posterior on the `cor` parameter and the correlations computed, in each posterior draw, amongst the columns of the `Sdevs` parameter (see the sketch after the model).

```stan
data {
// nY: num trials
int<lower=1> nY ;
// Y: Y outcomes (should be scaled to mean=0,sd=1 for priors to make sense!)
vector[nY] Y ;
// nS: num subjects
int<lower=1> nS ;
// S: trial-by-trial subject labels for observations
int<lower=1,upper=nS> S[nY] ;
// rowsW: num rows in W
int<lower=1> rowsW ;
// nW: num within predictors
int<lower=1> nW ;
// W: within predictor matrix (unique entries only)
matrix[rowsW,nW] W ;
// indexWSubj: for each trial, the index of its condition-by-subject cell in the flattened mu matrix
int<lower=1,upper=rowsW*nS> indexWSubj[nY] ;
// nB: num group predictors
int<lower=1> nB ;
// B: between predictors
matrix[nS,nB] B ;
}
parameters {
// multi_normal_helper: standard-normal helper variable for the non-centered multivariate normal
matrix[nW,nS] multi_normal_helper ;
// sds: population-level sds (on the z-scale) for each within-subject predictor
vector<lower=0>[nW] sds ;
// cor_helper: population-level correlations (on cholesky factor scale) amongst within-subject predictors
cholesky_factor_corr[nW] cor_helper ;
// coef: coefficients for between and within subject predictors
matrix[nB,nW] coef ;
// noise: measurement noise (standard deviation; must be positive)
real<lower=0> noise ;
}
transformed parameters{
// Sdevs: subject-by-subject deviations
matrix[nS,nW] Sdevs ;
// compute
Sdevs = transpose(diag_pre_multiply(sds,cor_helper) * multi_normal_helper) ;

}
model {
// multi_normal_helper must have normal(0,1) prior for multivariate trick
to_vector(multi_normal_helper) ~ normal(0,1) ;
// flat prior on correlations
cor_helper ~ lkj_corr_cholesky(1) ;
// normal(0,1) priors on all sds
sds ~ normal(0,1) ;
// normal(0,1) priors on all coefs
to_vector(coef) ~ normal(0,1) ;
// weibull(2,1) prior on noise
noise ~ weibull(2,1) ;
{
// Svals: subject coefficient values
matrix[nS,nW] Svals ;
// mu: condition-by-subject values
matrix[rowsW,nS] mu ;
Svals = B * coef + Sdevs ;
for(i in 1:nS){
for(j in 1:rowsW){
mu[j,i] = dot_product(Svals[i],W[j]) ;
}
}
Y ~ normal( to_vector(mu)[indexWSubj] , noise ) ;
}
}
generated quantities {
// cor: population-level correlations amongst within-subject predictors
corr_matrix[nW] cor ;
cor = multiply_lower_tri_self_transpose(cor_helper) ;
}
```
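
To spell the comparison out, after fitting I’m looking at something like this (a CmdStanPy/NumPy sketch; `model.stan` and `stan_data` are hypothetical placeholders for the actual file and data):

```python
import numpy as np
from cmdstanpy import CmdStanModel

# fit the model above; "model.stan" and stan_data are hypothetical
# stand-ins for the actual Stan file and data dictionary
fit = CmdStanModel(stan_file="model.stan").sample(data=stan_data)

# explicit correlation parameter: draws of shape (n_draws, nW, nW)
cor_draws = fit.stan_variable("cor")
# per-subject deviations: draws of shape (n_draws, nS, nW)
sdevs_draws = fit.stan_variable("Sdevs")

# per-draw sample correlations amongst the columns of Sdevs
empirical = np.stack([np.corrcoef(draw.T) for draw in sdevs_draws])

# compare, e.g., the two posteriors for the correlation of
# within-subject predictors 1 and 2
print(cor_draws[:, 0, 1].mean(), empirical[:, 0, 1].mean())
```
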
Yes, that’s fine. You’re estimating posterior correlations between parameters using MCMC: the estimate is simply the sample correlation computed over draws from the posterior.

Posterior correlations among parameters don’t need to match their priors. For example, I can have a regression coefficient `beta1` for income and another `beta2` for age, and they’ll be correlated in the posterior even if I give them independent priors (for instance, when income and age are themselves correlated in the data).
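
As a quick illustration of “the sample correlation of the posterior sample” (simulated stand-in draws here, not from a fitted model):

```python
import numpy as np

# stand-in "posterior draws" for two parameters; in practice these would
# be extracted from a fitted model rather than simulated directly
rng = np.random.default_rng(1)
beta1_draws, beta2_draws = rng.multivariate_normal(
    [0.0, 0.0], [[1.0, -0.6], [-0.6, 1.0]], size=4000).T

# the posterior correlation estimate is just the sample correlation of draws
print(np.corrcoef(beta1_draws, beta2_draws)[0, 1])  # roughly -0.6
```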

P.S. Please feel free to ping the list if questions don’t get answered after a week!

@Richard_Erickson (who I notice just found/liked this old thread) and anyone else who comes across this: note that I actually worked out for myself recently that the approach I described here is not appropriate and yields overconfident inferences when proper calibration checks are run. I’ll maybe include my code demonstrating this in my upcoming StanConnect talk.

Just to clarify: the procedure here should yield valid posterior distributions for the sample correlations (i.e., the correlations among the true effect values of the particular subjects in your sample). But these sample correlations are not the same thing as the population correlations from which those subjects’ effects arise. Does that sound right to you?
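
A quick simulated illustration of the distinction (stand-in numbers, nothing to do with your model):

```python
import numpy as np

# repeatedly draw samples of 20 subjects from a population with true
# correlation 0.5: the per-sample correlations scatter widely around
# the population value
rng = np.random.default_rng(2)
pop = [[1.0, 0.5], [0.5, 1.0]]
r = [np.corrcoef(rng.multivariate_normal([0.0, 0.0], pop, size=20).T)[0, 1]
     for _ in range(1000)]
print(np.mean(r), np.std(r))  # mean near 0.5, sd around 0.17
```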

Good clarification of terms. Yes, precisely.

Oh wait, no, I don’t think that’s correct either; at least, not as I intended to describe what I was doing in the OP. It would certainly be possible to derive a posterior predictive distribution on the sample correlation, but what I intended to convey in my OP was something different. Using the SUG 1.13 model as a terminological reference: it has a parameter `beta` encoding latent samples of a multivariate normal whose correlation is described by another parameter `Omega`, and `beta` is then used to express the likelihood structure (i.e., the multivariate normal is not the bottom, data-touching level of the hierarchy). My proposal was to ignore the direct inference supplied by `Omega` and instead look at, in each sample from the posterior, the empirical correlation matrix computed from `beta` alone. Anyway, it ends up being a bad idea.
