Hierarchical Correlation?

I’ve been thinking something through, but perhaps it is nonsense, so I’d appreciate some thoughts if possible.

I have some series of data that I wish to find the correlation matrix for, but I would also like to apply hierarchical modelling to it.

The data can be assumed to follow a multivariate normal model, i.e.

X \sim N(\mu, \Sigma)

where X and \mu are vectors and \Sigma is a covariance matrix.

Now, in Stan I can quite easily add priors for the elements of \Sigma and have those priors feed down a level to create a hierarchical model. Does this even make sense in terms of correlation? I’ve convinced myself of both yes and no.

Also, I only have one sample of size n for each series, i.e. X_1 = \{X_1^0, X_1^1, \dots, X_1^n\}. To produce multiple series to fit the model on, I could resample the X to produce, say, M series and then fit my model to those M series. Does that make sense? I’m uncertain about mixing what appears to be a bootstrap with fitting a Bayesian model.
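
To be concrete, the resampling I have in mind is something like this (just a sketch; X here is a stand-in for the single observed n x d series):

set.seed(1)
n <- 100; d <- 3; M <- 20
X <- matrix(rnorm(n * d), n, d)   # stand-in for the single observed series
# each bootstrap replicate resamples the n rows (observations) with replacement
boot_list <- lapply(1:M, function(m) X[sample.int(n, replace = TRUE), ])
length(boot_list)                 # M resampled series to fit the model to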

Perhaps none of this makes sense? Thoughts gladly welcome.

It makes sense but it requires some thought to get a sensible prior.
You can see a solution I’ve used on page 10 here: https://www.researchgate.net/publication/324093594_Hierarchical_Bayesian_Continuous_Time_Dynamic_Modeling


Thanks - I’ll have a read of that.

That’s awesome, thanks!

I’m curious about the use of the inverse logit; did you consider Fisher’s z-to-r transform (which works out to be simply tanh() in R)? To my admittedly mathematically naive eye, it seems less ad hoc (designed pretty much for this purpose, I think, plus it yields -1 to +1 without needing any extra rescaling step).
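
For illustration, here’s how the two maps from an unconstrained value to (-1, 1) compare in R (a small sketch; plogis() is R’s inverse logit):

z <- seq(-4, 4, by = 0.01)
plot(z, tanh(z), type = "l", ylab = "correlation scale")  # Fisher z-to-r
lines(z, 2 * plogis(z) - 1, lty = 2)                      # inv_logit(z) * 2 - 1, as in the Stan code
legend("topleft", c("tanh(z)", "2 * inv_logit(z) - 1"), lty = 1:2, bty = "n")
# both map the real line to (-1, 1); tanh is steeper near zero for the same input scale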

It’s been a while, but I definitely did; I assume it was the first thing I tried, actually. I ‘think’ it was causing some slightly spookier behaviour when generating from the priors in higher dimensions, but my memory is a bit hazy, and it could be that the decision between the two was a bit ad hoc.


Would you happen to have an MWE of this? The ctsem Stan code is… intimidating.


Hah, why not. The main function you needed was right near the top, but I understand ;)

library(rstan)

scor <-'
functions{

  matrix constraincorsqrt(matrix mat){ //converts from unconstrained lower tri matrix to cor sqrt
    matrix[rows(mat),cols(mat)] o;
    
    for(i in 1:rows(o)){ //constrain and set upper tri to lower
      for(j in min(i+1,rows(mat)):rows(mat)){
        o[j,i] =  inv_logit(mat[j,i])*2-1;  // can change cor prior here
        o[i,j] = o[j,i];
      }
      o[i,i]=1; 
      o[i,] = o[i,] / sqrt(sum(square(o[i,]))+1e-10);
    }
    return o;
  }

}

data{
  int d;
}
parameters{
  vector[(d*d-d)/2] rawcor;
}
transformed parameters{
  matrix[d,d] mcor;
  
  {
  int counter=0;
    for(j in 1:d){
      for(i in 1:d){
        if(i > j){
          counter += 1;
          mcor[i,j]=rawcor[counter];
        }
      }
    }
  }
  
  mcor = tcrossprod(constraincorsqrt(mcor));
}
model{
  rawcor ~ normal(0,1);
}
'

sm <- stan_model(model_code = scor)

sf<-sampling(sm,data=list(d=3))

s=summary(sf)$summary
round(s,2)
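
As a quick sanity check of the construction, here is a plain-R sketch mirroring constraincorsqrt() (not part of the model itself): every row of the returned matrix is a unit vector, so tcrossprod() of it has a unit diagonal and is a Gram matrix, hence a valid (positive semi-definite) correlation matrix. plogis() is R’s inverse logit.

constraincorsqrt_r <- function(raw) {
  s <- 2 * plogis(raw) - 1                 # inv_logit(x) * 2 - 1, as in the Stan function
  s[upper.tri(s)] <- t(s)[upper.tri(s)]    # mirror the lower triangle to the upper
  diag(s) <- 1
  s / sqrt(rowSums(s^2) + 1e-10)           # normalise each row to (near) unit length
}

set.seed(1)
raw <- matrix(rnorm(16), 4, 4)
mcor <- tcrossprod(constraincorsqrt_r(raw))
round(mcor, 3)            # symmetric, unit diagonal, off-diagonals in (-1, 1)
min(eigen(mcor)$values)   # non-negative, i.e. positive semi-definite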

Thanks! I cleaned it up a little by combining the transformed parameters loop with the function loop. It’s barely faster, but now you can just use it with the raw correlation parameter without writing that loop every time.

functions{
    matrix constraincorsqrt(vector rawcor, int r){ //converts the unconstrained vector to cor sqrt
    int counter = 0;
    matrix[r, r] o;
    
    o[1, 1] = 1;
    for(i in 2:r){ //constrain and set upper tri to lower
      for(j in 1:i - 1){
        counter += 1;
        o[i, j] = inv_logit(rawcor[counter]) * 2 - 1;  // can change cor prior here
        o[j, i] = o[i, j];
      }
      o[i, i] = 1;
    }
    
    for (i in 1:r) o[i, ] /= sqrt(sum(square(o[i, ])) + 1e-10);
    
    return o;
  }
}
data{
  int d;
}
parameters{
  vector[(d * d - d) / 2] rawcor;
}
transformed parameters{
  matrix[d, d] mcor = tcrossprod(constraincorsqrt(rawcor, d));
}
model{
  rawcor ~ normal(0, 1);
}

Slightly better: taking advantage of the built-in matrix operations to remove the second loop. Less readable, but faster.

functions{
    matrix constraincorsqrt(vector rawcor, int r){ //converts the unconstrained vector to cor sqrt
    int counter = 0;
    vector[r] ones = rep_vector(1, r);
    matrix[r, r] o;

    o[1, 1] = 1;
    for(i in 2:r){ //constrain and set upper tri to lower
      for(j in 1:i - 1){
        counter += 1;
        o[i, j] = inv_logit(rawcor[counter]) * 2 - 1;  // can change cor prior here
        o[j, i] = o[i, j];
      }
      o[i, i] = 1;
    }
    return diag_pre_multiply( inv( sqrt( ( columns_dot_self(o) + 1e-10 ) ) ), o);
  }
}
data{
  int d;
}
parameters{
  vector[(d * d - d) / 2] rawcor;
}
transformed parameters{
  matrix[d, d] mcor = tcrossprod(constraincorsqrt(rawcor, d));
}
model{
  rawcor ~ normal(0,1);
}

Nice. Yeah, for various reasons I had it set up for matrix input, but in the MWE at least there’s no need.

I couldn’t quickly see where, but your code was broken. Edited out the confusion: it just needed a tcrossprod rather than a crossprod.


If anyone figures out a nice way to get the Cholesky factor directly, instead of this dense matrix square root thing, I’d be very interested. Stan still complains about non-positive-definite matrices far more often than I’d like (when Cholesky decomposing), even when I construct the matrix using this form, which I think ‘should’ ensure positive definiteness.


It works fine for me. Did you copy all of it? In each case I changed the loop and indexing, so make sure the loop matches what I posted; I’m only traversing the lower triangle.

Edit: So the crossprod in the ‘nomat’ solution needed to be tcrossprod. Updated output and attachments.

R output for the nonmatrix code (‘spinkney_scor_nomat.stan’)

> sp <- stan_model(file = "spinkney_scor_nomat.stan")
DIAGNOSTIC(S) FROM PARSER:
Info: integer division implicitly rounds to integer. Found int division: d * d - d / 2
 Positive values rounded down, negative values rounded up or down in platform-dependent way.
Info: integer division implicitly rounds to integer. Found int division: d * d - d / 2
 Positive values rounded down, negative values rounded up or down in platform-dependent way.

Info: Found int division at 'string', line 22, column 9 to column 20:
  (d * d - d) / 2
Values will be rounded towards zero. If rounding is not desired you can write
the division as
  (d * d - d) / 2.0
hash mismatch so recompiling; make sure Stan code ends with a blank line
> sp <- stan_model(file = "spinkney_scor_nomat.stan")

SAMPLING FOR MODEL 'spinkney_scor_nomat' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 2.5e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 1200 [  0%]  (Warmup)
Chain 1: Iteration:  400 / 1200 [ 33%]  (Warmup)
Chain 1: Iteration:  401 / 1200 [ 33%]  (Sampling)
Chain 1: Iteration:  800 / 1200 [ 66%]  (Sampling)
Chain 1: Iteration: 1200 / 1200 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.013312 seconds (Warm-up)
Chain 1:                0.037642 seconds (Sampling)
Chain 1:                0.050954 seconds (Total)
Chain 1: 
> round(summary(sp_f, pars = "mcor")$summary, 2)
           mean se_mean   sd  2.5%   25%   50%  75% 97.5%  n_eff Rhat
mcor[1,1]  1.00    0.00 0.00  1.00  1.00  1.00 1.00  1.00 412.32    1
mcor[1,2]  0.00    0.02 0.59 -0.93 -0.53  0.01 0.56  0.91 723.59    1
mcor[1,3]  0.01    0.02 0.57 -0.92 -0.48  0.01 0.51  0.93 762.93    1
mcor[2,1]  0.00    0.02 0.59 -0.93 -0.53  0.01 0.56  0.91 723.59    1
mcor[2,2]  1.00    0.00 0.00  1.00  1.00  1.00 1.00  1.00 453.90    1
mcor[2,3] -0.01    0.02 0.54 -0.90 -0.46 -0.01 0.43  0.92 710.43    1
mcor[3,1]  0.01    0.02 0.57 -0.92 -0.48  0.01 0.51  0.93 762.93    1
mcor[3,2] -0.01    0.02 0.54 -0.90 -0.46 -0.01 0.43  0.92 710.43    1
mcor[3,3]  1.00    0.00 0.00  1.00  1.00  1.00 1.00  1.00 439.26    1

R output for the matrix code (spinkney_scor.stan)

> sp <- stan_model(file = "spinkney_scor.stan")
DIAGNOSTIC(S) FROM PARSER:
Info: integer division implicitly rounds to integer. Found int division: d * d - d / 2
 Positive values rounded down, negative values rounded up or down in platform-dependent way.
Info: integer division implicitly rounds to integer. Found int division: d * d - d / 2
 Positive values rounded down, negative values rounded up or down in platform-dependent way.

Info: Found int division at 'string', line 40, column 9 to column 20:
  (d * d - d) / 2
Values will be rounded towards zero. If rounding is not desired you can write
the division as
  (d * d - d) / 2.0
recompiling to avoid crashing R session
> sp_f <- sampling(sp, data = list(d = 3),
+                  seed = 1234,
+                  chains = 1,
+                  iter = 1200,
+                  warmup = 400,
+                  refresh = 400)

SAMPLING FOR MODEL 'spinkney_scor' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 2.3e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.23 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 1200 [  0%]  (Warmup)
Chain 1: Iteration:  400 / 1200 [ 33%]  (Warmup)
Chain 1: Iteration:  401 / 1200 [ 33%]  (Sampling)
Chain 1: Iteration:  800 / 1200 [ 66%]  (Sampling)
Chain 1: Iteration: 1200 / 1200 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.008973 seconds (Warm-up)
Chain 1:                0.024096 seconds (Sampling)
Chain 1:                0.033069 seconds (Total)
Chain 1: 
> round(summary(sp_f, pars = "mcor")$summary, 2)
           mean se_mean   sd  2.5%   25%   50%  75% 97.5%  n_eff Rhat
mcor[1,1]  1.00    0.00 0.00  1.00  1.00  1.00 1.00  1.00 412.32    1
mcor[1,2]  0.00    0.02 0.59 -0.93 -0.53  0.01 0.56  0.91 723.59    1
mcor[1,3]  0.01    0.02 0.57 -0.92 -0.48  0.01 0.51  0.93 762.93    1
mcor[2,1]  0.00    0.02 0.59 -0.93 -0.53  0.01 0.56  0.91 723.59    1
mcor[2,2]  1.00    0.00 0.00  1.00  1.00  1.00 1.00  1.00 453.90    1
mcor[2,3] -0.01    0.02 0.54 -0.90 -0.46 -0.01 0.43  0.92 710.43    1
mcor[3,1]  0.01    0.02 0.57 -0.92 -0.48  0.01 0.51  0.93 762.93    1
mcor[3,2] -0.01    0.02 0.54 -0.90 -0.46 -0.01 0.43  0.92 710.43    1
mcor[3,3]  1.00    0.00 0.00  1.00  1.00  1.00 1.00  1.00 439.26    1 

spinkney_scor.stan (1.3 KB)
spinkney_scor_nomat.stan (700 Bytes)


Cool, yes. My vague memory of it needing to be iterative (I guess I maybe fixed / improved that in the algorithm at some point…) combined with the crossprod mix-up to create confusion :)


Curious, are you referring to a case like this?

As in, you want a hierarchical correlation matrix, where there are fixed and random components to the group-specific correlation matrices?

For anyone finding their way here in the future: the approach @spinkney and I discussed above gives consistent priors across the indices of the correlation matrix and works well for generating data, and I had some success with it hierarchically, but in some circumstances (I think when the correlation signal is relatively strong) it seems to cause difficulty with sampling and optimization, with lots of multimodality.

I’ve updated the code; it performs much better and copes with higher correlations with less bias (much better than the Stan LKJ approach too, it seems), but the downside is that the priors on the bivariate correlations are not exactly the same across indices of the matrix. It should be possible to correct this behaviour, but for the time being I’ll post the update with the ‘rough correction’ (divide by column index) that I guess-hacked. The following code compares my version to the LKJ approach; it doesn’t show the hierarchical form, but that is hopefully clear enough: just recompute the correlation matrix for whatever vector of unconstrained correlation parameters is relevant. I guess @mike-lawrence will be pretty interested here.

Also, the inability to edit old posts to point them to updated info is a bit annoying…

require(rstan)

smt <- '
functions{
   matrix constraincorsqrt(vector rawcor, int d){ //converts the unconstrained vector to the correlation matrix square root
    matrix[d,d] o;
    int counter = 0;
   
    for(i in 1:d){ //constrain and set upper tri to lower
      for(j in 1:d){
        if(j > i){
          counter += 1;
          o[j,i] = rawcor[counter] / i;
          o[i,j] = 0;//o[j,i];
        }
      }
      o[i,i] = 1; // smaller gives more prior weight to larger correlations, larger gives less
      o[i,] = o[i,] / sqrt(sum(square(o[i,])+1e-8));
    }
    return (o);
  }
}
data{
  int d;
  int n;
  vector[d] dat[n];
  real corpriorscale;
}
parameters{
  vector[(d * d - d) / 2] rawcor;
  vector[d] logsdvec;
  vector[d] mu;
}
transformed parameters{
  matrix[d,d] mcor;
  matrix[d, d] mcov;
  matrix[d, d] mcovchol;

  mcor = tcrossprod(constraincorsqrt(rawcor,d));
  mcov = quad_form_diag(mcor, exp(logsdvec) +1e-8);
  mcovchol = cholesky_decompose(mcov); 
}
model{
  rawcor ~ normal(0,corpriorscale);
  logsdvec ~ normal(0,10);
  mu ~ normal(0,10);
  dat ~ multi_normal_cholesky(mu,mcovchol);
}
'

sm <- stan_model(model_code = smt)


smtlkj <- '
data{
  int d;
  int n;
  vector[d] dat[n];
  real corpriorscale;
}
parameters{
  cholesky_factor_corr[d] cholcor;
  vector[d] logsdvec;
  vector[d] mu;
}
transformed parameters{
  matrix[d,d] mcor = tcrossprod(cholcor);
  matrix[d, d] mcov = quad_form_diag(mcor,exp(logsdvec));
  matrix[d, d] mcovchol = cholesky_decompose(mcov); 
}
model{
  cholcor ~ lkj_corr_cholesky(corpriorscale);
  logsdvec ~ normal(0,10);
  mu ~ normal(0,10);
  dat ~ multi_normal_cholesky(mu,mcovchol);
}
'

smlkj <- stan_model(model_code = smtlkj)



d=5
n=30
mcov <- diag(1,d)

#some high some low
mcov[lower.tri(mcov)][seq(1,(d^2-d)/2,d)] <- .8
# mcov[,1] <- mcov[,1]+3

#random
# mcov[lower.tri(mcov)]<- rnorm((d^2-d)/2)#.8

mcov[upper.tri(mcov)] <- t(mcov)[upper.tri(mcov)]
mcov <- cov2cor(mcov %*% t(mcov))
mcovchol <- t(chol(mcov))
mcor=cov2cor(mcov)
dat <- matrix(rnorm(d*n,0,1),n,d)
dat <- t(mcovchol %*% t(dat))
dat <- scale(dat)
round(cor(dat),3)
round(mcor,3)


sdat <- list(n=n,d=d, dat=dat,corpriorscale=10)
# sf <- ctsem:::stanWplot(sm,sdat,chains=4,cores=4,iter=2000)
ta<-Sys.time()
sf <- sampling(sm,sdat,chains=4,cores=4,iter=2000)
ta <- Sys.time()-ta
print(ta)
tb<-Sys.time()
# sflkj <- ctsem:::stanWplot(smlkj,sdat,chains=4,cores=4,iter=2000)
sflkj <- sampling(smlkj,sdat,chains=4,cores=4,iter=2000)
tb <- Sys.time()-tb
print(tb)

s=round(summary(sf)$summary,3)
s[grep('mcor',rownames(s)),]
slkj=round(summary(sflkj)$summary,3)
slkj[grep('mcor',rownames(slkj)),]
plot(s[grep('mcor',rownames(s)),1],pch=16,col=2,ylab='Corr.')
points(c(cor(dat)),pch=1,col='black')
# points(c(mcor),pch=16,col='blue')
points(slkj[grep('mcor',rownames(slkj)),1],pch=16,col='green')
points(s[grep('mcor',rownames(s)),4],col=2,pch=3)
points(s[grep('mcor',rownames(s)),8],col=2,pch=3)
points(slkj[grep('mcor',rownames(slkj)),4],pch=3,col='green')
points(slkj[grep('mcor',rownames(slkj)),8],pch=3,col='green')
legend('topright',c('True','CD','LKJ'),text.col=c(1,2,3),bty='n')

And in case anyone wants to do me a favour and figure out a proper solution, here is a simple script for testing:

ndim=5
mat=matrix(1,ndim,ndim)

corfunc=function(mat, invert){ 
  o=mat;
  d=nrow(o)
  for(i in 1:nrow(o)){ 
    for(j in 1:nrow(o)){
      if(j > i){
        o[j,i] = o[j,i] / (i)  #divide by i was just a rough guess that semi works
        o[i,j] = 0;
      }
    }
    o[i,i]=1 #adjust this value to modify propensity for high correlation strengths
    o[i,] = o[i,] / sqrt(sum((o[i,]^2)))
  }
  return(o)
}

m=tcrossprod(corfunc(mat,F))
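# note: 'mold' holds the matrix from the previous run (it's assigned at the bottom
# of the script), so these comparison lines will error the very first time through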
mold
m
message('old mean = ',mean(mold[lower.tri(mold)]))
message('new mean = ',mean(m[lower.tri(m)]))
message('old diff = ',
sqrt(mean(( mold[lower.tri(mold)]-mean(mold[lower.tri(mold)]))^2)))
message('new diff = ',
sqrt(mean(( m[lower.tri(m)]-mean(m[lower.tri(m)]))^2)))
mold=m

The simple script doesn’t work yet; mold needs to be defined first.

That gets defined the first time you run it; it’s just there to compare old vs. new attempts while fiddling around :)


If I do the spherical parameterization, I can get out the right “denominator” and the Cholesky factor directly, without needing cholesky_decompose().

Can you see if this does what you want?

functions{
  matrix angle_vec2mat (vector angle, int K) {
    int N = num_elements(angle);
    matrix[K, K] mat = add_diag(identity_matrix(K), rep_vector(-1, K));
    int count = K;
    int pos = 1;
    for (k in 1:K - 1){
      count -= 1;
      mat[k + 1:K, k] = segment(angle, pos, count);
      pos += count;
    }
    return mat;
  }
  
matrix angle2cor (vector angle, int N){
  matrix[N, N] angle_mat = angle_vec2mat(angle, N);
  matrix[N, N] inv_mat = identity_matrix(N);
  inv_mat[, 1] = cos(angle_mat[, 1]);
  
  for (i in 2:N) {
    inv_mat[i, i] = prod(sin(angle_mat[i, 1:(i - 1)]));
    if (i > 2) {
      for (j in 2:i - 1 ) {
        inv_mat[i, j] = cos(angle_mat[i, j]) * prod(sin(angle_mat[i, 1:(j - 1)]));
      }
    }
  }
  return inv_mat;
  }
}
data{
  int d;
  int n;
  vector[d] dat[n];
  real corpriorscale;
}
parameters{
  vector[(d * d - d) / 2] rawcor;
  vector[d] logsdvec;
  vector[d] mu;
}
transformed parameters{
  cholesky_factor_corr[d] mcor = angle2cor(rawcor,d);
}
model{
  rawcor ~ normal(0,corpriorscale);
  logsdvec ~ normal(0,10);
  mu ~ normal(0,10);
  dat ~ multi_normal_cholesky(mu, diag_pre_multiply(exp(logsdvec), mcor));
}
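
A quick plain-R mirror of angle2cor() (an illustrative sketch, not part of the model above) shows why the spherical construction gives the Cholesky factor directly: each row of the returned lower-triangular matrix has unit norm (the cos^2 and sin^2 terms telescope to 1), so tcrossprod() already has a unit diagonal and no cholesky_decompose() is needed.

angle2chol_r <- function(angle, K) {
  am <- matrix(0, K, K)
  am[lower.tri(am)] <- angle           # pack the angles into the strict lower triangle, column-wise
  L <- diag(K)
  L[, 1] <- cos(am[, 1])               # first column; cos(0) = 1 keeps L[1, 1] = 1
  for (i in 2:K) {
    L[i, i] <- prod(sin(am[i, 1:(i - 1)]))
    if (i > 2) for (j in 2:(i - 1))
      L[i, j] <- cos(am[i, j]) * prod(sin(am[i, 1:(j - 1)]))
  }
  L
}

set.seed(1)
d <- 4
L <- angle2chol_r(rnorm((d * d - d) / 2), d)
round(L, 3)               # lower triangular, rows of unit norm
round(tcrossprod(L), 3)   # unit diagonal: a correlation matrix without any decomposition
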

Here’s @Charles_Driver’s graph with this model:

What do you guys think of the idea whereby no multivariate normal is modelled, but the target is instead incremented according to the sampling distribution of each bivariate correlation separately? That way, we can model each bivariate correlation hierarchically in a relatively straightforward manner. Here’s a rough-but-compiles implementation of what I mean (the one thing I didn’t have time to work out is how to add my toggle trick for centered/non-centered parameterization, so it currently uses the centered parameterization for all hierarchical quantities):

data{

	// num_subj: number of subj
	int<lower=1> num_subj ;

	// num_var: number of variables modelled as intercorrelated
	int<lower=2> num_var ;

	// num_obs: number of observations per variable and subject
	int<lower=1> num_obs ;

	// subj_obs: array of matrices, one per subject
	matrix[num_obs,num_var] subj_obs[num_subj] ;

}
transformed data{

	// num_cors: number of correlations implied by the number of variables
	int num_cors = ((num_var-1)*num_var)/2;

	// corz_se: standard error for correlations on Fisher's Z scale implied by the number of observations
	real corz_se = 1/sqrt(num_obs-3) ;

	// sqrt_num_obs: pre-computing this for use in likelihood
	real sqrt_num_obs = sqrt(num_obs) ;
	// sd_gamma_shape: pre-computing this for use in likelihood
	real sd_gamma_shape = (num_obs - 1) / 2.0 ;

	// compute the empirical means, SDs, variances, and correlations on Fisher's Z scale
	vector[num_var] obs_means[num_subj] ;
	vector[num_var] obs_sds[num_subj] ;
	vector[num_var] obs_variances[num_subj] ;
	vector[num_cors] obs_z[num_subj] ;
	for(i_subj in 1:num_subj){
		matrix[num_obs,num_var] this_subj_obs = subj_obs[i_subj] ; //temp variable
		for(i_var in 1:num_var){
			obs_means[i_subj][i_var] = mean(this_subj_obs[,i_var]);
			obs_sds[i_subj][i_var] = sd(this_subj_obs[,i_var]);
			obs_variances[i_subj][i_var] = sd(this_subj_obs[,i_var])^2;
		}
		int k = 0 ; //temp variable
		vector[num_cors] obs_r ; //temp variable
		// looping over lower-tri to compute correlation between each column in this_subj_obs
		for(this_var in 1:(num_var-1)){
			for(this_other_var in (this_var+1):num_var){
				k += 1 ;
				obs_r[k] = (
					sum(
						( this_subj_obs[,this_var] - obs_means[i_subj][this_var] )
						.* ( this_subj_obs[,this_other_var] - obs_means[i_subj][this_other_var] )
					)
				) / (
					(num_obs-1) * obs_sds[i_subj][this_var] * obs_sds[i_subj][this_other_var]
				) ;
			}
		}
		obs_z[i_subj] = atanh(obs_r) ;
	}
}
parameters{

	// mean_mean: mean (across subjects) mean
	vector[num_cors] mean_mean ;
	// mean_sd: sd (across subjects) mean
	vector<lower=0>[num_cors] mean_sd ;
	// mean_subj: by-subject mean
	vector[num_cors] mean_subj[num_subj] ;

	// logsd_mean: mean (across subjects) log-sd
	vector[num_cors] logsd_mean ;
	// logsd_sd: sd (across subjects) log-sd
	vector<lower=0>[num_cors] logsd_sd ;
	// sd_subj: by-subject sd
	vector<lower=0>[num_cors] sd_subj[num_subj] ;

	// corz_mean: mean (across subjects) correlations on Fisher's Z scale
	vector[num_cors] corz_mean ;
	// corz_sd: sd (across subjects) correlations on Fisher's Z scale
	vector<lower=0>[num_cors] corz_sd ;
	// corz_subj: by-subject correlations on Fisher's Z scale
	vector[num_cors] corz_subj[num_subj] ;

}
model{

	////
	// Priors
	////

	// Peaked-at-zero normal(0,1) prior on mean of means
	mean_mean ~ std_normal() ;
	// Peaked-at-zero normal(0,1) prior on sd of means
	mean_sd ~ std_normal() ;

	// Peaked-at-zero normal(0,1) prior on mean of logsds
	logsd_mean ~ std_normal() ;
	// Peaked-at-zero normal(0,1) prior on sd of logsds
	logsd_sd ~ std_normal() ;

	// normal(0,.74) priors on the mean correlations
	// yields flat prior on abs(cor)<.7, and diminishing to zero by abs(cor)==1
	// view in R via: hist(tanh(abs(rnorm(1e7)*.74)),br=1e4,xlim=c(0,1))
	corz_mean ~ normal(0,.74) ;
	// Peaked-at-zero normal(0,1) prior on variability of correlations
	// (would want to do PPC for this!)
	corz_sd ~ std_normal() ;

	// hierarchical structure:
	for(i_subj in 1:num_subj){
		mean_subj[i_subj] ~ normal(mean_mean,mean_sd) ;
		sd_subj[i_subj] ~ lognormal(logsd_mean,logsd_sd) ;
		corz_subj[i_subj] ~ normal(corz_mean,corz_sd) ;
	}

	////
	// Likelihood
	////

	for(i_subj in 1:num_subj){
		obs_means[i_subj] ~ normal( mean_subj[i_subj] , sd_subj[i_subj]/sqrt_num_obs ) ;
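		// the sample variance of num_obs normal draws is sigma^2/(num_obs-1) times a chi-square(num_obs-1) variate,
		// i.e. gamma(shape = (num_obs-1)/2, rate = (num_obs-1)/(2*sigma^2)) = gamma(sd_gamma_shape, sd_gamma_shape/sigma^2)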
		obs_variances[i_subj] ~ gamma( sd_gamma_shape , sd_gamma_shape ./ pow(sd_subj[i_subj],2) ) ;
		obs_z[i_subj] ~ normal(corz_subj[i_subj],corz_se) ;
	}

}
generated quantities{

	// cor_mean: the upper-tri of the group-level-mean correlation matrix, flattened to a vector for efficient storage
	vector[num_cors] cor_mean = tanh(corz_mean) ;

}

Now, that’s encoding a model with independent hierarchies for everything, but it’d also be possible to combine things with multivariate normal hierarchical structures, from just within quantity type (e.g. currently the subject-level mean on each variable is modelled as independent of their means on the other variables, but the set of subject-level means could be treated as MVN) to doing everything as one big MVN (capturing things like: subjects with a high mean on var1 also tend to have a high correlation between var2 and var3).