 # Working out an SEM-style alternative to the multivariate normal

Having encountered some restrictions of the multivariate normal that my data do not support, I’ve been looking at alternatives. While the previous link describes an efficient “pairwise” approach, it’s rather non-generative in structure, so I’ve more recently been playing with an SEM approach (with the vague feeling that it might work out to being formally equivalent to my pairwise approach, with the pairwise formulation reflecting a “sufficient statistics”-shortcut) but am encountering some conceptual blocks. So I’m hoping to get folks’ feedback on what I have so far.

First off, I’m definitely having trouble working out how to do things in a non-hierarchical context, so the below will all talk of the hierarchical case. Feel free to provide input on how to transform things to non-hierarchical if you have any insights there.

To establish our terminology, imagine multiple kinds of measurement that occur together in multiple sets such that there may be correlations among the measurement kinds across sets. Furthermore, with a combination of set and kind we can have multiple replicates. A typical (non-centered) hierarchical multivariate-normal model would then be:

``````#include helper_functions.stan //defines: flatten_lower_tri
data{
int<lower=2> ns ; //number of sets
int<lower=2> nk ; //number of kinds
int<lower=2> nr ; //number of replicates
matrix[nr,nk] y[ns] ; // observations
}
parameters{
real<lower=0> noise ; //measurement noise
row_vector[nk] m ; //across-set mean per kind
vector<lower=0>[nk] s ; //across-set sd per kind
cholesky_factor_corr[nk] r_ ; //correlations (on cholesky-factor scale) among kinds
matrix[nk,ns] z_ ; //helper variable for non-centered parameterization of z (see transformed parameters)
}
transformed parameters{
// z: latent value for each set-by-kind combination, non-centered-parameterized
matrix[ns,nk] z = rep_matrix(m,ns) + transpose(diag_pre_multiply(s,r_)*z_) ;
}
model{
//priors
noise ~ weibull(2,1) ; //prior on measurement noise
m ~ std_normal() ; //prior on means
s ~ weibull(2,1) ; //prior on sds
r_ ~ lkj_corr_cholesky(1) ; //prior on correlations
//mid-level structure (see also transformed parameters)
to_vector(z_) ~ std_normal() ; //implies z~mvn(m,s,r)
//observation-level structure (there are more efficient ways to compute this, but expressed as below for clarity)
for(i_ns in 1:ns){
for(i_nk in 1:nk){
y[i_ns][,i_nk] ~ normal(z[i_ns][i_nk],noise) ;
}
}
}
generated quantities{
// extract the implied correlations
vector[(nk*(nk-1))%/%2] r = flatten_lower_tri(multiply_lower_tri_self_transpose(r_));
}
``````

Now that model handles arbitrary numbers of kinds, but to build an SEM equivalent, let’s first consider the case of `nk=2`, in which case a neat parameterization is:

``````data{
int<lower=2> ns ; //number of sets
int<lower=2> nr ; //number of replicates
matrix[nr,2] y[ns] ; //observations
}
parameters{
real<lower=0> noise ; //measurement noise
row_vector m ; //across-set mean per kind
vector<lower=0> s ; //across-set sd per kind
real<lower=-1,upper=1> r ; // correlation
vector[ns] c ; // common latent
matrix[ns,2] u ; // unique latents
}
transformed parameters{
matrix[ns,2] z ;
real rsq = pow(r,2) ;
z[,1] = m + s * ( c*rsq + u[,1]*(1-rsq) ) ; // c always weights positively on the first outcome
if(r>0){
//positive correlation case
z[,2] = m + s * ( c*rsq + u[,2]*(1-rsq) ) ;
}else{
//negative correlation case
z[,2] = m + s * ( -c*rsq + u[,2]*(1-rsq) ) ;
}
}
model{
//priors
noise ~ weibull(2,1) ; //prior on measurement noise here
m ~ std_normal() ; //prior on means here
s ~ weibull(2,1) ; //prior on SDs here
r ~ uniform(-1,1) ; //prior on correlation here
//mid-level structure
to_vector(u) ~ std_normal() ; // u must have std_normal
to_vector(c) ~ std_normal() ; // c must have std_normal
//observation-level structure
for(i_n in 1:ns){
y[i_n][,1] ~ normal(z[i_n,1],noise) ;
y[i_n][,2] ~ normal(z[i_n,2],noise) ;
}
}
``````

In that formulation, we have an explicit parameter for the correlation between the two kinds of measurement, and encode the correlation structure by modelling the latent vectors in `z` as simplex-weighted combinations of a latent variable common to both measurements and a latent variable unique to each kind.

I’m a bit stuck however on generalizing the approach to `nk>2` cases. I have something that seems to “work” in the sense that it yields no obvious sampling issues and has outputs that seem to correspond to true values used to generate data, but I had to abandon the direct parameterization of correlations and I’m still not quite sure if there are alternative identification approaches that would make more sense. Here it is:

``````data{
int<lower=2> ns ; //number of sets
int<lower=2> nk ; //number of kinds
int<lower=2> nr ; //number of replicates
matrix[nr,nk] y[ns] ; // observations
}
transformed data{
int npk = (nk * (nk - 1)) %/% 2; //num pairs of kinds
}
parameters{
real<lower=0> noise ; //measurement noise
vector[nk] m ; //outcome means
vector<lower=0>[nk] s ; //outcome sds
vector[npk] wc ; //common weights
matrix[ns,npk] c ; //common latents
matrix[ns,nk] u ; //unique latents
}
transformed parameters{
//construct weight-matrix
matrix[nk,nk] w = diag_matrix(rep_vector(1,nk)) ; //implies uniques have weight of 1
//fill tris
{int i_npk = 0 ;
for(w_col in 1:(nk-1)){
for(w_row in (w_col+1):nk){
i_npk += 1 ;
w[w_row,w_col] = fabs(wc[i_npk]) ; //lower-tri always positive
w[w_col,w_row] = wc[i_npk] ; //upper-tri retains sign
}
}}

//construct value-array_of_matrices
matrix[nk,nk] v[ns] ;
for(i_ns in 1:ns){
v[i_ns] = diag_matrix(transpose(u[i_ns])) ;
//fill tris
int i_npk = 0 ;
for(v_col in 1:(nk-1)){
for(v_row in (v_col+1):nk){
i_npk += 1 ;
v[i_ns][v_row,v_col] = c[i_ns][i_npk] ;
v[i_ns][v_col,v_row] = c[i_ns][i_npk] ;
}
}
}

//construct z
vector[nk] z[ns] ;
for(i_ns in 1:ns){
matrix[nk,nk] wv = w .* v[i_ns] ;
for(i_nk in 1:nk){
z[i_ns][i_nk] = m[i_nk] + s[i_nk]*sum(wv[,i_nk]) ;
}
}
}
model{
//priors
noise ~ weibull(2,1) ; //prior on measurement noise
m ~ std_normal() ; //prior on means
s ~ weibull(2,1) ; //prior on sds
to_vector(wc) ~ std_normal() ; //prior on common weights
//mid-level structure (see also transformed parameters)
to_vector(u) ~ std_normal() ; //u must have std_normal
to_vector(c) ~ std_normal() ; //c must have std_normal
//observation-level structure
for(i_ns in 1:ns){
for(i_nk in 1:nk){
y[i_ns][,i_nk] ~ normal(z[i_ns][i_nk],noise) ;
}
}
}
generated quantities{
// extract the implied correlations
vector[npk] r = pow(wc,2)./(1+pow(wc,2)) ;
for(i_npk in 1:npk){
if(wc[i_npk]<0){
r[i_npk] = -r[i_npk] ;
}
}
}
``````

Note that the GQ section attempts to derive the implied correlations, and while it works for `nk=2`, it doesn’t give the right values for `nk>2`.

Note also that the `s` from this latter doesn’t actually encode the posterior on the SDs as intended.

But as I say, the above seems to work, I just need to do some extra operations to get the correct transforms to `r` & `s`. That said, I’m hoping someone might have a suggestion as to how to re-parameterize for inference on those quantities more directly. I tried a version where the columns of `wv` are transformed to sum to 1, but that had divergence issues. I also tried to have the weights associated with the unique latents free (in the above they are fixed at `1.0`), but that again seemed to have sampling trouble.

Any input would be appreciated!

Oh, and here’s R code to generate and sample:

``````ns = 1e2
nk = 3
nr = 1e1
set.seed(1)
z = MASS::mvrnorm(
n = ns
, mu = rep(0,nk)
, Sigma = matrix(
c(  1,-.9,.4,
-.9,  1, 0,
.4,  0, 1)
,3,3
)
, empirical = T
)

round(cor(z),floor(-log10(10*.Machine\$double.eps)))

y = list()
for(i_ns in 1:ns){
y[[i_ns]] = matrix(NA,nr,nk)
for(i_nk in 1:nk){
y[[i_ns]][,i_nk] = rnorm(r,z[i_ns,i_nk],1)
}
}
mod = cmdstanr::cmdstan_model(
'stan/sem.stan'
, include = 'stan'
)
fit = mod\$sample(
data = lst(ns=ns,nk=nk,nr=nr,y)
, chains = 4
, parallel_chains = 4
, refresh = 200
# , iter_warmup=1e3
)
fit\$summary(c('noise','m','s','r'))
``````

And the contents of `helper_functions.stan`:

``````functions{
// flatten_lower_tri: function that returns the lower-tri of a matrix, flattened to a vector
vector flatten_lower_tri(matrix mat) {
int n_cols = cols(mat);
int n_uniq = (n_cols * (n_cols - 1)) %/% 2;
vector[n_uniq] out ;
int i = 1;
for(c in 1:(n_cols-1)){
for(r in (c+1):n_cols){
out[i] = mat[r,c];
i += 1;
}
}
return(out) ;
}
}
``````
3 Likes

This is cool! I am working up to building a SEM-like model too. I am a bit behind the curve coding things up. So hopefully as the summer progress I might have some ideas here. I work in ecology so we have have similar issues with repeat measures, multiple measures, sets, etc.

1 Like

Writing out a vine copula would be interesting to test. For this simulation you create a 3 variable latent marginal model with 3 bivariate gaussian copulas. I’m not sure of anyone writing vine copulas in Stan so there are no examples.

There’s a post about vine copulas at

and the bivariate normal copula is

`````` /**
* Normal copula log density vectorized
*
* @copyright Sean Pinkney, 2021
*
* Meyer, Christian. "The Bivariate Normal Copula." \n
* arXiv preprint arXiv:0912.2816 (2009). Eqn 3.3. \n
* accessed Feb. 6, 2021.
*
* @param u Real number on (0,1], not checked but function will return NaN
* @param v Real number on (0,1], not checked but function will return NaN
* @param rho Real number (-1, 1)
* @return log density
*/
real normal_copula_vector_lpdf(vector u, vector v, real rho){
int N = num_elements(u);
real rho_sq = square(rho);

real a1 = 0.5 * rho;
real a2 = rho_sq - 1;
real a3 = 0.5 * log1m(rho_sq);
real x = -2 * u' * v + rho * (dot_self(u) + dot_self(v));

return a1 * x / a2 - N * a3;
}
``````
1 Like

Trying to get up to speed on vine copulas; so far they look like they’re synonymous with SEM?? At least, a saturated SEM?