Working out an SEM-style alternative to the multivariate normal

Having encountered some restrictions of the multivariate normal that my data do not support, I’ve been looking at alternatives. While the previous link describes an efficient “pairwise” approach, it’s rather non-generative in structure, so more recently I’ve been playing with an SEM approach (with the vague feeling that it might turn out to be formally equivalent to my pairwise approach, the pairwise formulation reflecting a “sufficient statistics” shortcut). I’m encountering some conceptual blocks, though, so I’m hoping to get folks’ feedback on what I have so far.

First off, I’m definitely having trouble working out how to do things in a non-hierarchical context, so everything below deals with the hierarchical case. Feel free to provide input on how to transform things to the non-hierarchical setting if you have any insights there.

To establish our terminology, imagine multiple kinds of measurement that occur together in multiple sets, such that there may be correlations among the measurement kinds across sets; each set-by-kind combination can additionally have multiple replicates. In equations:
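$$
z_j \sim \mathrm{MVN}\big(m,\ \mathrm{diag}(s)\,R\,\mathrm{diag}(s)\big)
\qquad
y_{j,r,k} \sim \mathrm{Normal}\big(z_{j,k},\ \sigma\big)
$$

where $j$ indexes sets, $k$ kinds, and $r$ replicates; $R$ is the correlation matrix among kinds and $\sigma$ is the measurement noise. A typical (non-centered) hierarchical multivariate-normal implementation in Stan would then be: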

#include helper_functions.stan //defines: flatten_lower_tri
data{
	int<lower=2> ns ; //number of sets
	int<lower=2> nk ; //number of kinds
	int<lower=2> nr ; //number of replicates
	matrix[nr,nk] y[ns] ; // observations
}
parameters{
	real<lower=0> noise ; //measurement noise
	row_vector[nk] m ; //across-set mean per kind
	vector<lower=0>[nk] s ; //across-set sd per kind
	cholesky_factor_corr[nk] r_ ; //correlations (on cholesky-factor scale) among kinds
	matrix[nk,ns] z_ ; //helper variable for non-centered parameterization of z (see transformed parameters)
}
transformed parameters{
	// z: latent value for each set-by-kind combination, non-centered-parameterized
	matrix[ns,nk] z = rep_matrix(m,ns) + transpose(diag_pre_multiply(s,r_)*z_) ;
}
model{
	//priors
	noise ~ weibull(2,1) ; //prior on measurement noise
	m ~ std_normal() ; //prior on means
	s ~ weibull(2,1) ; //prior on sds
	r_ ~ lkj_corr_cholesky(1) ; //prior on correlations
	//mid-level structure (see also transformed parameters)
	to_vector(z_) ~ std_normal() ; //implies z~mvn(m,s,r)
	//observation-level structure (there are more efficient ways to compute this, but expressed as below for clarity)
	for(i_ns in 1:ns){
		for(i_nk in 1:nk){
			y[i_ns][,i_nk] ~ normal(z[i_ns][i_nk],noise) ;
		}
	}
}
generated quantities{
	// extract the implied correlations
	vector[(nk*(nk-1))%/%2] r = flatten_lower_tri(multiply_lower_tri_self_transpose(r_));
}

Now, that model handles arbitrary numbers of kinds, but to build an SEM equivalent let’s first consider the case of nk=2, where a neat parameterization is:

data{
	int<lower=2> ns ; //number of sets
	int<lower=2> nr ; //number of replicates
	matrix[nr,2] y[ns] ; //observations
}
parameters{
	real<lower=0> noise ; //measurement noise
	row_vector[2] m ; //across-set mean per kind
	vector<lower=0>[2] s ; //across-set sd per kind
	real<lower=-1,upper=1> r ; // correlation
	vector[ns] c ; // common latent
	matrix[ns,2] u ; // unique latents
}
transformed parameters{
	matrix[ns,2] z ;
	real rsq = pow(r,2) ;
	z[,1] = m[1] + s[1] * ( c*rsq + u[,1]*(1-rsq) ) ; // c always weights positively on the first outcome
	if(r>0){
		//positive correlation case
		z[,2] = m[2] + s[2] * ( c*rsq + u[,2]*(1-rsq) ) ;
	}else{
		//negative correlation case
		z[,2] = m[2] + s[2] * ( -c*rsq + u[,2]*(1-rsq) ) ;
	}
}
model{
	//priors
	noise ~ weibull(2,1) ; //prior on measurement noise
	m ~ std_normal() ; //prior on means
	s ~ weibull(2,1) ; //prior on SDs
	r ~ uniform(-1,1) ; //prior on correlation (flat, as already implied by the declared bounds)
	//mid-level structure
	to_vector(u) ~ std_normal() ; //u must be std_normal
	c ~ std_normal() ; //c must be std_normal (already a vector, so no to_vector needed)
	//observation-level structure
	for(i_n in 1:ns){
		y[i_n][,1] ~ normal(z[i_n,1],noise) ;
		y[i_n][,2] ~ normal(z[i_n,2],noise) ;
	}
}

In that formulation, we have an explicit parameter governing the sign and strength of the correlation between the two kinds of measurement, and we encode the correlation structure by modelling the latent vectors in z as simplex-weighted combinations of a latent variable common to both kinds and a latent variable unique to each kind.
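Unless I’ve slipped up in the algebra, the implied moments under this weighting (writing $a = r^2$ for the common weight, so $1-a$ is the unique weight) are:

$$
\mathrm{Var}(z_k) = s_k^2\big(a^2 + (1-a)^2\big)
\qquad
\mathrm{Corr}(z_1, z_2) = \mathrm{sign}(r)\,\frac{a^2}{a^2 + (1-a)^2}
$$

so $r$ controls the correlation monotonically but isn’t the correlation itself (they coincide only at $0$ and $\pm 1$), and the implied marginal SD is shrunk relative to $s_k$; this foreshadows the transform issues I note further below.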

I’m a bit stuck, however, on generalizing the approach to nk>2 cases. I have something that seems to “work” in the sense that it samples without obvious issues and yields outputs that correspond to the true values used to generate the data, but I had to abandon the direct parameterization of the correlations, and I’m still not sure whether an alternative identification approach would make more sense. Here it is:

data{
	int<lower=2> ns ; //number of sets
	int<lower=2> nk ; //number of kinds
	int<lower=2> nr ; //number of replicates
	matrix[nr,nk] y[ns] ; // observations
}
transformed data{
	int npk = (nk * (nk - 1)) %/% 2; //num pairs of kinds
}
parameters{
	real<lower=0> noise ; //measurement noise
	vector[nk] m ; //outcome means
	vector<lower=0>[nk] s ; //outcome sds
	vector[npk] wc ; //common weights
	matrix[ns,npk] c ; //common latents
	matrix[ns,nk] u ; //unique latents
}
transformed parameters{
	//construct weight-matrix
	matrix[nk,nk] w = diag_matrix(rep_vector(1,nk)) ; //implies uniques have weight of 1
	//fill tris
	{int i_npk = 0 ;
	for(w_col in 1:(nk-1)){
		for(w_row in (w_col+1):nk){
			i_npk += 1 ;
			w[w_row,w_col] = fabs(wc[i_npk]) ; //lower-tri always positive
			w[w_col,w_row] = wc[i_npk] ; //upper-tri retains sign
		}
	}}

	//construct the array of per-set value matrices
	matrix[nk,nk] v[ns] ;
	for(i_ns in 1:ns){
		v[i_ns] = diag_matrix(transpose(u[i_ns])) ;
		//fill tris
		int i_npk = 0 ;
		for(v_col in 1:(nk-1)){
			for(v_row in (v_col+1):nk){
				i_npk += 1 ;
				v[i_ns][v_row,v_col] = c[i_ns][i_npk] ;
				v[i_ns][v_col,v_row] = c[i_ns][i_npk] ;
			}
		}
	}

	//construct z
	vector[nk] z[ns] ;
	for(i_ns in 1:ns){
		matrix[nk,nk] wv = w .* v[i_ns] ;
		for(i_nk in 1:nk){
			z[i_ns][i_nk] = m[i_nk] + s[i_nk]*sum(wv[,i_nk]) ;
		}
	}
}
model{
	//priors
	noise ~ weibull(2,1) ; //prior on measurement noise
	m ~ std_normal() ; //prior on means
	s ~ weibull(2,1) ; //prior on sds
	wc ~ std_normal() ; //prior on common weights (already a vector, so no to_vector needed)
	//mid-level structure (see also transformed parameters)
	to_vector(u) ~ std_normal() ; //u must have std_normal
	to_vector(c) ~ std_normal() ; //c must have std_normal
	//observation-level structure
	for(i_ns in 1:ns){
		for(i_nk in 1:nk){
			y[i_ns][,i_nk] ~ normal(z[i_ns][i_nk],noise) ;
		}
	}
}
generated quantities{
	// extract the implied correlations
	vector[npk] r = pow(wc,2)./(1+pow(wc,2)) ;
	for(i_npk in 1:npk){
		if(wc[i_npk]<0){
			r[i_npk] = -r[i_npk] ;
		}
	}
}

Note that the GQ section attempts to derive the implied correlations, and while it works for nk=2, it doesn’t give the right values for nk>2.

Note also that s in this latter model doesn’t actually encode the posterior on the marginal SDs as intended; each latent picks up extra variance from the common latents.
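For what it’s worth, I think the correct transforms follow from the weight structure: each standardized latent is $u_k$ plus a $w_p c_p$ term for every pair $p$ involving kind $k$ (with the sign handling making the standardized covariance of a pair $\mathrm{sign}(w_p)\,w_p^2$), so, double-check me:

$$
\mathrm{Var}(z_k) = s_k^2\Big(1 + \sum_{p \ni k} w_p^2\Big)
\qquad
\mathrm{Corr}(z_i, z_j) = \frac{\mathrm{sign}(w_{ij})\,w_{ij}^2}{\sqrt{\big(1+\sum_{p \ni i} w_p^2\big)\big(1+\sum_{p \ni j} w_p^2\big)}}
$$

which reduces to the sign(w)·w²/(1+w²) formula in the GQ block when nk=2.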

But as I say, the above seems to work; I just need some extra operations to get the correct transforms for r and s. That said, I’m hoping someone might have a suggestion for a re-parameterization that permits more direct inference on those quantities. I tried a version where the columns of wv were transformed to sum to 1, but that had divergence issues. I also tried freeing the weights associated with the unique latents (above they are fixed at 1.0), but that again seemed to have sampling trouble.
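In the meantime, here’s a sketch of a generated-quantities block implementing the transforms derived above for general nk (matching the pair ordering used in transformed parameters; the names s_implied/r_implied are just placeholders, and this is only lightly tested, so treat it as a sketch):

generated quantities{
	//implied marginal SDs & pairwise correlations for general nk
	vector[nk] s_implied ;
	vector[npk] r_implied ;
	{
		vector[nk] common_var = rep_vector(0,nk) ; //common-latent variance accumulated per kind
		int i_npk = 0 ;
		for(w_col in 1:(nk-1)){
			for(w_row in (w_col+1):nk){
				i_npk += 1 ;
				common_var[w_col] += pow(wc[i_npk],2) ;
				common_var[w_row] += pow(wc[i_npk],2) ;
			}
		}
		s_implied = s .* sqrt(1 + common_var) ;
		i_npk = 0 ;
		for(w_col in 1:(nk-1)){
			for(w_row in (w_col+1):nk){
				i_npk += 1 ;
				//standardized covariance for a pair is sign(wc)*wc^2
				r_implied[i_npk] = (wc[i_npk] < 0 ? -1 : 1) * pow(wc[i_npk],2)
					/ sqrt( (1 + common_var[w_col]) * (1 + common_var[w_row]) ) ;
			}
		}
	}
}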

Any input would be appreciated!

Oh, and here’s R code to generate and sample:

ns = 1e2
nk = 3
nr = 1e1
set.seed(1)
z = MASS::mvrnorm(
	n = ns
	, mu = rep(0,nk)
	, Sigma = matrix(
		c(  1,-.9,.4,
			-.9,  1, 0,
			.4,  0, 1)
		,3,3
	)
	, empirical = T
)

round(cor(z),floor(-log10(10*.Machine$double.eps))) # sanity check: should match Sigma above

y = list()
for(i_ns in 1:ns){
	y[[i_ns]] = matrix(NA,nr,nk)
	for(i_nk in 1:nk){
	y[[i_ns]][,i_nk] = rnorm(nr,z[i_ns,i_nk],1)
	}
}
mod = cmdstanr::cmdstan_model(
	'stan/sem.stan'
	, include_paths = 'stan' # so the #include of helper_functions.stan resolves
)
fit = mod$sample(
	data = list(ns=ns,nk=nk,nr=nr,y=y)
	, chains = 4
	, parallel_chains = 4
	, refresh = 200
	# , iter_warmup=1e3
)
fit$summary(c('noise','m','s','r'))

And the contents of helper_functions.stan:

functions{
	// flatten_lower_tri: function that returns the lower-tri of a matrix, flattened to a vector
	vector flatten_lower_tri(matrix mat) {
		int n_cols = cols(mat);
		int n_uniq = (n_cols * (n_cols - 1)) %/% 2;
		vector[n_uniq] out ;
		int i = 1;
		for(c in 1:(n_cols-1)){
			for(r in (c+1):n_cols){
				out[i] = mat[r,c];
				i += 1;
			}
		}
		return(out) ;
	}
}

This is cool! I am working up to building an SEM-like model too, though I am a bit behind the curve on coding things up, so hopefully as the summer progresses I might have some ideas here. I work in ecology, so we have similar issues with repeated measures, multiple measures, sets, etc.


Writing out a vine copula would be interesting to test. For this simulation you’d create a three-variable latent marginal model with three bivariate Gaussian copulas. I’m not aware of anyone having written vine copulas in Stan, so there are no examples.

There’s a post about vine copulas at

and the bivariate normal copula is

/**
 * Normal copula log density, vectorized
 *
 * @copyright Sean Pinkney, 2021
 *
 * Meyer, Christian. "The Bivariate Normal Copula."
 * arXiv preprint arXiv:0912.2816 (2009). Eqn 3.3.
 * Accessed Feb. 6, 2021.
 *
 * @param u Vector of standard-normal scores (inv_Phi of the uniform margins); not checked, NaN if out of range
 * @param v Vector of standard-normal scores; not checked, NaN if out of range
 * @param rho Real number in (-1, 1)
 * @return log density
 */
real normal_copula_vector_lpdf(vector u, vector v, real rho){
	int N = num_elements(u);
	real rho_sq = square(rho);
	real a1 = 0.5 * rho;
	real a2 = rho_sq - 1;
	real a3 = 0.5 * log1m(rho_sq);
	real x = -2 * u' * v + rho * (dot_self(u) + dot_self(v));
	return a1 * x / a2 - N * a3;
}
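Here’s a minimal usage sketch, assuming normal margins (in which case the probability-integral transform followed by inv_Phi reduces to z-scoring; for other margins you’d pass inv_Phi of the margin’s CDF values instead):

functions{
	//paste normal_copula_vector_lpdf from above here
}
data{
	int<lower=1> N ;
	vector[N] y1 ;
	vector[N] y2 ;
}
parameters{
	real mu1 ;
	real mu2 ;
	real<lower=0> sigma1 ;
	real<lower=0> sigma2 ;
	real<lower=-1,upper=1> rho ; //copula dependence
}
model{
	//margins (priors omitted for brevity)
	y1 ~ normal(mu1,sigma1) ;
	y2 ~ normal(mu2,sigma2) ;
	//dependence: the copula wants the standard-normal scores of the margins,
	//which for normal margins are just the z-scores
	target += normal_copula_vector_lpdf((y1-mu1)/sigma1 | (y2-mu2)/sigma2, rho) ;
}

With normal margins this margins-plus-Gaussian-copula model is just a bivariate normal, which makes it a convenient sanity check before swapping in non-normal margins.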

Trying to get up to speed on vine copulas; so far they look nearly synonymous with SEM? At least with a saturated SEM?