# Collecting identifiability constraint strategies for latent/hierarchical SEMs

So I’ve been playing with some complicated models for hierarchical data and need two kinds of SEM-like structures for some latent quantities, which I’ve boiled down to the following two separate examples. First, here’s a structure whereby the two latent quantities `l1` & `l2` are linked by a correlation `r`:

``````stan
// sem_l1_l2.stan
data{
int nIds ;
int nReps ;
matrix[nIds,nReps] y1 ;
matrix[nIds,nReps] y2 ;
}
parameters{
row_vector[2] means ;
row_vector<lower=0>[2] sds ;
vector[nIds] common_stdnorm ;
matrix[nIds,2] unique_stdnorm ;
real<lower=-1,upper=1> r ;
real<lower=0> noise ;
}
transformed parameters{
// generate l1_stdnorm & l2_stdnorm such that both have unit variance and cor(l1_stdnorm,l2_stdnorm)==r
real w = sqrt(abs(r)/(1-abs(r))) ; // abs() so this also covers r<0; the sign is applied below
vector[nIds] l1_stdnorm = (common_stdnorm*w + unique_stdnorm[:,1]) / sqrt( 1+(w^2) ) ;
vector[nIds] l2_stdnorm = ((r<0 ? -1 : 1)*common_stdnorm*w + unique_stdnorm[:,2]) / sqrt( 1+(w^2) ) ;
// scale & shift stdnorms
vector[nIds] l1 = l1_stdnorm*sds[1] + means[1] ;
vector[nIds] l2 = l2_stdnorm*sds[2] + means[2] ;
}
model{
// prior for latent means/sds & manifest noise
means ~ std_normal() ; // would want to tweak this to match domain expertise
sds ~ weibull(2,1) ; // ditto
noise ~ weibull(2,1) ; // ditto

// common & unique must be ~ std_normal()
common_stdnorm ~ std_normal() ;
to_vector(unique_stdnorm) ~ std_normal() ;

// r ~ uniform(-1,1) ; // commented-out bc implied by bounds

// likelihood
for(i_nId in 1:nIds){
y1[i_nId] ~ normal( l1[i_nId] , noise ) ;
y2[i_nId] ~ normal( l2[i_nId] , noise ) ;
}

}
``````
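As a quick numeric sanity check of this construction (outside Stan), here's a NumPy sketch of the intended math, using `|r|` plus an explicit sign so that negative correlations work too; the sample size and target `r` are just illustrative:

```python
import numpy as np

# Numeric sanity check of the common/unique construction: with
# w = sqrt(|r| / (1 - |r|)), each standardized latent has unit variance
# and their correlation is r (the sign of r enters via the second latent).
rng = np.random.default_rng(0)
n = 200_000        # illustrative sample size
r = -0.6           # target correlation, anywhere in (-1, 1)

w = np.sqrt(abs(r) / (1 - abs(r)))
common = rng.standard_normal(n)
unique = rng.standard_normal((n, 2))

l1_stdnorm = (common * w + unique[:, 0]) / np.sqrt(1 + w**2)
l2_stdnorm = (np.sign(r) * common * w + unique[:, 1]) / np.sqrt(1 + w**2)

print(l1_stdnorm.var(), l2_stdnorm.var())         # both ~1.0
print(np.corrcoef(l1_stdnorm, l2_stdnorm)[0, 1])  # ~r
```

The algebra behind it: each latent has variance `(w^2 + 1) / (1 + w^2) = 1`, and their covariance is `sign(r) * w^2 / (1 + w^2) = sign(r) * |r| = r`.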

And next is a scenario where the quantity `l0` and the columns of `l123` are linked through the correlations in the row-vector `r`:

``````stan
// sem_l0_l123.stan
data{
int nIds ;
int nReps ;
matrix[nIds,nReps] y1 ;
matrix[nIds,nReps] y2 ;
matrix[nIds,nReps] y3 ;
}
parameters{
row_vector[3] means ;
row_vector<lower=0>[3] sds ;
vector[nIds] l0_stdnorm ;
matrix[nIds,3] l123_unique_stdnorm ;
row_vector<lower=-1,upper=1>[3] r ;
real<lower=0> noise ;
}
transformed parameters{
// generate l123_stdnorm such that all columns have unit variance and
//    cor(l0_stdnorm,l123_stdnorm[,1])==r[1]
//    cor(l0_stdnorm,l123_stdnorm[,2])==r[2]
//    cor(l0_stdnorm,l123_stdnorm[,3])==r[3]
row_vector[3] vars_common = r.^2 ;
row_vector[3] vars_unique = 1-vars_common ;
matrix[nIds,3] l123_stdnorm = (
(
( rep_matrix(l0_stdnorm,3) .* rep_matrix(r,nIds) )
+ ( l123_unique_stdnorm .* rep_matrix(sqrt(vars_unique),nIds) )
)
./ rep_matrix( sqrt(vars_unique+vars_common) ,nIds) // identically 1; kept to make the normalization explicit
) ;
// scale & shift stdnorms
matrix[nIds,3] l123 = (
l123_stdnorm .* rep_matrix(sds,nIds)
+ rep_matrix(means,nIds)
);
}
model{
// prior for latent means/sds & manifest noise
means ~ std_normal() ; // would want to tweak this to match domain expertise
sds ~ weibull(2,1) ; // ditto
noise ~ weibull(2,1) ; // ditto

// common & unique must be ~ std_normal()
l0_stdnorm ~ std_normal() ;
to_vector(l123_unique_stdnorm) ~ std_normal() ;

// r ~ uniform(-1,1) ; // commented-out bc implied by bounds

// likelihood
for(i_nId in 1:nIds){
y1[i_nId] ~ normal( l123[i_nId,1] , noise ) ;
y2[i_nId] ~ normal( l123[i_nId,2] , noise ) ;
y3[i_nId] ~ normal( l123[i_nId,3] , noise ) ;
}

}
``````
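Same kind of numeric sanity check for the vectorized construction (a NumPy sketch with illustrative sample size and target correlations): each column of `l123_stdnorm` should come out with unit variance and correlation `r[k]` with `l0_stdnorm`.

```python
import numpy as np

# Numeric check of the vectorized construction: column k of l123_stdnorm
# should have unit variance and correlation r[k] with l0_stdnorm.
rng = np.random.default_rng(1)
n = 200_000                       # illustrative sample size
r = np.array([0.7, -0.4, 0.0])    # one target correlation per column

l0_stdnorm = rng.standard_normal(n)
unique = rng.standard_normal((n, 3))

# l0 * r + unique * sqrt(1 - r^2); the Stan code's denominator
# sqrt(vars_unique + vars_common) is identically 1, so it's dropped here
l123_stdnorm = l0_stdnorm[:, None] * r + unique * np.sqrt(1 - r**2)

print(l123_stdnorm.var(axis=0))   # ~[1, 1, 1]
print([np.corrcoef(l0_stdnorm, l123_stdnorm[:, k])[0, 1] for k in range(3)])  # ~r
```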

Now, as coded, there are going to be sampling issues with both of these toy models: if the true generating process has a non-zero value for any `r`, the posterior will be bimodal, with peaks at `+r` & `-r`, because flipping the signs of the common latent and the correlation together leaves the likelihood unchanged.

In light of this, I know folks usually undertake one or another strategy for achieving better identifiability, and I wanted to make sure I knew all the options. I think they are:

1. Don’t change the model; instead hand-process the posterior to check for and flip one mode of any bimodal posteriors.
2. For the `sem_l0_l123.stan` model, constrain one of the correlations in `r` to positive-only.

Are there any other options?

I tend not to like 1 because it assumes the sampler will behave OK despite the bimodality of the posterior, which we know HMC probably won’t (though admittedly other samplers might).

I also tend not to like 2, as it only really “works” when there’s strong domain expertise that at least one identified correlation will be non-zero. If you don’t have such prior knowledge, you can end up positive-constraining a correlation that is actually zero or nearly zero, leaving its posterior artificially truncated and (if it’s genuinely zero) not actually eliminating bimodality for the other correlations.

p.s. I know there are tricks to marginalize out the latent quantities like `common_stdnorm` & `l0_stdnorm`, which I presume would also resolve the identifiability issues noted above, but I think the various other structures in my non-toy model genuinely need direct inference on those latents.


I think those are the main options. Related to your #2, I’ve seen many people use priors that only have support on positive numbers. The identification issue goes away, but then you can’t have posteriors that overlap with 0.

I think your #1 is tricky when there are multiple bimodal parameters, because the sign of one parameter can influence the sign of other parameters, so you wouldn’t want to flip each bimodal parameter independently of the others. Related to #2, this paper describes the need for a sign-constrained parameter to achieve identification in factor analysis.
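To make the joint-flip point concrete, here's a sketch in Python using fabricated stand-in draws (not a real fit): for `sem_l0_l123.stan`, the likelihood is invariant to negating `l0_stdnorm` and the whole row-vector `r` *together*, so the alignment uses one sign per draw, applied to that entire block at once.

```python
import numpy as np

# Sketch of post-hoc sign alignment (option 1), on fabricated stand-in
# draws rather than a real fit of sem_l0_l123.stan. The likelihood is
# invariant to negating l0_stdnorm and the whole row_vector r together,
# so each draw gets one shared sign flip -- never per-parameter flips.
rng = np.random.default_rng(2)
n_draws, n_ids = 1000, 50

# fake posterior draws straddling both modes: (+r, +l0) and (-r, -l0)
s = rng.choice([-1.0, 1.0], size=n_draws)
r_true = np.array([0.7, -0.4, 0.5])
r_draws = s[:, None] * (r_true + 0.05 * rng.standard_normal((n_draws, 3)))
l0_draws = s[:, None] * rng.standard_normal((n_draws, n_ids))

# one reference sign per draw (here keyed off the first correlation),
# applied jointly to everything in the sign-indeterminate block
flip = np.sign(r_draws[:, 0])
r_aligned = flip[:, None] * r_draws
l0_aligned = flip[:, None] * l0_draws

print((r_aligned[:, 0] > 0).all())   # every draw now sits on one mode
```

Note that this only works when the reference quantity (here `r[1]`) is far enough from zero that its sign is unambiguous in every draw, which circles back to the same domain-expertise requirement as the hard constraint in #2.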

For many of these sorts of models, an identifiability issue remains even when you marginalize the latent variables.
