# Collecting identifiability constraint strategies for latent/hierarchical SEMs

So I’ve been playing with some complicated models for hierarchical data and need two kinds of SEM-like structures for some latent quantities, which I’ve boiled down to the following two separate examples. First, here’s a structure whereby the two latent quantities `l1` & `l2` are linked by a correlation `r`:

``````stan
// sem_l1_l2.stan
data{
int nIds ;
int nReps ;
matrix[nIds,nReps] y1 ;
matrix[nIds,nReps] y2 ;
}
parameters{
row_vector[2] means ;
row_vector<lower=0>[2] sds ;
vector[nIds] common_stdnorm ;
matrix[nIds,2] unique_stdnorm ;
real<lower=-1,upper=1> r ;
real<lower=0> noise ;
}
transformed parameters{
// generate l1_stdnorm & l2_stdnorm such that both have unit variance and cor(l1_stdnorm,l2_stdnorm)==r
real w = sqrt(abs(r)/(1-abs(r))) ; // abs() so this also covers r<0; the sign is applied below
vector[nIds] l1_stdnorm = (common_stdnorm*w + unique_stdnorm[:,1]) / sqrt( 1+(w^2) ) ;
vector[nIds] l2_stdnorm = ((r<0 ? -1 : 1)*common_stdnorm*w + unique_stdnorm[:,2]) / sqrt( 1+(w^2) ) ;
// scale & shift stdnorms
vector[nIds] l1 = l1_stdnorm*sds[1] + means[1] ;
vector[nIds] l2 = l2_stdnorm*sds[2] + means[2] ;
}
model{
// prior for latent means/sds & manifest noise
means ~ std_normal() ; // would want to tweak this to match domain expertise
sds ~ weibull(2,1) ; // ditto
noise ~ weibull(2,1) ; // ditto

// common & unique must be ~ std_normal()
common_stdnorm ~ std_normal() ;
to_vector(unique_stdnorm) ~ std_normal() ;

// r ~ uniform(-1,1) ; // commented-out bc implied by bounds

// likelihood
for(i_nId in 1:nIds){
y1[i_nId] ~ normal( l1[i_nId] , noise ) ;
y2[i_nId] ~ normal( l2[i_nId] , noise ) ;
}

}
``````
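As a quick numeric sanity check of this construction (outside Stan), here's a NumPy sketch of the intended math, using `|r|` plus an explicit sign so that negative correlations work too; the sample size and target `r` are just illustrative:

```python
import numpy as np

# Numeric sanity check of the common/unique construction: with
# w = sqrt(|r| / (1 - |r|)), each standardized latent has unit variance
# and their correlation is r (the sign of r enters via the second latent).
rng = np.random.default_rng(0)
n = 200_000        # illustrative sample size
r = -0.6           # target correlation, anywhere in (-1, 1)

w = np.sqrt(abs(r) / (1 - abs(r)))
common = rng.standard_normal(n)
unique = rng.standard_normal((n, 2))

l1_stdnorm = (common * w + unique[:, 0]) / np.sqrt(1 + w**2)
l2_stdnorm = (np.sign(r) * common * w + unique[:, 1]) / np.sqrt(1 + w**2)

print(l1_stdnorm.var(), l2_stdnorm.var())         # both ~1.0
print(np.corrcoef(l1_stdnorm, l2_stdnorm)[0, 1])  # ~r
```

The algebra behind it: each latent has variance `(w^2 + 1) / (1 + w^2) = 1`, and their covariance is `sign(r) * w^2 / (1 + w^2) = sign(r) * |r| = r`.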

And next is a scenario where the quantity `l0` and the columns of `l123` are linked through the correlations in the row-vector `r`:

``````stan
// sem_l0_l123.stan
data{
int nIds ;
int nReps ;
matrix[nIds,nReps] y1 ;
matrix[nIds,nReps] y2 ;
matrix[nIds,nReps] y3 ;
}
parameters{
row_vector[3] means ;
row_vector<lower=0>[3] sds ;
vector[nIds] l0_stdnorm ;
matrix[nIds,3] l123_unique_stdnorm ;
row_vector<lower=-1,upper=1>[3] r ;
real<lower=0> noise ;
}
transformed parameters{
// generate l123_stdnorm such that all columns have unit variance and
//    cor(l0_stdnorm,l123_stdnorm[,1])==r[1]
//    cor(l0_stdnorm,l123_stdnorm[,2])==r[2]
//    cor(l0_stdnorm,l123_stdnorm[,3])==r[3]
row_vector[3] vars_common = r.^2 ;
row_vector[3] vars_unique = 1-vars_common ;
matrix[nIds,3] l123_stdnorm = (
(
( rep_matrix(l0_stdnorm,3) .* rep_matrix(r,nIds) )
+ ( l123_unique_stdnorm .* rep_matrix(sqrt(vars_unique),nIds) )
)
./ rep_matrix( sqrt(vars_unique+vars_common) ,nIds) // identically 1; kept to make the normalization explicit
) ;
// scale & shift stdnorms
matrix[nIds,3] l123 = (
l123_stdnorm .* rep_matrix(sds,nIds)
+ rep_matrix(means,nIds)
);
}
model{
// prior for latent means/sds & manifest noise
means ~ std_normal() ; // would want to tweak this to match domain expertise
sds ~ weibull(2,1) ; // ditto
noise ~ weibull(2,1) ; // ditto

// common & unique must be ~ std_normal()
l0_stdnorm ~ std_normal() ;
to_vector(l123_unique_stdnorm) ~ std_normal() ;

// r ~ uniform(-1,1) ; // commented-out bc implied by bounds

// likelihood
for(i_nId in 1:nIds){
y1[i_nId] ~ normal( l123[i_nId,1] , noise ) ;
y2[i_nId] ~ normal( l123[i_nId,2] , noise ) ;
y3[i_nId] ~ normal( l123[i_nId,3] , noise ) ;
}

}
``````
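Same kind of numeric sanity check for the vectorized construction (a NumPy sketch with illustrative sample size and target correlations): each column of `l123_stdnorm` should come out with unit variance and correlation `r[k]` with `l0_stdnorm`.

```python
import numpy as np

# Numeric check of the vectorized construction: column k of l123_stdnorm
# should have unit variance and correlation r[k] with l0_stdnorm.
rng = np.random.default_rng(1)
n = 200_000                       # illustrative sample size
r = np.array([0.7, -0.4, 0.0])    # one target correlation per column

l0_stdnorm = rng.standard_normal(n)
unique = rng.standard_normal((n, 3))

# l0 * r + unique * sqrt(1 - r^2); the Stan code's denominator
# sqrt(vars_unique + vars_common) is identically 1, so it's dropped here
l123_stdnorm = l0_stdnorm[:, None] * r + unique * np.sqrt(1 - r**2)

print(l123_stdnorm.var(axis=0))   # ~[1, 1, 1]
print([np.corrcoef(l0_stdnorm, l123_stdnorm[:, k])[0, 1] for k in range(3)])  # ~r
```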

Now, as coded, there are going to be sampling issues with both of these toy models: if the true generating process has a non-zero value for any `r`, the posterior will be bimodal, with peaks at `+r` & `-r`, because flipping the signs of the common latent and the correlation together leaves the likelihood unchanged.

In light of this, I know folks usually undertake one or another strategy for achieving better identifiability, and I wanted to make sure I knew all the options. I think they are:

1. Don’t change the model; instead hand-process the posterior to check for and flip one mode of any bimodal posteriors.
2. For the `sem_l0_l123.stan` model, constrain one of the correlations in `r` to positive-only.

Are there any other options?

I tend not to like 1 because it assumes the sampler will behave OK despite the bimodality of the posterior, which we know HMC probably won’t (though admittedly other samplers might).

I also tend not to like 2, as it only really “works” when there’s strong domain expertise that at least one identified correlation will be non-zero. If you don’t have such prior knowledge, you can end up positive-constraining a correlation that is actually zero or nearly zero, leaving its posterior artificially truncated and (if it’s genuinely zero) not actually eliminating bimodality for the other correlations.

p.s. I know there are tricks to marginalize out the latent quantities like `common_stdnorm` & `l0_stdnorm`, which I presume would also resolve the identifiability issues noted above, but I think the various other structures in my non-toy model genuinely need direct inference on those latents.


I think those are the main options. Related to your #2, I’ve seen many people use priors that only have support on positive numbers. The identification issue goes away, but then you can’t have posteriors that overlap with 0.

I think your #1 is tricky when there are multiple bimodal parameters, because the sign of one parameter can influence the sign of other parameters, so you wouldn’t want to flip each bimodal parameter independently of the others. Related to #2, this paper describes the need for a sign-constrained parameter to achieve identification in factor analysis.
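To make the joint-flip point concrete, here's a sketch in Python using fabricated stand-in draws (not a real fit): for `sem_l0_l123.stan`, the likelihood is invariant to negating `l0_stdnorm` and the whole row-vector `r` *together*, so the alignment uses one sign per draw, applied to that entire block at once.

```python
import numpy as np

# Sketch of post-hoc sign alignment (option 1), on fabricated stand-in
# draws rather than a real fit of sem_l0_l123.stan. The likelihood is
# invariant to negating l0_stdnorm and the whole row_vector r together,
# so each draw gets one shared sign flip -- never per-parameter flips.
rng = np.random.default_rng(2)
n_draws, n_ids = 1000, 50

# fake posterior draws straddling both modes: (+r, +l0) and (-r, -l0)
s = rng.choice([-1.0, 1.0], size=n_draws)
r_true = np.array([0.7, -0.4, 0.5])
r_draws = s[:, None] * (r_true + 0.05 * rng.standard_normal((n_draws, 3)))
l0_draws = s[:, None] * rng.standard_normal((n_draws, n_ids))

# one reference sign per draw (here keyed off the first correlation),
# applied jointly to everything in the sign-indeterminate block
flip = np.sign(r_draws[:, 0])
r_aligned = flip[:, None] * r_draws
l0_aligned = flip[:, None] * l0_draws

print((r_aligned[:, 0] > 0).all())   # every draw now sits on one mode
```

Note that this only works when the reference quantity (here `r[1]`) is far enough from zero that its sign is unambiguous in every draw, which circles back to the same domain-expertise requirement as the hard constraint in #2.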

For many of these sorts of models, an identifiability issue remains even when you marginalize the latent variables.
