Consider the following SEM model (the graphic below doesn't follow all conventional SEM DAG rules; it's more of a dependency graph):
Here, a common latent feeds into two derived latents that in turn connect to two outcomes. The weights are constrained to lower=-1, upper=1 (with r1 constrained to 0-1 for identifiability) and are thus interpretable as correlations, with their squares quantifying the proportion of variability in each derived latent attributable to the common factor.
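For concreteness, since common and the residual latents are all standard normal (see the code below), the implied variance decomposition for the first derived latent is

Var(mu1) = r1^2 * Var(common) + (1 - r1^2) * Var(res1) = r1^2 + (1 - r1^2) = 1 ,

so r1^2 is the share of mu1's variance attributable to the common factor. The same reading applies to r2[1]^2, since the unit-vector constraint enforces r2[1]^2 + r2[2]^2 = 1.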
Stan code for this model:
data{
  int n_id ;
  int n_obs ;
  matrix[n_obs,n_id] y1 ;
  matrix[n_obs,n_id] y2 ;
}
parameters{
  real<lower=0> noise1 ;
  real<lower=0> noise2 ;
  real<lower=0,upper=1> r1 ;
  unit_vector[2] r2 ; // for unconstrained SEM correlations, unit_vector makes for less by-hand compute
  vector[n_id] common ;
  vector[n_id] res1 ;
  vector[n_id] res2 ;
}
model{
  noise1 ~ weibull(2,1) ;
  noise2 ~ weibull(2,1) ;
  res1 ~ std_normal() ;
  res2 ~ std_normal() ;
  common ~ std_normal() ;
  vector[n_id] mu1 = (
    common .* r1
    + res1 .* sqrt(1-pow(r1,2)) // have to compute this term by-hand for constrained correlations
  ) ;
  vector[n_id] mu2 = (
    common .* r2[1]
    + res2 .* r2[2] // for unconstrained SEM correlations, unit-vector gives us this term automatically
  ) ;
  for(i_id in 1:n_id){
    y1[,i_id] ~ normal( mu1[i_id] , noise1 ) ;
    y2[,i_id] ~ normal( mu2[i_id] , noise2 ) ;
  }
}
Now consider an extension of this model in which there are two more observed variables: one (y3) is modelled directly by a new latent (mu3), and the other (y4) is modelled as a combination of that new latent plus the common latent from the previous model:
With Stan code:
data{
  int n_id ;
  int n_obs ;
  matrix[n_obs,n_id] y1 ;
  matrix[n_obs,n_id] y2 ;
  matrix[n_obs,n_id] y3 ;
  matrix[n_obs,n_id] y4 ;
}
parameters{
  real<lower=0> noise1 ;
  real<lower=0> noise2 ;
  real<lower=0> noise3 ;
  real<lower=0> noise4 ;
  real<lower=0,upper=1> r1 ;
  unit_vector[2] r2 ;
  vector[n_id] res1 ;
  vector[n_id] res2 ;
  vector[n_id] common ;
  vector[n_id] mu3 ;
  vector[n_id] res4 ;
  unit_vector[3] r_mu4 ;
}
model{
  noise1 ~ weibull(2,1) ;
  noise2 ~ weibull(2,1) ;
  noise3 ~ weibull(2,1) ;
  noise4 ~ weibull(2,1) ;
  res1 ~ std_normal() ;
  res2 ~ std_normal() ;
  common ~ std_normal() ;
  mu3 ~ std_normal() ;
  res4 ~ std_normal() ;
  vector[n_id] mu1 = (
    common .* r1
    + res1 .* sqrt(1-pow(r1,2))
  ) ;
  vector[n_id] mu2 = (
    common .* r2[1]
    + res2 .* r2[2]
  ) ;
  vector[n_id] mu4 = (
    common .* r_mu4[1]
    + mu3 .* r_mu4[2]
    + res4 .* r_mu4[3]
  ) ;
  for(i_id in 1:n_id){
    y1[,i_id] ~ normal( mu1[i_id] , noise1 ) ;
    y2[,i_id] ~ normal( mu2[i_id] , noise2 ) ;
    y3[,i_id] ~ normal( mu3[i_id] , noise3 ) ;
    y4[,i_id] ~ normal( mu4[i_id] , noise4 ) ;
  }
}
When the data y3 are low-noise, the model is well-identified and it's possible to interpret r_mu4[1]^2 as the r-square for the prediction of mu4 from the common variable alone, and r_mu4[1]^2 + r_mu4[2]^2 as the r-square for the prediction of mu4 using both common and mu3.
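If it helps to monitor these quantities directly, here is a minimal sketch of a generated quantities block that could be appended to the extended program above (the quantity names are just illustrative):

generated quantities{
  // variance share of mu4 attributable to the common factor alone
  real rsq_common = square(r_mu4[1]) ;
  // variance share of mu4 attributable to common and mu3 together
  real rsq_common_and_mu3 = square(r_mu4[1]) + square(r_mu4[2]) ;
  // incremental contribution of mu3 over common
  real rsq_mu3_added = square(r_mu4[2]) ;
}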
However, when y3 is high-noise, I think an identifiability issue creeps in whereby mu3 begins to take on a "random noise" character and can start to exchange roles with res4 as the random-noise residual. For model fitting this doesn't seem to pose much of an issue, merely inducing a negative correlation between r_mu4[2] and r_mu4[3] that HMC can handle, BUT it does render the r-square derivations/interpretations noted above invalid, at least as far as I can discern. That is, if the "added value" of mu3 to the prediction of mu4 is of critical importance, then when mu3 itself is not well constrained by the y3 data, mu3 will trade roles in the SEM with the residual res4 and thereby yield inflated (or at least non-meaningful) values for its influence on y4.
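To spell out the degeneracy as I understand it: if y3 carries essentially no information, mu3 is a priori just another standard-normal latent, so the last two terms of mu4,

mu3 .* r_mu4[2] + res4 .* r_mu4[3] ,

are marginally distributed like sqrt(r_mu4[2]^2 + r_mu4[3]^2) times a fresh standard-normal vector. The likelihood then only constrains r_mu4[2]^2 + r_mu4[3]^2 (equivalently 1 - r_mu4[1]^2), not how that total is split between mu3 and res4, which is exactly the ridge described above.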
Is this non-identifiability circumventable in any obvious way other than to collect enough y3 data to properly constrain mu3?