Double-checking a simple single-latent-variable SEM

I’m working with some data where I have N observed time series of M timepoints each, and I suspect that there’s a common latent variable influencing each observed time series to varying degrees. It’s been a while since I last worked with SEM stuff, and I want to double-check that I’m going about it properly, especially with respect to ensuring identifiability. I think with this simple model it’s sufficient to impose the restriction that the latent variable has a mean of zero and an sd of 1, and to fix the weight on one variable to 1, right? Here’s the code for this:

data{
	int<lower=1> ncols ;
	int<lower=1> nrows ;
	matrix[nrows,ncols] Y ; // this should be scaled to have columns with mean=0, sd=1
}
parameters{
	vector[ncols-1] weights ; // free loadings for columns 2..ncols
	vector<lower=0>[ncols] noise ; // per-column measurement noise SDs
	vector[nrows] latent ; // one latent value per timepoint
}
model{
	vector[ncols] w = append_row(1, weights) ; // loading on the first column fixed to 1
	noise ~ std_normal() ; // half-normal, given the lower=0 constraint
	weights ~ std_normal() ;
	latent ~ std_normal() ; // latent fixed to mean=0, sd=1
	for(i in 1:ncols){
		Y[,i] ~ normal( latent*w[i], noise[i] ) ;
	}
}

I feel like I’m missing something, however. Something to do with the noise term? Since the latent variable is modelled as having an sd of 1, and it has a weight of 1 on the first column, doesn’t that imply that the observed SD of that column has to be 1 or greater? I can ensure the data match this expectation by scaling it before sampling, but is that appropriate?
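Spelling out the implied constraint: with the first weight fixed to 1 and the latent sd fixed to 1, the marginal variance of the first column is

$$\mathrm{Var}(Y_{\cdot 1}) = 1^2 \cdot \mathrm{Var}(\mathrm{latent}) + \mathrm{noise}_1^2 = 1 + \mathrm{noise}_1^2 \ge 1\,,$$

so its SD can never drop below 1.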

Edit: yeah, the more I think about it, this can’t be right, as it implies that the first column is perfectly correlated with the latent variable (if I’m also standardizing all observed variables to have sd==1). Ah, I think I just need to fix the sign of the first weight but let the magnitude remain free:

data{
	int<lower=1> ncols ;
	int<lower=1> nrows ;
	matrix[nrows,ncols] Y ;
}
parameters{
	real<lower=0> first_weight ; // magnitude free, sign fixed to positive
	vector[ncols-1] other_weights ;
	vector<lower=0>[ncols] noise ;
	vector[nrows] latent ;
}
model{
	vector[ncols] weights = append_row(first_weight, other_weights) ;
	noise ~ std_normal() ;
	first_weight ~ std_normal() ; // half-normal, given the lower=0 constraint
	other_weights ~ std_normal() ;
	latent ~ std_normal() ;
	for(i in 1:ncols){
		Y[,i] ~ normal( latent*weights[i], noise[i] ) ;
	}
}

Darn, that just yields tons of divergences… Now I’m flummoxed. Anyone have any tips? @Charles_Driver, maybe I could impose upon your expertise to show me where I’m going awry here?


Aside from the lack of intercepts (is Y centered?), it seems roughly sensible… sorry. I remember seeing factor loadings cause some weird sampling behaviour, though, if that helps ;)


Thanks, good to know it’s not obviously doing something wrong. I guess from here I’ll do what I should have done in the first place: use dummy data to check that there’s no blatant misspecification. I’ll report back if I learn anything useful.
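For the record, here’s roughly how I plan to simulate the dummy data (run with Stan’s fixed_param algorithm; `true_weights` and `true_noise` are just made-up values I’ll pass in as data):

data{
	int<lower=1> ncols ;
	int<lower=1> nrows ;
	vector[ncols] true_weights ;
	vector<lower=0>[ncols] true_noise ;
}
generated quantities{
	matrix[nrows,ncols] Y_sim ;
	{
		// local block so the simulated latent values aren't saved
		vector[nrows] latent ;
		for(r in 1:nrows){
			latent[r] = normal_rng(0,1) ;
		}
		for(i in 1:ncols){
			for(r in 1:nrows){
				Y_sim[r,i] = normal_rng( latent[r]*true_weights[i], true_noise[i] ) ;
			}
		}
	}
}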


Another vote for the degeneracy of SEMs here.

Instead of restricting the first weight to be positive, it seems to work better to leave the weight unrestricted in the parameters block and then “fix” the signs in generated quantities. See here:
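As a rough sketch of that idea (the `_identified` names are just placeholders, not from the linked code):

data{
	int<lower=1> ncols ;
	int<lower=1> nrows ;
	matrix[nrows,ncols] Y ;
}
parameters{
	vector[ncols] weights ; // fully unconstrained; sign is indeterminate
	vector<lower=0>[ncols] noise ;
	vector[nrows] latent ;
}
model{
	noise ~ std_normal() ;
	weights ~ std_normal() ;
	latent ~ std_normal() ;
	for(i in 1:ncols){
		Y[,i] ~ normal( latent*weights[i], noise[i] ) ;
	}
}
generated quantities{
	// reflect each draw so the first loading is positive
	real flip = weights[1] < 0 ? -1 : 1 ;
	vector[ncols] weights_identified = flip * weights ;
	vector[nrows] latent_identified = flip * latent ;
}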

Or, if you fixed the first weight to 1, the SD of the latent distribution would then typically be free.
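Something like this, roughly (just a sketch, with `latent_sd` as the newly freed scale):

data{
	int<lower=1> ncols ;
	int<lower=1> nrows ;
	matrix[nrows,ncols] Y ;
}
parameters{
	vector[ncols-1] other_weights ;
	vector<lower=0>[ncols] noise ;
	real<lower=0> latent_sd ; // latent scale, now a free parameter
	vector[nrows] latent ;
}
model{
	vector[ncols] weights = append_row(1, other_weights) ; // first loading fixed to 1
	noise ~ std_normal() ;
	other_weights ~ std_normal() ;
	latent_sd ~ std_normal() ; // half-normal
	latent ~ normal(0, latent_sd) ;
	for(i in 1:ncols){
		Y[,i] ~ normal( latent*weights[i], noise[i] ) ;
	}
}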
