cv1 is the phantom factor that defines the residual covariance between x1 and x5. Both variables have an equated factor loading on it, called LAM. Then I created a defined parameter, LAM2, which is the square of LAM. This LAM2 is the residual covariance.
Now, another issue to account for is that the square will always be positive, so you have to add a check in the generated quantities block to switch the sign when necessary; for example, if LAM is negative, multiply LAM2 by -1.
Another advantage of this approach is that you are no longer working with multivariate normals but with a set of univariate normal distributions, which can lead to faster sampling.
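To make the idea concrete, here is a minimal sketch of the phantom-factor parameterization. The variable names (nu, lambda, sigma, eta, cv1, LAM), the priors, and the assumption that x1 and x5 are columns 1 and 5 of a one-factor CFA are all illustrative, not the exact model from this thread:

```stan
// Sketch: one-factor CFA where x1 and x5 also load on a phantom
// factor cv1 with an equated loading LAM (assumed names/priors).
data {
  int<lower=1> N;
  int<lower=5> P;
  matrix[N, P] x;
}
parameters {
  vector[P] nu;               // intercepts
  vector<lower=0>[P] lambda;  // loadings on the substantive factor
  vector<lower=0>[P] sigma;   // residual standard deviations
  vector[N] eta;              // substantive factor scores
  vector[N] cv1;              // phantom factor scores
  real LAM;                   // equated phantom loading for x1 and x5
}
model {
  eta ~ normal(0, 1);         // factor variance fixed to 1
  cv1 ~ normal(0, 1);         // phantom factor variance fixed to 1
  nu ~ normal(0, 10);
  lambda ~ normal(0, 10);
  sigma ~ gamma(1, 0.5);
  LAM ~ normal(0, 10);
  for (j in 1:P) {
    if (j == 1 || j == 5)
      col(x, j) ~ normal(nu[j] + lambda[j] * eta + LAM * cv1, sigma[j]);
    else
      col(x, j) ~ normal(nu[j] + lambda[j] * eta, sigma[j]);
  }
}
generated quantities {
  // translate the equated loading back to the covariance metric,
  // flipping the sign when LAM is negative (the check described above)
  real LAM2 = square(LAM);
  if (LAM < 0) LAM2 = -LAM2;
}
```

Each indicator gets its own univariate normal likelihood, which is what replaces the multivariate normal with residual covariance.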
Thanks for your kind explanation, @Mauricio_Garnier-Villarre!
So, if my understanding is correct, you arbitrarily defined the phantom factor with equated factor loadings on x1 and x5, and squared the loading to obtain the residual covariance between x1 and x5 (or is that simply how a phantom-factor residual covariance works)?
It's arbitrary in the sense that a covariance and the phantom-factor method estimate the same relation, just with different parameterizations. Squaring the equated factor loading is the quickest way to translate the loading into the covariance metric. You could leave it untransformed, but then the interpretation of the relation would be in terms of the phantom factor instead of a covariance.
The two come out the same because of the nature of SEM and the tracing rules for parameters.
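As a sketch of that tracing-rule argument (assuming the phantom factor variance is fixed to 1 and writing the equated loading as $\lambda$):

```latex
\operatorname{cov}(x_1, x_5)
  = \lambda \cdot \operatorname{var}(cv_1) \cdot \lambda
  = \lambda \cdot 1 \cdot \lambda
  = \lambda^2
```

So squaring the equated loading recovers the implied residual covariance directly.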
Thanks again for your illustration, @Mauricio_Garnier-Villarre! It’s interesting to hear that ‘magic’. I think I can try programming in the near future given your clarification. Most importantly, I cannot thank you enough for your time spent on my questions. I appreciate it!!!
Hi @Mauricio_Garnier-Villarre,
Please let me ask a question about prior predictive checks in BSEM. Imagine that, with the model syntax specified in this thread, I intend to do prior predictive checks. To do that, I know I have to generate datasets based on the priors. However, it feels quite tricky and complicated to do that with all the priors specified in the model above. Therefore, I drew samples from the priors in the following way (a very simple manner, just to get the idea):
data {
  int N;  // sample size
  int P;  // number of variables
}
parameters {
  real mu;              // mean
  real<lower=0> sigma;  // standard deviation
}
model {
  // priors
  mu ~ normal(0, 10);
  sigma ~ gamma(1, 0.5);
}
generated quantities {
  matrix[N, P] y_sim;
  // prior predictive distributions for N samples
  for (i in 1:N) {
    for (j in 1:P) {
      y_sim[i, j] = normal_rng(mu, sigma);
    }
  }
}
I can see that the code works with the R code below:
dat <- list(N = 3000, P = 9)
pri_samples <- stan(model_code = prpc_model,
                    data = dat,
                    chains = 2,
                    iter = 1000)
However, I know this might not be exactly the correct way to do prior predictive checks for the more complex model syntax. I am not sure how to incorporate priors such as the factor loadings, intercepts, and other parameters when sampling from the priors. So, may I get some advice on how to do that?
Doing this is more complicated than that.
You need to include the complete CFA model in the generated quantities{} block. Instead of having stochastic parameters with their respective priors, you define each parameter in generated quantities, but rather than declaring them as parameters, you draw them as random numbers from their priors.
For example.
for (i in 1:(P-D)) { lam_rng[i] = normal_rng(0, 10); }
And then build the equations to predict the variables of interest. This can be cumbersome, and the code gets long.
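Here is a hedged sketch of what such a prior-predictive program could look like for a one-factor CFA. All names (nu_rng, lam_rng, sigma_rng, eta_rng) and priors are illustrative assumptions, not the exact model from this thread:

```stan
// Sketch: prior predictive simulation for a one-factor CFA.
// Every parameter is drawn from its prior via an _rng call in
// generated quantities instead of being declared in parameters{}.
data {
  int<lower=1> N;  // sample size
  int<lower=1> P;  // number of variables
}
generated quantities {
  vector[P] nu_rng;     // intercepts drawn from their priors
  vector[P] lam_rng;    // loadings drawn from their priors
  vector[P] sigma_rng;  // residual sds drawn from their priors
  matrix[N, P] y_sim;   // prior predictive datasets
  for (j in 1:P) {
    nu_rng[j] = normal_rng(0, 10);
    lam_rng[j] = normal_rng(0, 10);
    sigma_rng[j] = gamma_rng(1, 0.5);
  }
  for (i in 1:N) {
    real eta_rng = normal_rng(0, 1);  // factor score for case i
    for (j in 1:P)
      y_sim[i, j] = normal_rng(nu_rng[j] + lam_rng[j] * eta_rng,
                               sigma_rng[j]);
  }
}
```

Since there is no parameters block, this can be run from rstan with `algorithm = "Fixed_param"`, so each iteration returns one prior predictive dataset.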
It really would be a wonderful trick if Stan one day learned to sample data when missing values are encountered… The schematic in my mind is easy to implement 😆