Latent factor models are popular in research. A single factor model with 5 indicators is shown on the OpenMx front page. Due to rotational indeterminacy, it is necessary to constrain the sign of one of the factor loadings. For background, see this article. Typically, the first loading is constrained positive.
Edgar Merkle (blavaan) and I have put together code to try to fit this kind of model using Stan 2.16. The problem we’re running into is that, sometimes, some of the chains get stuck with the first loading close to zero and the remaining loadings with the opposite of the expected sign. I’ve even tried specifying starting values for the first loading, but that doesn’t help consistently: whether the loading starts near 0 or near 1, it tends to stay there during sampling. When this happens, the Rhats are of course terrible, in the 20s and 30s.
I’ve tried to make the code as simple as possible to make it easy to understand. Looking forward to your suggestions.
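For concreteness, here is a stripped-down sketch of the kind of one-factor model we are fitting (this is not our exact code; the priors are placeholders and the factor variance is fixed to 1 for scale identification):

```stan
data {
  int<lower=1> N;                  // respondents
  int<lower=2> J;                  // indicators (5 in the OpenMx example)
  matrix[N, J] Y;                  // manifest variables
}
parameters {
  real<lower=0> lambda1;           // first loading, constrained positive
  vector[J - 1] lambda_rest;       // remaining loadings, unconstrained
  vector<lower=0>[J] psi;          // residual SDs
  vector[N] eta;                   // factor scores
}
transformed parameters {
  vector[J] lambda = append_row(lambda1, lambda_rest);
}
model {
  lambda1 ~ normal(0, 1);          // half-normal because of the constraint
  lambda_rest ~ normal(0, 1);
  psi ~ cauchy(0, 2.5);
  eta ~ normal(0, 1);              // factor variance fixed to 1
  for (j in 1:J)
    col(Y, j) ~ normal(lambda[j] * eta, psi[j]);
}
```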
If we have prior information that all loadings are positive, then we can constrain them all to be positive, and that works fine for me.
I have encountered the same problem unless the positively constrained first loading is far from 0.
If the signs of the loadings can be either positive or negative, what I have done so far is to run the model without constraining the loadings. At the end, for the summary, I just take the chains that have the same signs.
As you can see in the plot I attach here, the chains are symmetric about 0. For the final estimates, I take the chains that have the same signs, like chains 1, 3, and 4 in the plot; or, if you do not want to lose chain 2, you could flip the sign of that chain in R before computing the final estimates.
Yes, of course we can guess the sign of the loadings after running some chains. When I write up my paper, are reviewers going to accept that? I’d much prefer it if Stan could figure out the signs based on the data.
The likelihood depends on the loadings and the factor scores only through their product, so when we switch the signs of both it does not change the product. I think it is reasonable that Stan finds a maximum of the posterior distribution and samples around that maximum.
When I set lambda1 > 0, the intention is to take half of the parameter space; however, when some loadings are close to 0, the output does not give the estimates I expect. That is why I follow the procedure I explained above.
I would like to hear more about this from others’ experience!
Yes, it may be possible to guess the signs of the loadings for a single factor. However, if there are 2 or more factors, then I think it would be very difficult to guess the signs of the loadings.
For a one factor model, I would usually declare the loadings to be unconstrained in the parameters block and in the generated quantities block, multiply stuff by -1 as necessary to force the sign of one loading to be positive.
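Something like this, as a rough sketch (priors and variable names are just placeholders):

```stan
data {
  int<lower=1> N;
  int<lower=2> J;
  matrix[N, J] Y;
}
parameters {
  vector[J] lambda_raw;            // all loadings unconstrained
  vector<lower=0>[J] psi;          // residual SDs
  vector[N] eta_raw;               // factor scores
}
model {
  lambda_raw ~ normal(0, 1);
  psi ~ cauchy(0, 2.5);
  eta_raw ~ normal(0, 1);          // factor variance fixed to 1
  for (j in 1:J)
    col(Y, j) ~ normal(lambda_raw[j] * eta_raw, psi[j]);
}
generated quantities {
  // flip the whole solution so the first loading is positive;
  // the factor scores must be flipped along with the loadings
  vector[J] lambda = lambda_raw;
  vector[N] eta = eta_raw;
  if (lambda_raw[1] < 0) {
    lambda = -lambda_raw;
    eta = -eta_raw;
  }
}
```

The likelihood is unchanged under a joint sign flip of the loadings and the factor scores, so this just reports each draw in the half-space where lambda[1] > 0 without putting any constraint in front of the sampler.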
For a multi-factor model with an identity matrix for the correlations among the factors, a cholesky_factor_cov can work well as a loadings matrix, since it is lower trapezoidal with positive diagonal entries. But that can run into problems like the one you encountered if, for example, the first manifest variable has little signal.
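For instance, a rough sketch of that setup with the factors marginalized out (priors and names here are placeholders):

```stan
data {
  int<lower=1> N;
  int<lower=1> K;                    // factors
  int<lower=K> J;                    // manifest variables, J >= K
  vector[J] Y[N];
}
parameters {
  cholesky_factor_cov[J, K] Lambda;  // lower trapezoidal, positive diagonal
  vector<lower=0>[J] theta;          // residual SDs
}
model {
  // identity factor correlation matrix, factors marginalized out
  matrix[J, J] Sigma = Lambda * Lambda' + diag_matrix(square(theta));
  to_vector(Lambda) ~ normal(0, 1);  // placeholder; the structural zeros only add a constant
  theta ~ cauchy(0, 2.5);
  Y ~ multi_normal(rep_vector(0, J), Sigma);
}
```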
For a multi-factor model with correlated factors, it is harder. One option would be to use a positive_ordered type for the factor variances and restrict some loading on each factor to be 1. But that too can run into problems if two of the factor variances are close together.
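Roughly like this, as a sketch; the data item `marker` (which indicator has its loading fixed to 1 on each factor) and the priors are made up for illustration:

```stan
data {
  int<lower=1> N;
  int<lower=1> K;                       // factors
  int<lower=K> J;                       // manifest variables
  vector[J] Y[N];
  int<lower=1, upper=J> marker[K];      // indicator whose loading is fixed to 1 on each factor
}
parameters {
  matrix[J, K] Lambda_free;             // loadings; the marker entries are overwritten below
  positive_ordered[K] factor_var;       // ordered factor variances break label switching
  cholesky_factor_corr[K] L_Omega;      // Cholesky factor of the factor correlations
  vector<lower=0>[J] theta;             // residual SDs
}
transformed parameters {
  matrix[J, K] Lambda = Lambda_free;
  for (k in 1:K)
    Lambda[marker[k], k] = 1;           // one loading per factor fixed to 1
}
model {
  // factor covariance from ordered variances and correlations
  matrix[K, K] Phi = quad_form_diag(multiply_lower_tri_self_transpose(L_Omega),
                                    sqrt(factor_var));
  matrix[J, J] Sigma = quad_form(Phi, Lambda') + diag_matrix(square(theta));
  to_vector(Lambda_free) ~ normal(0, 1);  // placeholder; overwritten entries just follow the prior
  factor_var ~ gamma(2, 1);               // placeholder
  L_Omega ~ lkj_corr_cholesky(2);
  theta ~ cauchy(0, 2.5);
  Y ~ multi_normal(rep_vector(0, J), Sigma);
}
```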
Sorry to dredge up an old thread, but I arrived here from the arXiv paper describing the Stan update for blavaan, where @bgoodri 's suggestion is mentioned. I’m a little confused about how the generated-quantities transform helps things in the single-latent-trait model relative to having one loading explicitly constrained to be positive. Say there are two loadings and the data are very consistent with values of 1 for both; with the generated-quantities-transform approach, the sampler is still only “seeing” the bimodal version of the parameters, so won’t that risk exploration difficulties? With an explicit positive constraint on one loading, the sampler will then only see a single mode.
Thanks, but I’m more trying to understand the suggestion by @bgoodri, which is what blavaan is using internally so presumably it’s an idea that has some consensus as making sense and I’m just not following it.
What they are saying is that having no constrained parameters is better because your sampling efficiency improves. I agree, though it’s not usually that big of a performance difference as you’re only constraining a single parameter.
Yeah, I get that that’s the claim, I just don’t understand precisely how; indeed, given the unconstrained approach leads to a multimodal posterior in the eyes of the sampler, I would have expected less efficient exploration. But maybe I’m under-appreciating the efficiency of HMC and it has no issues with multimodality?
HMC isn’t aware of other modes if they are too far away from each other. So it will select a mode randomly based on initialization values and whatever random trajectories take it to the closest mode.
Yeah, that’s my intuition as well, hence my confusion as to why they explicitly assert that imposing the positive constraint in GQ is better than doing so in the model proper.
There’s a minor sampling improvement without a constraint. Personally I think the need to do a transformation in the GQ actually makes things more complicated, especially if you have hierarchical parameters.
Personally, I just find it easier, and better, to ensure all indicators are oriented in the same direction and then impose positivity constraints on all non-crossloadings. This can get tricky for multilevel models (because then you either need a multiplicative random effect, or some fixed-effect constraint to ensure all group-specific loadings remain positive after adding the RE… ).
The next best would be to post-process via the GQ method (flipping loadings accordingly).
The worst, imo, is to constrain one loading to a sign. What happens is that there is a target mode where that loading is positive and the others are in the correct direction; but there is also a set of local modes where that loading is effectively zero and the others are directionally unidentified. So you either get oscillation between (at least) three solutions, or you get bad between-chain convergence. This is made worse if the fixed-sign loading is not particularly strong, because it can be set to zero without much difference in likelihood.
I’ve tried all three; I still prefer the first approach, though it can be messier with cross-loadings. The second approach is fine, but is irritating to implement, especially with multilevel structure. The third is a no-go in my experience: it can and will fail if the fixed loading is too weak, or if there are a lot of indicators.
That seems like a rare case to me. If I know the sign of one loading a priori, and I know what direction I want the latent variable in, then I generally know the rest too [except in the case of cross-loadings].