Ordering-invariant latent factor model

Hi there,

I am trying to estimate a latent factor model. I looked at the examples here by Rick Farouni and here by Joseph Sakaya (which is based on the paper by Bishop, 1999).
The problem is that in both examples I often get different factor loadings when reordering the variables in the data matrix. I looked around and found some work addressing this issue in the Bayesian context (e.g. here by Chan et al), but I don’t think I would understand the theoretical work well enough to implement it myself in Stan.

Does anyone know how to do latent factor models in Stan such that the factor loadings are invariant to the order of the variables in the data matrix?
Basically what I want is a latent factor model where the associations between the variables X and the factors F remain the same after changing the order of the variables. I am not worried about the signs (i.e. X1(+),X2(-) → F vs X1(-),X2(+) → F) or the ordering of the factors (i.e. X1,X2 → F1 and X3,X4 → F2 vs X1,X2 → F2 and X3,X4 → F1) being affected by the ordering of the variables.

Thank you very much for your help, even if it means that this is not yet possible or implemented in Stan.

Cheers,
Nic

Can you post your code? That’ll help us see what you’re dealing with.

Why would the ordering of the variables in your input data vary?

The code is basically from the second link:

data {
  int<lower=0> J; // Number of samples
  int<lower=0> K; // The original dimension
  int<lower=0> D; // The latent dimension
  matrix[J, K] Y; // The data matrix
}

parameters {
  matrix[J, D] Z;              // The latent matrix
  real<lower=0> tau;           // Noise scale
  cholesky_factor_cov[K, D] W; // The weight (loading) matrix
  vector<lower=0>[D] alpha;    // ARD precisions, one per latent dimension
}

transformed parameters {
  vector<lower=0>[D] t_alpha = inv_sqrt(alpha); // ARD scales
}

model {
  tau ~ student_t(4., 0., 1.); // half-Student-t via the lower bound
  to_vector(Z) ~ normal(0., 1.);
  alpha ~ gamma(1e-3, 1e-3);
  for (d in 1:D) W[, d] ~ normal(0., t_alpha[d]); // shrink each loading column
  to_vector(Y) ~ normal(to_vector(Z * W'), tau);
}
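
For reference, here is a minimal sketch of fitting this from Python with cmdstanpy; the file names and the choice of D = 2 are hypothetical:

import numpy as np
from cmdstanpy import CmdStanModel

# Hypothetical file names; adjust to your setup
model = CmdStanModel(stan_file="ppca_ard.stan")
Y = np.loadtxt("data.csv", delimiter=",")

fit = model.sample(
    data={"J": Y.shape[0], "K": Y.shape[1], "D": 2, "Y": Y}
)
W_draws = fit.stan_variable("W")  # shape: (n_draws, K, D)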

Why would the ordering of the variables in your input data vary?

What I mean is that the loadings should not change when reordering variables in the input data matrix. Say if the matrix is cbind(Y1, Y2, Y3) or cbind(Y2, Y1, Y3). The initial order is arbitrary from data collection…

You don’t have sufficient control over the data collection process to have labels for the input data matrix columns that you could use to re-order them into a consistent ordering for input to Stan?
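
If you can get labels, that fix is a one-liner. A minimal sketch, assuming the data sits in a pandas DataFrame with named columns before being passed to Stan (names and file are hypothetical):

import pandas as pd

# Hypothetical: raw data collected in arbitrary column order
df = pd.read_csv("data.csv")

# Sort the columns by name so every run sees the same ordering
df = df[sorted(df.columns)]
Y = df.to_numpy()  # pass this as Y to Stan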

Ah no, sorry, what I meant was that the ordering is arbitrary, given by the chronological order in which I collected the data. My point is that it should not affect the loadings whether the variable age is the second or the third column in my data set. But that is what is happening: when fitting the same model to the same data, just with the columns in a different order, I can get different results for the factors and their loadings.

One easy way to see why this happens is by construction of the model. W is a lower triangular matrix. Thus, with two dimensions, the variable in the first column of Y has zero weight (loading) on the second dimension. With three dimensions, the variables in the first two columns of Y have zero weight (loading) on the third dimension. And so on…
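
For concreteness, with K = 4 and D = 3, cholesky_factor_cov fixes W to the pattern

W = \begin{pmatrix} w_{11} & 0 & 0 \\ w_{21} & w_{22} & 0 \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{pmatrix}

(with positive diagonal entries), so whichever variable happens to sit in the first row can only load on the first factor, no matter what the data say.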

I guess I still don’t understand. Say the columns of the data have true labels X/Y/Z, and you have a model with latent factors with loadings F_{X,Y}, F_{X,Z}, F_{Y,Z}, so you structure your model such that it expects the data input to have the columns in order X/Y/Z and have loadings F_{1,2}, F_{1,3}, F_{2,3} that you then know are semantically mapped to F_{X,Y}, F_{X,Z}, F_{Y,Z}, respectively.

If you cannot control the data collection such that you receive data without column labels and in a random column ordering, I don’t think it’s mathematically possible to derive a semantic mapping of modelled loadings to F_{X,Y}, F_{X,Z}, F_{Y,Z}. That semantic mapping can only be achieved by knowing the true column labels and ordering them consistently prior to input to Stan.

I’ll make an example. Say I have J=10 observations for K=5 variables in the J \times K matrix Y = (y_1, y_2, y_3, y_4, y_5). Estimating the latent factor model with D=2 dimensions, I may get W_{\cdot,1} = (0.1, -0.2, 0.1, 0, 0) and W_{\cdot,2} = (0, 0.4, -0.2, 0, 0.3). (I ignore here that W must be such that C = WW^T is a covariance matrix, i.e. the Cholesky factorization.)
Now, I reorder Y = (y_3, y_2, y_5, y_1, y_4) arbitrarily. Estimating the model again, and for ease of comparison keeping the order of y in W as above, I may get W_{\cdot,1} = (0.1, 0, 0.1, 0, 0) and W_{\cdot,2} = (0, 0.4, 0, 0, 0.3). In other words, both W_{\cdot,1} and W_{\cdot,2} have changed simply by reordering the columns of Y. E.g., in the first model, y_2 was loading on the first latent factor; not so anymore in the second model. Also, observe that W_{y_3,2} = 0 in the second model because y_3 is the first column there (see the explanation in the reply above). That is, the latent factor model is not invariant to the ordering of the input data Y.
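
To see the same mechanism outside of Stan, here is a minimal numpy sketch (not the model itself) showing that a Cholesky factor's zero pattern is tied to row order, i.e. the factor of a permuted covariance matrix is not just the permuted factor:

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
C = A @ A.T + 5 * np.eye(5)           # an arbitrary 5x5 covariance matrix
P = np.eye(5)[[2, 1, 4, 0, 3]]        # reorder as (y3, y2, y5, y1, y4)

L1 = np.linalg.cholesky(C)            # factor under the original ordering
L2 = np.linalg.cholesky(P @ C @ P.T)  # factor after reordering

# If the factorization were order-invariant, L2 would equal P @ L1.
# It is not: P @ L1 is not even lower triangular.
print(np.allclose(L2, P @ L1))        # False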

The order of the columns in my input data is arbitrary and it shouldn’t matter for the results whether age is the first or the second column.

I assume your problem is entirely about rotation and not prediction. The easiest idea that comes to my mind is to apply some standard rotation (e.g. varimax) to the draws after the fact. My guess is that you wouldn’t be able to do this within Stan on a per-draw basis, but I could be wrong.
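
Roughly like this, as a numpy sketch; the varimax function below is the standard SVD-based iteration (nothing Stan-specific), and the draws variable is hypothetical. Note that rotating each draw separately can introduce its own label switching across draws, so treat this as a starting point:

import numpy as np

def varimax(W, max_iter=100, tol=1e-8):
    # Standard varimax rotation of a K x D loading matrix
    K, D = W.shape
    R = np.eye(D)
    obj = 0.0
    for _ in range(max_iter):
        WR = W @ R
        u, s, vt = np.linalg.svd(
            W.T @ (WR ** 3 - WR @ np.diag((WR ** 2).sum(axis=0)) / K)
        )
        R = u @ vt
        if s.sum() < obj * (1 + tol):
            break
        obj = s.sum()
    return W @ R

# draws: posterior draws of W with shape (n_draws, K, D)
# rotated = np.stack([varimax(w) for w in draws])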

Edit 2: I realize now that your use of cholesky_factor_cov was to (much more easily) achieve what I describe below.

An alternative (edit: at least for a 2-component solution) would be to constrain one cell in W to 0 so that it lines up with the same variable across re-orderings, which is a very specific rotation. That should keep the magnitude of the loadings identical. I don’t understand enough about Cholesky matrices to know whether this would work with that data structure, but here is what it would look like with a standard matrix.

data{
  int<lower=0> J;               // as in the model above
  int<lower=0> K;
  int<lower=0> D;
  matrix[J, K] Y;
  int<lower=1, upper=K> i;      // which cell of column 1 is constrained to 0
  int<lower=1, upper=K> j[K-1]; // which cells of column 1 are estimated
}
parameters{
  vector[K-1] W1_raw;
  matrix[K, D-1] W_raw;
}
transformed parameters{
  matrix[K, D] W;
  W[i, 1] = 0;
  W[j, 1] = W1_raw;
  W[, 2:D] = W_raw;
}

Maybe you could achieve the same thing with a very narrow prior on that one cell, thereby avoiding all of this reconstruction. But I’m not sure.

I’m pretty sure your issue arises from cholesky_factor_cov. Because it constrains specific elements to 0, rearranging the columns of Y will give you a different result for W by design.

Yes, the different W by design is exactly the problem and I am wondering how to address this in Stan. Or maybe I would need to average over all possible configurations of W ex post, but that doesn’t seem so elegant…

Gotcha, I understand what you’re going for now. Sorry for repeating back what you already knew.

My sense is that you’re looking for a complex modeling solution to a problem with a very simple programming fix (e.g. consistently ordering the variables). While it’s probably possible, I’m not sure how you would do it. And I’d worry that it would be more sensitive than the approach you’re already taking.

Is there a reason why this matters for your specific problem?


What do you mean by consistently ordering variables?

The other way to look at it is that you have a different estimate for each ordering of the variables. Which one is correct? I can easily keep the ordering consistent, but it seems wrong that a different ordering would give a completely different interpretation of the latent factors. I skimmed some work suggesting averaging the loadings over all ordering variants, but that seems computationally expensive.

The reason it matters to me is that I care about the interpretation of the latent factors, i.e. how they are composed of the observed variables via the loadings. However, I might be missing something very obvious, as both you and @mike-lawrence seem to suggest something similar.

Correct me if I’m misunderstanding your question. I believe your concern is just bumping into a fundamental, unavoidable feature of PCA/factor analysis. An infinite number of solutions will produce the same fit, none of them more “true” than any others. This issue/feature is inherent in any scenario where we want to give meaning to such a model. You can always rotate the loadings after estimation with standard methods (e.g. varimax, promax), which might be closer to what you’re looking for.

I’ve never heard of this before. I wonder how the resulting structure compares to more common rotations.


Thanks @simonbrauer. I would then estimate the latent factor model as described with cholesky_factor_cov and rotate the loadings after estimation as you suggested.

It still bothers me, though, that rearranging Y will give me a different W by design, so I assume the rotated loadings from varimax after estimation will also be different. There is probably no way around this?

Sometimes it does, sometimes it doesn’t. While playing around with a test dataset for this question, varimax (and promax, and quartimax) produced the same loadings when using only two components, but not when using more than two. I don’t know whether that behavior generalizes to other situations or is specific to my test data. A better understanding of the exact algorithms might be helpful here.
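
One practical wrinkle with that comparison: before comparing loadings across orderings, the benign indeterminacies mentioned at the start of the thread (factor order and sign) have to be undone. A small brute-force helper, fine for small D (hypothetical numpy sketch):

import numpy as np
from itertools import permutations

def align(W_ref, W):
    # Permute and sign-flip the columns of W to best match W_ref
    D = W.shape[1]
    best, best_err = W, np.inf
    for perm in permutations(range(D)):
        Wp = W[:, list(perm)]
        # Flip each column's sign to agree with the reference
        signs = np.sign((W_ref * Wp).sum(axis=0))
        signs[signs == 0] = 1.0
        err = np.linalg.norm(W_ref - Wp * signs)
        if err < best_err:
            best, best_err = Wp * signs, err
    return best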

If it is any solace, this is a big theoretical challenge that pops up in all treatments of substantively-motivated PCA/FA. From Kline’s An Easy Guide to Factor Analysis (1994: 56, 61):

In principal components, for example, it is an artefact of the method that a general factor is produced followed by a series of bipolar factors. Thus interpretation of these components as reflecting anything but their algebra is dubious… rotated factors may take up any position in factor space and that accordingly, as has been argued, there is a virtual infinity of solutions. Since, as has been seen, these are mathematically equivalent, there is no mathematical reason for choosing one rather than another, which is precisely why the results of the first condensation, by whatever method, should not be interpreted as the final solution.
