I brought this model up in another post, but thought it might be better to discuss it on its own terms.

A standard probabilistic PCA model with no bells or whistles, for observations $\lbrace\mathbf{y}_i\rbrace_{i=1}^N$, $K$-dimensional latent factors $\lbrace\mathbf{z}_i\rbrace_{i=1}^N$, and loading matrix $\mathbf{A}\in\mathbb{R}^{D\times K}$, takes the form
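
Written out, a minimal sketch of the usual form looks as follows. It assumes zero-mean data and an isotropic noise scale $\sigma$; neither is fixed by the description above, so treat both as assumptions.

```latex
\begin{aligned}
\mathbf{z}_i &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_K), \\
\mathbf{y}_i \mid \mathbf{z}_i &\sim \mathcal{N}(\mathbf{A}\mathbf{z}_i,\ \sigma^2 \mathbf{I}_D), \qquad i = 1, \dots, N.
\end{aligned}
```

Under these assumptions, integrating out $\mathbf{z}_i$ gives $\mathbf{y}_i \sim \mathcal{N}(\mathbf{0},\ \mathbf{A}\mathbf{A}^{\top} + \sigma^2 \mathbf{I}_D)$.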

Here, $K$ is an important hyperparameter that controls the complexity of the model. We could generalize the model to include $K$ as a random variable directly:
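
One way such a generalization could be written is sketched below. Several pieces are assumptions for illustration: the weight vector $w$ lives on the simplex with an unspecified prior $p(w)$, $K$ now acts as an upper bound on the random dimension $k$, $\sigma$ is an isotropic noise scale, and $\mathbf{A}_{1:k}$ denotes the first $k$ columns of $\mathbf{A}$.

```latex
\begin{aligned}
w &\sim p(w), \qquad w \in \Delta^{K-1}, \\
k \mid w &\sim \mathrm{Categorical}(w), \qquad k \in \{1, \dots, K\}, \\
\mathbf{z}_i \mid k &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k), \\
\mathbf{y}_i \mid \mathbf{z}_i, k &\sim \mathcal{N}(\mathbf{A}_{1:k}\mathbf{z}_i,\ \sigma^2 \mathbf{I}_D), \qquad i = 1, \dots, N.
\end{aligned}
```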

Note that all $\mathbf{y}_i$ are drawn with respect to the same $k$, so they are not i.i.d. draws from a mixture distribution. Rather, the mixture is at the dataset level. A problem with this design, and with mixtures in general (as pointed out by Aki and others), is that the parameters of a component are usually informed only by the posterior fraction of the data assigned to that component. This design is a bit different in that parameters are shared between the components, which should hopefully alleviate that issue.
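
To make the distinction concrete, the two marginal likelihoods can be compared side by side. Here $\theta$ stands for the shared parameters and $p(\mathbf{y}_i \mid k, \theta)$ for the per-observation density under $k$ components; both are shorthand introduced only for this sketch.

```latex
\begin{aligned}
\text{dataset-level mixture:} \quad p(\mathbf{y}_{1:N} \mid w, \theta) &= \sum_{k=1}^{K} w_k \prod_{i=1}^{N} p(\mathbf{y}_i \mid k, \theta), \\
\text{i.i.d. mixture:} \quad p(\mathbf{y}_{1:N} \mid w, \theta) &= \prod_{i=1}^{N} \sum_{k=1}^{K} w_k \, p(\mathbf{y}_i \mid k, \theta).
\end{aligned}
```

In the dataset-level form, the posterior probability of each $k$ is proportional to $w_k \prod_{i} p(\mathbf{y}_i \mid k, \theta)$, so all $N$ observations enter every term of the sum.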

An intuitively appealing part of this construction is that it mimics the iterative construction of PCA, where one first finds the direction along which the variance is maximized and then adds components that complement it. Since for $k=1$ the first column of $\mathbf{A}$ has to stand on its own, this should also help with identification of the model (permutation should not be a problem if the likelihood is otherwise well-defined).

Would this design work? Or is there a construction better suited to this purpose than the mixture (a geometric mixture, perhaps)? Is the prior on $w$ suitable, or should it penalize complexity? That extra hierarchical layer is mostly for the sake of Stan, which cannot sample discrete parameters directly, as it allows us to marginalize over $k$.
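
For what it is worth, the marginalization over $k$ can be sketched on the log scale, which is the quantity a `log_sum_exp` call in Stan would compute. The symbols $\ell_k$ and $\sigma$ are assumptions for illustration, with $\mathbf{z}_i$ integrated out analytically under isotropic Gaussian noise and $\mathbf{A}_{1:k}$ denoting the first $k$ columns of $\mathbf{A}$.

```latex
\begin{aligned}
\ell_k &= \sum_{i=1}^{N} \log \mathcal{N}\!\left(\mathbf{y}_i \mid \mathbf{0},\ \mathbf{A}_{1:k}\mathbf{A}_{1:k}^{\top} + \sigma^2 \mathbf{I}_D\right), \\
\log p(\mathbf{y}_{1:N} \mid \mathbf{A}, \sigma, w) &= \log \sum_{k=1}^{K} \exp\!\left(\log w_k + \ell_k\right).
\end{aligned}
```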