Predictive likelihood in Factor analysis

Hello,

I am new to Stan and trying to figure out how to infer latent variable models with it. Right now I am working with a simple factor analysis model. My main goal is to fit the model on multivariate numerical training data and then compute the predictive likelihood on unseen test data. I can train the model without any issue. I also added a generated quantities block by following the Gaussian Processes chapter (https://mc-stan.org/docs/2_21/stan-users-guide/gaussian-processes-chapter.html). However, in FA models (in fact, in all latent variable models with local variables), one needs to infer a latent variable for each test instance given the posterior or point estimates of the global latent variables. I could not find an example of this type of model. I added my code below, which does not seem to work for now, i.e., it gives a very low likelihood. I would appreciate it if someone could review it and give some advice. Thanks in advance.

FA_code =

data {
    int<lower=0> N1;  // Number of train samples
    int<lower=0> N2;  // Number of test samples
    int<lower=0> D;   // The original data dimension
    int<lower=0> K;   // The latent dimension
    matrix[N1, D] X1; // The train data matrix
    matrix[N2, D] X2; // The test data matrix
}
transformed data {
    int<lower=1> N = N1 + N2;
    matrix[N, D] X;
    for (n1 in 1:N1) X[n1] = X1[n1];
    for (n2 in 1:N2) X[N1 + n2] = X2[n2];
}
parameters {
    matrix[N, K] Z;           // The latent matrix
    matrix[D, K] W;           // The weight matrix
    real<lower=0> tau;        // Noise precision
    vector<lower=0>[K] alpha; // ARD prior precisions
}
transformed parameters {
    vector<lower=0>[K] t_alpha = inv(sqrt(alpha)); // precision -> SD
    real<lower=0> t_tau = inv(sqrt(tau));          // precision -> SD
}
model {
    tau ~ gamma(1, 1);
    to_vector(Z) ~ normal(0, 1);
    alpha ~ gamma(1e-3, 1e-3);
    for (k in 1:K) W[, k] ~ normal(0, t_alpha[k]);
    to_vector(X[1:N1, ]) ~ normal(to_vector(Z[1:N1, ] * W'), t_tau);
}
generated quantities {
    vector[N2] log_lik_test;
    for (n2 in 1:N2)
        log_lik_test[n2] = normal_lpdf(X2[n2] | to_vector(Z[N1 + n2, ] * W'), t_tau);
}

Welcome to the Stan community

A couple of comments/questions about your model: if Z has K columns, specifying it as normal() means the K factors are uncorrelated, which is usually hard to argue for theoretically.
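To make the uncorrelated-vs-correlated distinction concrete, here is a minimal NumPy sketch (the 0.6 correlation is purely hypothetical): an independent normal() prior on the entries of Z corresponds to an identity correlation matrix among the factors, whereas correlated factor scores can be generated by multiplying standard normal draws by a Cholesky factor. In Stan, a multivariate normal prior on the rows of Z plays the same role.

```python
import numpy as np

rng = np.random.default_rng(4)

K, N = 2, 200_000
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])      # hypothetical factor correlation matrix

L = np.linalg.cholesky(corr)
Z = rng.normal(size=(N, K)) @ L.T  # rows are correlated factor scores

print(np.corrcoef(Z.T))            # off-diagonal close to 0.6
```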

If you are working with CFA, or another form of constrained model, I wrote a full example in this thread:
https://discourse.mc-stan.org/t/non-convergence-of-latent-variable-model/12450/13?u=mauricio_garnier-villarre

Working with factor models, there is the issue of which is the correct log-likelihood. Because the estimation method includes data augmentation for the latent factors, it ends up using the conditional log-likelihood, while the desirable one is the marginal log-likelihood, which excludes the estimated factor scores:

Merkle, E C, D Furr, and S Rabe-Hesketh. “Bayesian Model Assessment: Use of Conditional vs Marginal Likelihoods,” n.d., 25.

The example I posted runs with data augmentation for latent factor scores that are correlated, but the log-likelihood it computes is the marginal one. You could use the parameters from the training data and estimate the log-likelihood for the test data.
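As a numerical illustration of the conditional/marginal distinction for this kind of linear-Gaussian FA (all values below are hypothetical stand-ins for a single posterior draw): marginalising the factor scores analytically gives x ~ N(0, W W' + sigma^2 I), and a Monte Carlo average of the conditional density over the N(0, I) prior on z recovers the same number.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

D, K = 4, 2
W = rng.normal(size=(D, K))   # loading matrix (one hypothetical posterior draw)
sigma = 1.0                   # residual SD (hypothetical)
x = rng.normal(size=D)        # one test observation

# Marginal log-likelihood: z integrated out analytically,
# x ~ N(0, W W' + sigma^2 I)
cov = W @ W.T + sigma**2 * np.eye(D)
log_marginal = multivariate_normal(mean=np.zeros(D), cov=cov).logpdf(x)

# Monte Carlo check: average the conditional density over the prior on z
S = 200_000
z = rng.normal(size=(S, K))   # z ~ N(0, I)
mu = z @ W.T                  # conditional means, shape (S, D)
log_cond = (-0.5 * np.sum(((x - mu) / sigma) ** 2, axis=1)
            - D * np.log(sigma) - 0.5 * D * np.log(2 * np.pi))
log_mc = np.logaddexp.reduce(log_cond) - np.log(S)

print(log_marginal, log_mc)   # the two estimates should be close
```

The conditional log-likelihood, by contrast, evaluates log_cond at a single draw of z rather than averaging over it, which is why the two quantities differ.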

Is this the type of model you are looking at?


Hi Mehmet,

A general modelling point as well, based off this line:

    t_alpha = inv(sqrt(alpha));
    t_tau = inv(sqrt(tau));

it looks like you’re parameterising the normal distribution using the mean and precision. Stan uses the mean and SD for the normal distribution, so you can just use alpha and tau directly if you intend them to be scale parameters.
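A quick numerical check of this point (the tau value is hypothetical): draws generated with scale inv(sqrt(tau)) have variance 1/tau, i.e. the transform converts a precision into the SD that Stan’s normal() expects, so it is only needed when the gamma priors are deliberately placed on precisions.

```python
import numpy as np

rng = np.random.default_rng(1)

tau = 4.0                  # a precision value (hypothetical)
sd = 1.0 / np.sqrt(tau)    # the SD Stan's normal() expects, not tau itself

draws = rng.normal(loc=0.0, scale=sd, size=1_000_000)

# Empirical variance should match 1 / tau, confirming that
# inv(sqrt(tau)) converts a precision into a standard deviation.
print(draws.var(), 1.0 / tau)
```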


Hello Mauricio,

Thanks for your interest and reply. To clarify, my end goal is to build multilevel models with different observation data types and compare them in terms of marginal likelihood on test data. The common point with FA is that each observation has some latent variables assigned to it, plus some global model parameters. I posted this simple FA model because it has latent variables for each instance (factor scores) and global parameters (noise covariance and loading matrix), so a solution for this model would help clarify my roadmap. Regarding my end goal, I want to:

  • Get posterior draws of latent variables and model parameters given the training data
  • Get posterior draws of latent variables given the test data
  • Compute the marginal likelihood of the test data by integrating out the latent variables and parameters (not analytically, because my models won’t have closed-form expressions) to see how the model generalizes to unseen data.

I checked the paper you referred to and saw that marginal WAIC or PSIS-LOO can be good candidates for this purpose. I also checked the model you referred to. Although I could not fully understand your CFA model, the computations for the marginal likelihood seemed to be specific to the model. Is there a way to both simulate posterior draws of the latent variables given the test data and compute the likelihood with these draws?
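The integration step above can also be done by simulation when no closed form exists. A minimal sketch, with hypothetical stand-ins for the posterior draws of the global FA parameters: for each posterior draw of (W, sigma), integrate the test factor scores out by Monte Carlo over their N(0, I) prior, then average across posterior draws to estimate the marginal log predictive density of a test point.

```python
import numpy as np

rng = np.random.default_rng(2)

D, K, S, M = 3, 2, 50, 5_000  # data dim, latent dim, posterior draws, z draws per posterior draw

# Stand-ins for posterior draws of the global parameters (hypothetical values;
# in practice these would come from the fitted training-data posterior).
W_draws = rng.normal(size=(S, D, K))
sigma_draws = np.abs(rng.normal(1.0, 0.1, size=S))

x_test = rng.normal(size=D)   # one unseen test observation

def log_cond(x, mu, sigma):
    """Conditional Gaussian log-density p(x | z, W, sigma), vectorised over rows of mu."""
    return (-0.5 * np.sum(((x - mu) / sigma) ** 2, axis=-1)
            - x.size * np.log(sigma) - 0.5 * x.size * np.log(2 * np.pi))

# Inner loop integrates z out against its prior; outer average is over the posterior.
log_terms = np.empty((S, M))
for s in range(S):
    z = rng.normal(size=(M, K))  # fresh prior draws of the test factor scores
    log_terms[s] = log_cond(x_test, z @ W_draws[s].T, sigma_draws[s])

log_pred = np.logaddexp.reduce(log_terms.ravel()) - np.log(S * M)
print(log_pred)                  # estimated marginal log predictive density of x_test
```

Sampling z from its prior keeps the sketch simple but can be high-variance; importance sampling or the analytic Gaussian marginal (where available) would be more efficient.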

Hello Andrew,

Thanks for the hint. I will definitely utilize it.

Mehmet

So, I don’t understand your factor analysis example. It doesn’t seem to have a factorial structure, which is what has me confused about what the latent variables are.
In factor analysis, the factors are underlying theoretically relevant/meaningful constructs that explain how participants answer multiple questions.
In your example, it seems that your weight matrix W is just a noise matrix, instead of a structured loading matrix. It also seems wrong to me that the factors Z are independent.

I think this can be done, similar to the example I posted, with a structure specifying which items load onto which factors.

If I understand correctly, these would be factor scores for the test data, based on the parameters estimated from the training data, right? I think this can be done in the generated quantities block, by plugging the parameters estimated from the training data into the equations that form the factor scores.
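For the linear-Gaussian FA in the original post, this per-instance step has a closed form that could be evaluated once per iteration in generated quantities. A NumPy sketch with hypothetical parameter values, using the standard conjugate result for z | x when z ~ N(0, I) and x | z ~ N(W z, sigma^2 I):

```python
import numpy as np

rng = np.random.default_rng(3)

D, K = 4, 2
W = rng.normal(size=(D, K))  # one posterior draw of the loadings (hypothetical)
sigma = 0.8                  # one posterior draw of the residual SD (hypothetical)
x = rng.normal(size=D)       # one test observation

# Factor-score posterior given the global parameters:
#   z | x ~ N(V W' x / sigma^2, V),  with  V = (I + W' W / sigma^2)^{-1}
V = np.linalg.inv(np.eye(K) + W.T @ W / sigma**2)
mean = V @ W.T @ x / sigma**2

# A draw of the test factor scores, as one could produce per iteration
# in generated quantities:
z_draw = mean + np.linalg.cholesky(V) @ rng.normal(size=K)
print(mean, z_draw)
```

Averaging the conditional density over such draws yields the conditional likelihood; the marginal likelihood instead integrates z out entirely, as discussed above.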

The marginal example uses the SEM equations for the marginal likelihood, so it is not specific to my model example; it is specific to a CFA model. So, if you have a CFA model with a factorial structure, the same matrices can be formed to use that equation.

I don’t see why this couldn’t be done. Any of these can be calculated in the generated quantities block. But if you use the factor score draws, you will be estimating the conditional likelihood instead of the marginal.

If you structure your FA as a CFA, you could use the equations from my example and estimate the marginal likelihood as a function of the test data.