Multi-group Confirmatory Factor Analysis: how to handle the Cholesky factor correlations?

I started with some code from Mike Laurence and Michael de Witt that I modified to include an endogenous latent variable. I now want to make the model multi-group by time, so that the factor loadings can evolve over time. I am unsure how best to incorporate multiple groups in this model. After that, I think I should be able to add a dynamic component with a state-space model.

I got to the cholesky_factor_corr parameter declaration and got stumped on how to vectorize it across time/group. Any suggestions on how to perform this analysis by group would be highly appreciated.
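What I had in mind is promoting the Cholesky factor to an array with one entry per group and looping the prior over the entries; a minimal sketch, where n_group and n_fac are hypothetical stand-ins for the number of groups (or time points) and the number of factors:

data{
    // n_group: hypothetical, number of groups (or time points)
    int n_group ;
    // n_fac: hypothetical, number of latent factors
    int n_fac ;
}
parameters{
    // one correlation Cholesky factor per group
    cholesky_factor_corr[n_fac] fac_cor_helper[n_group] ;
}
model{
    // independent LKJ prior for each group's correlation matrix
    for(g in 1:n_group){
        fac_cor_helper[g] ~ lkj_corr_cholesky(1) ;
    }
}

But I am unsure whether this is the right structure for my model below.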

//confirmatory factor analysis code
data{
    // n_subj: number of subjects
    int n_subj ;
    // n_y: number of outcomes
    int n_y ;
    // y: matrix of outcomes
    matrix[n_subj,n_y] y ;
    // n_fac_one: number of level-one latent factors
    int n_fac_one ;
    // y_fac_one: which level-one factor is associated with each outcome
    int<lower=1,upper=n_fac_one> y_fac_one[n_y] ;
    // n_fac_two: number of level-two latent factors
    int n_fac_two ;
    // y_fac_two: which level-two factor is associated with each level-one factor
    int<lower=1,upper=n_fac_two> y_fac_two[n_fac_one] ;
}
transformed data{
    // scaled_y: z-scaled outcomes
    matrix[n_subj,n_y] scaled_y ;
    for(i in 1:n_y){
        scaled_y[,i] = (y[,i]-mean(y[,i])) / sd(y[,i]) ;
    }
    
}
parameters{
    // normal01: a helper variable for more efficient non-centered multivariate
    //  sampling of subj_facs
    matrix[n_fac_one, n_subj] normal01;
    matrix[n_fac_two, n_subj] normal02;
    // fac_cor_helper: correlations (on cholesky factor scale) amongst latent
    //  factors
    cholesky_factor_corr[n_fac_one] fac_cor_helper_one ;
    cholesky_factor_corr[n_fac_two] fac_cor_helper_two ;
    // scaled_y_means: means for each outcome
    vector[n_y] scaled_y_means ;
    // scaled_y_noise_one: measurement noise for each outcome
    vector<lower=0>[n_y] scaled_y_noise_one ;
    // scaled_y_noise_two: residual noise for each level-one factor
    vector<lower=0>[n_fac_one] scaled_y_noise_two ;
    // betas: how each factor loads on to each outcome (level one) or on to
    //  each level-one factor (level two). Must be positive for identifiability.
    vector<lower=0>[n_y] betas_one ;
    vector<lower=0>[n_fac_one] betas_two ;
}
transformed parameters{
    // subj_facs: matrix of values for each factor associated with each subject
    matrix[n_subj,n_fac_one] subj_facs_one ;
    matrix[n_subj,n_fac_two] subj_facs_two ;
    // the following calculation implies that rows of subj_facs are sampled from
    //  a multivariate normal distribution with mean of 0 and sd of 1 in all
    //  dimensions and a correlation matrix of fac_cor.
    subj_facs_one = transpose(fac_cor_helper_one * normal01) ;
    subj_facs_two = transpose(fac_cor_helper_two * normal02) ;
}
model{
    // normal01 must have normal(0,1) prior for "non-centered" parameterization
    //  on the multivariate distribution of latent factors
    to_vector(normal01) ~ normal(0,1) ;
    to_vector(normal02) ~ normal(0,1) ;
    // flat prior on correlations
    fac_cor_helper_one ~ lkj_corr_cholesky(1) ;
    fac_cor_helper_two ~ lkj_corr_cholesky(1) ;
    // normal prior on y means
    scaled_y_means ~ normal(0,1) ;
    // weibull prior on measurement noise
    scaled_y_noise_one ~ weibull(2,1) ;
    scaled_y_noise_two ~ weibull(2,1) ;
    // normal(0,1) priors on betas
    betas_one ~ normal(0,1) ;
    betas_two ~ normal(0,1) ;
    // each outcome as normal, influenced by its associated latent factor
    for(i in 1:n_y){
        scaled_y[,i] ~ normal(
            scaled_y_means[i] + subj_facs_one[,y_fac_one[i]] * betas_one[i]
            , scaled_y_noise_one[i]
        ) ;
    }
    
    // each level-one factor as normal, influenced by its associated level-two latent factor
    for(i in 1:n_fac_one){
        subj_facs_one[,i] ~ normal(
             subj_facs_two[,y_fac_two[i]] * betas_two[i]
            , scaled_y_noise_two[i]
        ) ;
    }
}
generated quantities{
    //cors: correlations amongst within-subject predictors' effects
    corr_matrix[n_fac_one] fac_cor_one ;
    corr_matrix[n_fac_two] fac_cor_two ;
    // y_means: outcome means on the original scale of Y
    vector[n_y] y_means ;
    // y_noise: outcome noise on the original scale of Y
    vector[n_y] y_noise ;
    fac_cor_one = multiply_lower_tri_self_transpose(fac_cor_helper_one) ;
    fac_cor_two = multiply_lower_tri_self_transpose(fac_cor_helper_two) ;
    for(i in 1:n_y){
        y_means[i] = scaled_y_means[i]*sd(y[,i]) + mean(y[,i]) ;
        y_noise[i] = scaled_y_noise_one[i]*sd(y[,i]) ;
    }
}

Hi mansillo, if this is a repeated-measures factor analysis where you want to allow for correlations between the latent factors over time, as well as residual correlations between the same item measured over time, I've got a Stan model that I've used for that. You'll need to adapt it for your use case, but hopefully it shows a way to do what you're after.

It’s for a 1-factor model measured T times, like the diagram below (except with latent correlations, rather than regressions):
[Diagram: a single latent factor measured at each of T time points]

Code:

data{
  int N;              //Number of individuals
  int J;              //Number of items
  int T;              //Number of timepoints
  vector[J*T] y[N];
}

transformed data{
  /* Cache some calculations and structures so they don't get
       re-evaluated at each iteration */
  int J_T = J*T;
  int J_Tm1 = J*(T-1);
  matrix[J_T,J_T] corr_diag = diag_matrix(rep_vector(1,J_T));
  vector[J_Tm1] zero_load = rep_vector(0, J_Tm1);
}

parameters{
  /* 'Raw' Variables for Non-Centered Parameterisation */
  matrix<lower=0.0>[J,T] load_r;
  matrix[J,T] intercepts_r;
  matrix[N,T] eta_r;

  /* Hyperparameters for Loadings */
  vector<lower=0>[J] load_m;
  vector<lower=0>[J] load_sd;
  vector[J] intercepts_m;
  vector<lower=0>[J] intercepts_sd;

  /* Correlations between factors over time,
       item residual variances, and residual correlations */
  cholesky_factor_corr[T] e_Lcorr;
  vector<lower=0>[J_T] resvar;
  cholesky_factor_corr[2] y_Lcorr[J_Tm1];
}

transformed parameters{
  matrix[T,J] load;
  matrix[T,J] intercepts;
  cholesky_factor_corr[J_T] y_corr = corr_diag;

  /* Transform 'raw' loadings to required distribution */
  for(j in 1:J)
    load[,j] = (load_m[j] + load_r[j] * load_sd[j])';
  for(j in 1:J)
    intercepts[,j] = (intercepts_m[j] + intercepts_r[j] * intercepts_sd[j])';

  /* Pack residual correlations into single matrix */
  for(c in 1:J_Tm1) {
    y_corr[c+J,c] = y_Lcorr[c][2,1];
    y_corr[c+J,c+J] = y_Lcorr[c][2,2];
  }
}

model{
  /* These are temporary variables, used for holding
       functions of other parameters. More efficient on
       memory to include here as their posteriors aren't saved */
  matrix[J_T, T] lam;
  matrix[N,T] eta_t;
  matrix[N,J_T] lam_eta;
  row_vector[J_T] eta_vec[N];
  row_vector[J_T] inter;

  /* Priors for factor and residual correlations */
  e_Lcorr ~ lkj_corr_cholesky(1);
  for(c in 1:J_Tm1)
    y_Lcorr[c] ~ lkj_corr_cholesky(1);

  /* Priors for intercept and loading hyperparameters */
  load_m ~ normal(0,5);
  intercepts_m ~ normal(0,5);
  load_sd ~ cauchy(0,5);
  intercepts_sd ~ cauchy(0,5);

  /* Priors for 'raw' parameters for non-centered parameterisation */
  to_vector(load_r) ~ std_normal();
  to_vector(intercepts_r) ~ std_normal();
  to_vector(eta_r) ~ std_normal();


  /* Pack loadings into a factor loading matrix with zeros
     for cross-loadings */
  lam[,1] = append_row(load[1]', zero_load);
  lam[,T] = append_row(zero_load, load[T]');

  for(t in 2:(T-1))
    lam[,t] = append_row(append_row(rep_vector(0, J*(t-1)), load[t]'),
                         rep_vector(0, J*(T-t)));

  /* Priors for residual variances */
  resvar ~ cauchy(0,5);
  
  /* Transform 'raw' factor scores with multivariate
       non-centered parameterisation */
  for(n in 1:N)
    eta_t[n] = (e_Lcorr*eta_r[n]')';

  /* Create the E[Y] for each individual */
  lam_eta = eta_t*lam';
  inter = to_row_vector(intercepts);
  for(n in 1:N)
    eta_vec[n] = lam_eta[n] + inter;

  /* Link to observed data, with a multivariate normal distribution
       for each pair of correlated observations */
  y ~ multi_normal_cholesky(eta_vec, diag_pre_multiply(resvar,y_corr));
}

generated quantities {
  /* Transform factor correlations from Cholesky to Correlation matrix */
  corr_matrix[T] corr = multiply_lower_tri_self_transpose(e_Lcorr);
  vector[J_Tm1] ycorrs;

  /* Put variance parameters into variance scale (from SD) */
  vector[J_T] vars = square(resvar);
  vector[J] loadvars = square(load_sd);

  /* More memory efficient to store only auto-correlations in vector
      - since the non-autocorrelations are zero */
  for(c in 1:J_Tm1)
    ycorrs[c] = multiply_lower_tri_self_transpose(y_Lcorr[c])[2,1];
}

And some toy data here: data100.R (25.9 KB)

Hope that all helps!


Thanks @andrjohns, you certainly helped me out, but this presents me with another problem, which is a general confirmatory factor analysis question: how to parametrise second- and higher-order factors. I think this revision to the code I initially posted handles multiple groups and links each group to a time index (with the outcomes now stored as one n_subj by n_y matrix per observed time point):

//confirmatory factor analysis code
data{
    // n_t_obs: number of time points observed
    int<lower=0> n_t_obs ;
    // n_t_miss: number of time points missing
    int<lower=0> n_t_miss ;
    // T: total number of time points
    int<lower=1,upper=n_t_obs+n_t_miss> T ;
    // ii_t_obs: indices of observed time points (in increasing order)
    int<lower=1,upper=T> ii_t_obs[n_t_obs] ;
    // ii_t_miss: indices of missing time points
    int<lower=1,upper=T> ii_t_miss[n_t_miss] ;
    // n_subj: number of subjects
    int n_subj ;
    // n_y: number of outcomes
    int n_y ;
    // y: matrix of outcomes at each observed time point
    matrix[n_subj,n_y] y[n_t_obs] ;
    // n_fac_one: number of level-one latent factors
    int n_fac_one ;
    // y_fac_one: which level-one factor is associated with each outcome
    int<lower=1,upper=n_fac_one> y_fac_one[n_y] ;
    // n_fac_two: number of level-two latent factors
    int n_fac_two ;
    // y_fac_two: which level-two factor is associated with each level-one factor
    int<lower=1,upper=n_fac_two> y_fac_two[n_fac_one] ;
}
transformed data{
    // scaled_y: z-scaled outcomes at each observed time point
    matrix[n_subj,n_y] scaled_y[n_t_obs] ;
    for(j in 1:n_t_obs){
        for(i in 1:n_y){
            scaled_y[j][,i] = (y[j][,i]-mean(y[j][,i])) / sd(y[j][,i]) ;
        }
    }
}
parameters{
    // normal01/normal02: helper variables for more efficient non-centered
    //  multivariate sampling of subj_facs at each observed time point
    matrix[n_fac_one,n_subj] normal01[n_t_obs] ;
    matrix[n_fac_two,n_subj] normal02[n_t_obs] ;
    // fac_cor_helper: correlations (on Cholesky factor scale) amongst latent
    //  factors at each observed time point
    cholesky_factor_corr[n_fac_one] fac_cor_helper_one[n_t_obs] ;
    cholesky_factor_corr[n_fac_two] fac_cor_helper_two[n_t_obs] ;
    // scaled_y_means: means for each outcome at each observed time point
    matrix[n_y,n_t_obs] scaled_y_means ;
    // scaled_y_noise_one: measurement noise for each outcome
    matrix<lower=0>[n_y,n_t_obs] scaled_y_noise_one ;
    // scaled_y_noise_two: residual noise for each level-one factor
    matrix<lower=0>[n_fac_one,n_t_obs] scaled_y_noise_two ;
    // betas: how each factor loads on to each outcome (level one) or on to
    //  each level-one factor (level two). Must be positive for identifiability.
    matrix<lower=0>[n_y,n_t_obs] betas_one ;
    matrix<lower=0>[n_fac_one,n_t_obs] betas_two ;
}
transformed parameters{
    // subj_facs: matrix of values for each factor associated with each subject
    matrix[n_subj,n_fac_one] subj_facs_one[n_t_obs] ;
    matrix[n_subj,n_fac_two] subj_facs_two[n_t_obs] ;
    // the following calculation implies that rows of subj_facs are sampled from
    //  a multivariate normal distribution with mean of 0 and sd of 1 in all
    //  dimensions and a correlation matrix of fac_cor.
    for(j in 1:n_t_obs){
        subj_facs_one[j] = transpose(fac_cor_helper_one[j] * normal01[j]) ;
        subj_facs_two[j] = transpose(fac_cor_helper_two[j] * normal02[j]) ;
    }
}
model{
    for(j in 1:n_t_obs){
        // normal01/normal02 must have normal(0,1) priors for the "non-centered"
        //  parameterization of the multivariate distribution of latent factors
        to_vector(normal01[j]) ~ normal(0,1) ;
        to_vector(normal02[j]) ~ normal(0,1) ;
        // flat prior on correlations
        fac_cor_helper_one[j] ~ lkj_corr_cholesky(1) ;
        fac_cor_helper_two[j] ~ lkj_corr_cholesky(1) ;
        // normal prior on y means
        scaled_y_means[,j] ~ normal(0,1) ;
        // weibull prior on measurement noise
        scaled_y_noise_one[,j] ~ weibull(2,1) ;
        scaled_y_noise_two[,j] ~ weibull(2,1) ;
    }
    // normal(0,1) priors on betas at the first observed time point
    betas_one[,1] ~ normal(0,1) ;
    betas_two[,1] ~ normal(0,1) ;
    // random-walk prior on betas across observed time points; the sd scales
    //  with the square root of the gap between observations, so missing time
    //  points are marginalized out of a unit-variance-per-step random walk
    for(j in 2:n_t_obs){
        betas_one[,j] ~ normal(betas_one[,j-1], sqrt(ii_t_obs[j]-ii_t_obs[j-1])) ;
        betas_two[,j] ~ normal(betas_two[,j-1], sqrt(ii_t_obs[j]-ii_t_obs[j-1])) ;
    }
    for(j in 1:n_t_obs){
        // each outcome as normal, influenced by its associated latent factor
        for(i in 1:n_y){
            scaled_y[j][,i] ~ normal(
                scaled_y_means[i,j] + subj_facs_one[j][,y_fac_one[i]] * betas_one[i,j]
                , scaled_y_noise_one[i,j]
            ) ;
        }
        // each level-one factor as normal, influenced by its associated
        //  level-two latent factor
        for(i in 1:n_fac_one){
            subj_facs_one[j][,i] ~ normal(
                subj_facs_two[j][,y_fac_two[i]] * betas_two[i,j]
                , scaled_y_noise_two[i,j]
            ) ;
        }
    }
}
generated quantities{
    // fac_cor: correlations amongst latent factors at each observed time point
    corr_matrix[n_fac_one] fac_cor_one[n_t_obs] ;
    corr_matrix[n_fac_two] fac_cor_two[n_t_obs] ;
    // y_means: outcome means on the original scale of Y
    matrix[n_y,n_t_obs] y_means ;
    // y_noise: outcome noise on the original scale of Y
    matrix[n_y,n_t_obs] y_noise ;
    for(j in 1:n_t_obs){
        fac_cor_one[j] = multiply_lower_tri_self_transpose(fac_cor_helper_one[j]) ;
        fac_cor_two[j] = multiply_lower_tri_self_transpose(fac_cor_helper_two[j]) ;
        for(i in 1:n_y){
            y_means[i,j] = scaled_y_means[i,j]*sd(y[j][,i]) + mean(y[j][,i]) ;
            y_noise[i,j] = scaled_y_noise_one[i,j]*sd(y[j][,i]) ;
        }
    }
}

The confirmatory factor analysis (CFA) model with a single order is a natural extension of the exploratory factor analysis (EFA) model, defined by
x = \Lambda \xi + \epsilon,
where x, \Lambda, \xi and \epsilon are the same as in the EFA model. The common factors are allowed to correlate, so \xi is normally distributed as N[0,\Phi] with \Phi a positive-definite covariance matrix. Assuming \xi is independent of \epsilon, the covariance matrix of x is
\Sigma = \Lambda \Phi \Lambda^{T} + \Psi_{\epsilon}.
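(If it helps for checking, the implied covariance could be computed in generated quantities; a minimal sketch, where Lambda, Phi_L, and Psi_eps are hypothetical names for the n_y by n_fac loading matrix, the Cholesky factor of \Phi, and the diagonal of \Psi_{\epsilon}:)

generated quantities{
    // implied covariance: Sigma = Lambda * Phi * Lambda' + Psi_eps
    // quad_form(A,B) computes B' * A * B, so with B = Lambda' this
    //  yields Lambda * Phi * Lambda'
    matrix[n_y,n_y] Sigma
        = quad_form(multiply_lower_tri_self_transpose(Phi_L), Lambda')
          + diag_matrix(Psi_eps) ;
}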

So, as the first-order CFA model is x = \Lambda\xi + \epsilon, this part should be fine with indices over outcomes and observed time points:

// each outcome as normal, influenced by its associated latent factor
for(j in 1:n_t_obs){
    for(i in 1:n_y){
        scaled_y[j][,i] ~ normal(
            scaled_y_means[i,j] + subj_facs_one[j][,y_fac_one[i]] * betas_one[i,j]
            , scaled_y_noise_one[i,j]
        ) ;
    }
}

However, I am not sure whether I have correctly implemented the second order of the model, where the first-order factors themselves load on the second-order factors:

\xi = B\eta + \delta, so that x = \Lambda(B\eta + \delta) + \epsilon = \Lambda B \eta + \Lambda\delta + \epsilon.
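My current attempt expresses this structural part as the level-two loop per observed time point; restated as a minimal sketch (delta_sd being a hypothetical residual-sd parameter in the role of scaled_y_noise_two):

for(j in 1:n_t_obs){
    // xi = B*eta + delta: regress each level-one factor on its
    //  associated level-two factor at observed time point j
    for(i in 1:n_fac_one){
        subj_facs_one[j][,i] ~ normal(
            subj_facs_two[j][,y_fac_two[i]] * betas_two[i,j]
            , delta_sd[i,j]
        ) ;
    }
}

Is this the right way to parametrise the second order, or is there a better construction?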