Variable number of latent indicators and number of mediators

Ethan_Harris · May 23, 2024, 1:57pm

First of all thanks to the community for helping me out. So I am running up against an issue that I can’t quite figure out how to address. I am currently writing a program in R and in Stan that looks at mediation with latent variables. I am running in to an issue with keeping it flexible and general. I have figured out how to send a variable number of mediators to Stan, but I am getting stumped by sending a non constant number of indicators per mediator. For example I have the code running if I want to send 4 or 5 or 6 mediators with a constat number of indicators such as all of the mediators have 4 indicators or 10 mediators. But I am not sure on how to handle if the first 2 mediators have 5 indicators and the rest of them have 3 indicators. I have one work around, but I am very dubious of the execution. Check out a snippit of my code:
Rcode

# Looping Indicators
          ni = 5                                            # Sample size indicator
          ji = 2                                            # Mediator count indicator
         fli = 1                                            # Factor Loading indicator
          ci = 1                                            # Class Indicator
           
  # Simulation Conditions
           set.seed(1234)                                   # Set seed for reproducibility
           N = c(50,100,250,500,1000)                       # Sample Sizes Investigated
           J = c(5,7)                                       # Number of mediators investigated
           K = c(5,5,5,5,5,3,3)                             # Number of indicators per mediator
          Km = max(K)                                       # Maximum K value
           
   lambda_fl = c(0.8,0.5)                                   # Factor Loading Strengths Investigated
           L = lapply(K, function(k) rep(lambda_fl[fli], k))
          L1 = rep(c(lambda_fl[fli]), times = J[ji])                        # First Factor loadings
      beta_a = c(.59,.36,.11,0,0,0.36,0)                    # Mediator regression strengths from X to Mi.
      beta_b = c(.59,.36,.11,0,0,0,0.36)                    # Mediator regression strengths from Mi to Y.
      beta_c = 0                                            # Direct X to Y regression strength

......

  stan_data <- list(N = N[ni], J = J[ji], K = K, Km = Km, L1 = L1,
                    I = I_list,
                    X = med_data$X, 
                    Y = med_data$Y)

Stan Code

data {
  int<lower=1>                    N;            // Number of observations
  int<lower=1>                    J;            // Number of mediators
  array[J] int<lower=1>           K;            // Number of indicators per mediator
  int<lower=1>                   Km;            // Largest number of indicators on a single mediator
  vector[J]                      L1;            // First Factor Loadings                                                                                     L1;            // Factor Loadings
  array[J] matrix[N, Km]          I;            // Indicator data for mediators
  vector[N]                       X;            // Independent variable
  vector[N]                       Y;            // Dependent variable
}

parameters {
  vector[J]                  beta_a;            // Regressions of X on M[J]
  vector[J]                  beta_b;            // Regressions of M[J] on Y
  real                       beta_c;            // Regression of X on Y
  
  matrix[J,Km]             lambda_I;            // Indicator loadings of M[J]
  
  matrix<lower=1e-20>[J,Km] sigma_I;            // Std dev of indicators
  vector<lower=1e-20>[J]    sigma_M;            // Std dev of mediators
  real<lower=1e-20>         sigma_Y;            // Std dev of Y
  
  matrix[N,J]                     M;            // Latent scores for Mediators
}

model {
  // Priors
  
  for(j in 1:J){
    for (k in 1:K[j]) {
       sigma_I[j, k] ~ cauchy(0, 2.5);       // Indicator standard deviations
      lambda_I[j, k] ~ normal(0, 10);       // Indicator factor loadings
    }
           beta_a[j] ~ normal(0, 10);        // Regressions from X to M[j]
           beta_b[j] ~ normal(0, 10);        // Regressions from M[j] to Y
          sigma_M[j] ~ cauchy(0, 2.5);       // Standard deviations for M[j]
  }
              beta_c ~ normal(0, 10);        // Regression from X to Y
             sigma_Y ~ cauchy(0, 2.5);       // Standard deviation for Y

The problem comes with the declaration of the data I, and parameters lambda_I and sigma_I. So I currently have this being declared as matrices where the dimensions are set to the largest number of indicators per mediator. The problem that I have with this is then there are like ghost columns for the smaller number of indicators and I think that that will have issues with respect to Stan utilizing priors on those values and affect the calculations. I feel like I have all of the info that I need to be able to have an array of matrixes of variable size for my I data, but I cannot figure out how to do it. I have the data simulated correctly in R, and my code runs smooth if the number of indicators (variable K) is constant per mediator. Let me know if you would like to see some more of the code.

Topic		Replies	Views
Latent Mediation Modeling rstan , fitting-issues	2	225	February 4, 2024
Sending 3d array from R to Stan Modeling	2	174	February 12, 2024
Rstan convergence issue - mediation model with latent variables Modeling rstan , fitting-issues , specification	0	438	April 10, 2022
No convergence - mediation with latent variables in rstan Modeling rstan , fitting-issues	2	433	April 11, 2022
Generalized linear latent variable model in Stan Modeling	3	3792	October 9, 2017

Variable number of latent indicators and number of mediators

Related topics