First of all thanks to the community for helping me out. So I am running up against an issue that I can’t quite figure out how to address. I am currently writing a program in R and in Stan that looks at mediation with latent variables. I am running in to an issue with keeping it flexible and general. I have figured out how to send a variable number of mediators to Stan, but I am getting stumped by sending a non constant number of indicators per mediator. For example I have the code running if I want to send 4 or 5 or 6 mediators with a constat number of indicators such as all of the mediators have 4 indicators or 10 mediators. But I am not sure on how to handle if the first 2 mediators have 5 indicators and the rest of them have 3 indicators. I have one work around, but I am very dubious of the execution. Check out a snippit of my code:
Rcode
# Looping Indicators
ni = 5 # Sample size indicator
ji = 2 # Mediator count indicator
fli = 1 # Factor Loading indicator
ci = 1 # Class Indicator
# Simulation Conditions
set.seed(1234) # Set seed for reproducibility
N = c(50,100,250,500,1000) # Sample Sizes Investigated
J = c(5,7) # Number of mediators investigated
K = c(5,5,5,5,5,3,3) # Number of indicators per mediator
Km = max(K) # Maximum K value
lambda_fl = c(0.8,0.5) # Factor Loading Strengths Investigated
L = lapply(K, function(k) rep(lambda_fl[fli], k))
L1 = rep(c(lambda_fl[fli]), times = J[ji]) # First Factor loadings
beta_a = c(.59,.36,.11,0,0,0.36,0) # Mediator regression strengths from X to Mi.
beta_b = c(.59,.36,.11,0,0,0,0.36) # Mediator regression strengths from Mi to Y.
beta_c = 0 # Direct X to Y regression strength
......
stan_data <- list(N = N[ni], J = J[ji], K = K, Km = Km, L1 = L1,
I = I_list,
X = med_data$X,
Y = med_data$Y)
Stan Code
data {
int<lower=1> N; // Number of observations
int<lower=1> J; // Number of mediators
array[J] int<lower=1> K; // Number of indicators per mediator
int<lower=1> Km; // Largest number of indicators on a single mediator
vector[J] L1; // First Factor Loadings L1; // Factor Loadings
array[J] matrix[N, Km] I; // Indicator data for mediators
vector[N] X; // Independent variable
vector[N] Y; // Dependent variable
}
parameters {
vector[J] beta_a; // Regressions of X on M[J]
vector[J] beta_b; // Regressions of M[J] on Y
real beta_c; // Regression of X on Y
matrix[J,Km] lambda_I; // Indicator loadings of M[J]
matrix<lower=1e-20>[J,Km] sigma_I; // Std dev of indicators
vector<lower=1e-20>[J] sigma_M; // Std dev of mediators
real<lower=1e-20> sigma_Y; // Std dev of Y
matrix[N,J] M; // Latent scores for Mediators
}
model {
// Priors
for(j in 1:J){
for (k in 1:K[j]) {
sigma_I[j, k] ~ cauchy(0, 2.5); // Indicator standard deviations
lambda_I[j, k] ~ normal(0, 10); // Indicator factor loadings
}
beta_a[j] ~ normal(0, 10); // Regressions from X to M[j]
beta_b[j] ~ normal(0, 10); // Regressions from M[j] to Y
sigma_M[j] ~ cauchy(0, 2.5); // Standard deviations for M[j]
}
beta_c ~ normal(0, 10); // Regression from X to Y
sigma_Y ~ cauchy(0, 2.5); // Standard deviation for Y
The problem comes with the declaration of the data I, and parameters lambda_I and sigma_I. So I currently have this being declared as matrices where the dimensions are set to the largest number of indicators per mediator. The problem that I have with this is then there are like ghost columns for the smaller number of indicators and I think that that will have issues with respect to Stan utilizing priors on those values and affect the calculations. I feel like I have all of the info that I need to be able to have an array of matrixes of variable size for my I data, but I cannot figure out how to do it. I have the data simulated correctly in R, and my code runs smooth if the number of indicators (variable K) is constant per mediator. Let me know if you would like to see some more of the code.