Hello,

I am relatively new to Stan, but I am simply loving the model-fitting experience. I call Stan from R using rstan, and I am interested in fitting a hierarchical Bayesian equivalent of an lmer (multilevel) model. Since I have multiple outcomes, I plan to fit a hierarchical model with a single latent variable as the outcome, which has four manifest variables corresponding to the four outcomes. I have followed Merkle and Wang (2016), *Bayesian latent variable models for the analysis of experimental psychology data*, to represent my model as follows:

For each outcome y_1, y_2, y_3, y_4:

y_j ~ normal(mu_j, sigmaY_j)

where the mean mu_j is modeled as

mu_j = lambda_j * theta

and theta is a latent trait, modeled hierarchically as

theta = intercept + betas' * covariates + latent_error

I have included a random-intercept representation based on the tutorial by Sorensen et al. (2016).
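
Putting the pieces together in equation form (notation mine, with $u_{g[i]}$ denoting the random intercept of observation $i$'s group):

$$y_{ij} \sim \mathrm{Normal}(\lambda_j \, \theta_i, \; \sigma_{y_j}), \qquad j = 1, \dots, 4$$

$$\theta_i = \alpha + u_{g[i]} + \beta^\top x_i + \epsilon_i, \qquad u_g \sim \mathrm{Normal}(0, \; \sigma_u)$$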

My Stan code is as follows:

```stan
// Hierarchical model with:
// - Multilevel structure with level-1 and level-2 covariates
// - Four outcomes used to model a single latent variable
// - First lambda (factor loading) fixed to 1, and the other loadings
//   restricted to positive values, based on domain knowledge, to ensure
//   identification and avoid rotational invariance
data {
  int<lower=1> N;                    // Number of data points
  int<lower=1> J;                    // Number of outcomes
  matrix[N, J] y;                    // Outcomes
  int<lower=1> ID;                   // Total number of groups in the level-2 structure
  int<lower=1, upper=ID> plevel[N];  // Group number of each observation
  int<lower=1> CovN;                 // Number of covariates (created with model.matrix)
  matrix[N, CovN] CovX;              // Covariate matrix
}
parameters {
  vector<lower=0>[J] sigma_error;    // SD of each outcome's distribution
  vector[ID] fixed_eff;              // Group-level (random) intercepts for the latent equation
  real<lower=0> sigma_fixed_eff;     // SD of the group-level intercepts
  vector<lower=0>[J] lambda;         // Factor loading of each outcome (restricted to positive)
  real latent_intercept;             // Intercept of the latent equation
  vector[CovN] beta_CovX;            // Regression coefficients of the covariates
}
model {
  real MuY;
  real latent;

  // Priors
  sigma_fixed_eff ~ cauchy(0, 1);    // Hyper-prior on the SD of the group intercepts
  fixed_eff ~ normal(0, sigma_fixed_eff);
  beta_CovX ~ normal(0, 1);          // Prior for the covariate coefficients
  sigma_error ~ cauchy(0, 1);        // Each outcome has its own residual SD

  lambda[1] ~ normal(1, 0.001);      // First loading is (softly) fixed to 1 to resolve rotational invariance
  lambda[2] ~ normal(1, 1);          // Outcomes are scaled and centered, and domain knowledge
  lambda[3] ~ normal(1, 1);          //   says they relate positively to the latent variable
  lambda[4] ~ normal(1, 1);

  // Likelihood
  for (i in 1:N) {
    latent = latent_intercept + fixed_eff[plevel[i]]
             + dot_product(beta_CovX, CovX[i]);
    for (j in 1:J) {
      MuY = lambda[j] * latent;                // Latent-to-outcome relationship
      y[i, j] ~ normal(MuY, sigma_error[j]);   // Each outcome is univariate normal
    }
  }
}
```

As shown in the code, I skipped the error term for the latent trait, because including it triggered the runtime exception: `Exception: normal_lpdf: Random variable is nan, but must not be nan!`
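
If I understand the exception correctly, it happened because `latent_error` was only a local variable in the model block and was never assigned before being sampled. A sketch of how I think the error term could be included properly, by declaring the per-observation latent errors in the parameters block with a non-centered parameterization (untested; the names `theta_err` and `sigma_theta` are placeholders of mine):

```stan
// Sketch (untested): only the blocks that change relative to the model
// above; the data block stays the same. theta_err and sigma_theta are
// placeholder names, not part of my original model.
parameters {
  vector<lower=0>[J] sigma_error;
  vector[ID] fixed_eff;
  real<lower=0> sigma_fixed_eff;
  vector<lower=0>[J] lambda;
  real latent_intercept;
  vector[CovN] beta_CovX;
  vector[N] theta_err;          // NEW: per-observation latent error (standardized)
  real<lower=0> sigma_theta;    // NEW: SD of the latent error
}
model {
  real latent;

  sigma_fixed_eff ~ cauchy(0, 1);
  fixed_eff ~ normal(0, sigma_fixed_eff);
  beta_CovX ~ normal(0, 1);
  sigma_error ~ cauchy(0, 1);
  sigma_theta ~ cauchy(0, 1);
  theta_err ~ normal(0, 1);     // Scaled by sigma_theta below (non-centered parameterization)

  lambda[1] ~ normal(1, 0.001);
  lambda[2] ~ normal(1, 1);
  lambda[3] ~ normal(1, 1);
  lambda[4] ~ normal(1, 1);

  for (i in 1:N) {
    latent = latent_intercept + fixed_eff[plevel[i]]
             + dot_product(beta_CovX, CovX[i])
             + sigma_theta * theta_err[i];    // Latent error enters here
    for (j in 1:J)
      y[i, j] ~ normal(lambda[j] * latent, sigma_error[j]);
  }
}
```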

For different numbers of iterations (500, 1000, …, 5000), I keep getting results where my MCMC chains do not converge and run parallel to one another for many parameters, as seen in the figure below:

I wonder where I am going wrong in the code above. If I convert my multivariate outcome into a univariate outcome and represent the same equations using the code below, I do get a model with good convergence, but I am not sure whether this heuristic approach is correct. The trick is to copy the input matrix four times and stack the outcomes serially into a single vector (so the number of observations is 4x, but there is only one outcome). One reference where this trick is used, for bivariate outcomes, is Morrison et al. (2016).

```stan
model {
  real latent;

  // Priors (identical to the model above)
  sigma_fixed_eff ~ cauchy(0, 1);
  fixed_eff ~ normal(0, sigma_fixed_eff);
  beta_CovX ~ normal(0, 1);
  sigma_error ~ cauchy(0, 1);

  lambda[1] ~ normal(1, 0.001);
  lambda[2] ~ normal(1, 1);
  lambda[3] ~ normal(1, 1);
  lambda[4] ~ normal(1, 1);

  // Likelihood: y is now a stacked vector of length N (4x the original rows),
  // with plevel and CovX repeated four times, so each quarter of the data
  // uses the corresponding outcome's loading and residual SD
  for (i in 1:(N / 4)) {
    latent = latent_intercept + fixed_eff[plevel[i]]
             + dot_product(beta_CovX, CovX[i]);
    y[i] ~ normal(lambda[1] * latent, sigma_error[1]);
  }
  for (i in (N / 4 + 1):(N / 2)) {
    latent = latent_intercept + fixed_eff[plevel[i]]
             + dot_product(beta_CovX, CovX[i]);
    y[i] ~ normal(lambda[2] * latent, sigma_error[2]);
  }
  for (i in (N / 2 + 1):(3 * N / 4)) {
    latent = latent_intercept + fixed_eff[plevel[i]]
             + dot_product(beta_CovX, CovX[i]);
    y[i] ~ normal(lambda[3] * latent, sigma_error[3]);
  }
  for (i in (3 * N / 4 + 1):N) {
    latent = latent_intercept + fixed_eff[plevel[i]]
             + dot_product(beta_CovX, CovX[i]);
    y[i] ~ normal(lambda[4] * latent, sigma_error[4]);
  }
}
```
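
For what it's worth, I think the same stacking trick could be written more compactly with an explicit outcome index instead of four loops (sketch, untested; `out_idx` is a new data variable of mine marking which outcome each stacked row belongs to):

```stan
// Sketch (untested): the stacked-data trick with an explicit outcome index.
// Requires a new data entry: int<lower=1, upper=J> out_idx[N];
model {
  real latent;

  sigma_fixed_eff ~ cauchy(0, 1);
  fixed_eff ~ normal(0, sigma_fixed_eff);
  beta_CovX ~ normal(0, 1);
  sigma_error ~ cauchy(0, 1);

  lambda[1] ~ normal(1, 0.001);
  lambda[2] ~ normal(1, 1);
  lambda[3] ~ normal(1, 1);
  lambda[4] ~ normal(1, 1);

  for (i in 1:N) {
    latent = latent_intercept + fixed_eff[plevel[i]]
             + dot_product(beta_CovX, CovX[i]);
    y[i] ~ normal(lambda[out_idx[i]] * latent, sigma_error[out_idx[i]]);
  }
}
```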

Looking forward to your advice on whether the latter approach has a major flaw in its logic.

Thanks,

Karthik S