# Hierarchical Models for Longitudinal Bayesian SEM / Multidimensional IRT with Latent Regression

Looking for some modeling advice in the context of longitudinal Bayesian structural equation modeling with correlated latent outcomes when T = 2 (pre/post design). Most related posts seem to address cross-sectional measurement models, single-factor longitudinal measurement models, or large-T longitudinal SEMs, so this is a bit different.

# Background

My data come from the responses of $i \in \{1,\dots,1000\}$ respondents to the same 8 items $\{x1,\dots,x8\}$, recorded at baseline ($t = 0$) and again ($t = 1$) after respondents were randomly assigned to either a low ($z = 0$) or high ($z = 1$) intensity treatment. The items measure two correlated latent factors, $F = \{F1, F2\}$, with four items per factor. For convenience, anything with a _1 subscript (e.g. x1_1, F2_1, F_1) refers to post-treatment values, while the _0 subscript refers to pre-treatment values.

I am trying to estimate the effect of receiving the high vs. low intensity treatment on the post-treatment latent variables: $E[F_1 \mid z = 1] - E[F_1 \mid z = 0]$. Since I have pre-treatment responses to the same questions, I am also conditioning on the lagged scores of both latent factors (F_0), which should improve precision. I prefer this to modeling change scores for efficiency reasons. McArdle (2009) calls this the (two-occasion) “multiple common factors cross lagged regression model.”

The associational structure I’m proposing is shown in the path diagram below. Note that (a) residual variances for each observed measure x are uncorrelated across time, and (b) there are no equality constraints on parameters, because the pre-treatment parameters are treated as nuisance parameters (I do not care whether, e.g., F1_0 and F1_1 are “actually” measuring the same thing, provided that F_0 predicts F_1).
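One way to write this structure down, as a sketch only: the symbols below ($\nu$, $\lambda$, $\theta$, $\Phi$, $\alpha$, $B$, $\gamma$, $\Psi$, and the item-to-factor map $k(j)$) are notation assumed here for illustration, not taken from the path diagram or from McArdle (2009). The sketch also assumes continuous items with a linear measurement model and omits the identification constraints (e.g. fixing one loading or the factor variance per factor).

$$
\begin{aligned}
x_{ijt} &= \nu_{jt} + \lambda_{jt}\, F_{k(j),t,i} + \varepsilon_{ijt}, & \varepsilon_{ijt} &\sim N(0, \theta_{jt}), \\
F_{0,i} &\sim N_2(0, \Phi), \\
F_{1,i} &= \alpha + B\, F_{0,i} + \gamma\, z_i + \zeta_i, & \zeta_i &\sim N_2(0, \Psi).
\end{aligned}
$$

Here $j \in \{1,\dots,8\}$ indexes items, $t \in \{0,1\}$ indexes occasions, and $k(j) \in \{1,2\}$ is the factor that item $j$ loads on. The $t$ index on $\nu_{jt}$, $\lambda_{jt}$, and $\theta_{jt}$ reflects the absence of equality constraints across time, and the $\varepsilon_{ijt}$ are independent across items and occasions. $B$ is a $2 \times 2$ matrix whose off-diagonal entries are the cross-lagged paths. Because $z$ is randomized and therefore independent of $F_0$, the vector $\gamma$ equals the target contrast $E[F_1 \mid z = 1] - E[F_1 \mid z = 0]$ under this linear specification.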

# Modeling Issues

1. What should be hierarchical? What are the relevant “groupings” of the responses x? The typical way to use multilevel models here would be an item-response framework in which I model the responses x themselves as a function of person- and item-specific variables. For example, the brms syntax might be `response ~ (1 | item) + (0 + latent_factor_identifier | person)`, which would return, for each person, a vector of length 4 in which each element is that person's value on one of the 4 latent variables in the model. However, this seems to ignore that these 4 values are not independent: half of the responses used to measure them are pre-treatment and half are post-treatment, half measure factor F1 and half measure F2, and roughly half of the respondents had $z = 1$ and half $z = 0$. This last point is particularly troublesome because everyone is treated to some degree, so I expect $F_0 \neq F_1$ (even if they were measuring the same thing), and I would need to change z from a binary (“high”, “low” treatment intensity) variable to a nominal (“high”, “low”, “pre-treatment”) variable in order to partition all x responses according to it. As it stands, it would be odd (or impossible) to model pre-treatment factor scores (“ability” in IRT-speak) as functions of treatment status. Accordingly, I am not sure what type of hierarchical structure one would specify in this context: person:latent_factor_identifier? person:z? person:time? person:time:z?

2. Are my groupings too small? Nearly every way of defining “groupings” yields a grouping variable with relatively few unique values. For example, z takes on two values, latent_factor_identifier would have 4 unique values, time has 2 values (pre/post), and an identifier for the type of latent factor would have 2 values (i.e. F1 vs. F2). I know one can identify a model with two values for a grouping variable, but isn’t this more likely to introduce computational issues while providing very little regularization or gain in precision?
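To make the candidate groupings in questions 1 and 2 concrete, here is a minimal sketch (Python, standard library only) of the long-format layout and the number of levels each grouping variable would have. The column names, the assignment of x1 to x4 to F1 and x5 to x8 to F2, and the two placeholder respondents are all assumptions made for illustration; the post only states that four items measure each factor.

```python
from itertools import product

# Assumed item-to-factor map: the post only says four items load on each factor.
ITEM_FACTOR = {f"x{j}": ("F1" if j <= 4 else "F2") for j in range(1, 9)}


def to_long(person, z, responses):
    """Expand one person's wide record into long-format rows.

    responses maps (item, time) -> observed value, with time 0 = pre, 1 = post.
    """
    rows = []
    for (item, time), value in sorted(responses.items()):
        factor = ITEM_FACTOR[item]
        rows.append({
            "person": person,
            "item": item,
            "time": time,
            "factor": factor,
            # One of F1_0, F2_0, F1_1, F2_1
            "latent_factor_identifier": f"{factor}_{time}",
            # Three-level recoding of z: pre-treatment rows get their own level
            "z_nominal": "pre-treatment" if time == 0 else ("high" if z == 1 else "low"),
            "response": value,
        })
    return rows


# Two hypothetical respondents (one per arm) with placeholder responses of 0.0.
items_by_time = list(product(ITEM_FACTOR, (0, 1)))
long_rows = []
for person, z in [(1, 0), (2, 1)]:
    long_rows += to_long(person, z, {key: 0.0 for key in items_by_time})

# Number of distinct levels per candidate grouping variable.
levels = {
    name: len({row[name] for row in long_rows})
    for name in ("person", "item", "time", "factor",
                 "latent_factor_identifier", "z_nominal")
}
print(levels)
```

With the real data, only `person` (1000 levels) and `item` (8 levels) would have more than a handful of levels; every other candidate grouping has between 2 and 4.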

Many thanks.

# Referenced

McArdle, John J. (2009). Latent Variable Modeling of Differences and Changes with Longitudinal Data. Annual Review of Psychology.

I don’t think I can solve all your questions, but here are some things that might be relevant:

I am not sure that brms can currently handle the directed arrows from the “_0” latent variables to the “_1” latent variables. There is a good deal of discussion about this here. So part of your problem with the syntax might be that the syntax doesn’t exist.

For your question 2, I’m not sure that the grouping variables with few levels that you mention matter. It seems like “person” and perhaps “item” would be the places where you want many units.

In general, I would start with a simpler model (say, a 4-factor model), get the simple model working well, then think about the extensions. You might find that an SEM approach is easier than a multilevel approach here, especially because the McArdle paper that you mention uses an SEM approach.