Cross-Lagged Multilevel Panel Model

I’m referring to the paper “How to compare cross-lagged associations in a multilevel autoregressive model” by Schuurman et al. (http://dx.doi.org/10.1037/met0000062). The authors show how to use Bayesian multivariate-response multilevel models (in BUGS; code available here: http://supp.apa.org/psycarticles/supplemental/met0000062/met0000062_supp.html) to fit a cross-lagged panel model in a multilevel design.

My question is how this would translate into brms, and whether it is currently possible. I don’t have a comprehensive enough statistical background to understand the formulas in detail. What I find confusing is that the authors state on p. 208 that “The model as defined in Equations 1–3 forms level one of the multilevel model”, which suggests to me that level 1 includes all observations. However, a few sentences later they say that “…whereas the model defined at level one is identical to an n=1 autoregressive time series model”. So is level 1 defined as an n=1 model, or does it include all observations, or am I confusing something here?

Let’s say we have 3 time points, x and y were measured at each time point, and we have covariates that are constant across all time points. From what I understand, the model would be written in brms like this:

(Edited code example; this addresses Paul’s first comment below.)

f1a <- bf(x2 ~ x1 + y1 + covariate + (1 |id1|subject))
f1b <- bf(y2 ~ x1 + y1 + covariate + (1 |id1|subject))
f2a <- bf(x3 ~ x2 + y2 + covariate + (1 |id1|subject))
f2b <- bf(y3 ~ x2 + y2 + covariate + (1 |id1|subject))

# maybe also set_rescor(TRUE)?
model <- brm(f1a + f1b + f2a + f2b + set_rescor(FALSE), data = data)

Is this correct? Would this describe a cross-lagged panel model with random intercept structure?

From quickly looking at the paper, I have two suggestions.

  1. You should use x1 and y1 as predictors of x2 and y2 (not the other way round). Same for the 2nd time point.
  2. You should set rescor to TRUE.

True, this is what I always do in SEM/lavaan; no idea why I switched it in my notation above. Would you say that the above model represents a cross-lagged multilevel panel model? I’m not sure in what way the paper’s authors consider the n=1 autoregressive model… But maybe that’s already covered by using subject as a random intercept (and more complex nested structures would then be written as nested random intercepts…).

I do think it represents a cross-lagged panel model. As I understand it, the autoregressive part is just that y3 is regressed on y2, which is in turn regressed on y1 (and the same for x). But I haven’t read the paper thoroughly enough to be sure that’s exactly what they mean.

OK, thanks! I consider this question solved, but I will probably not tag the thread as “solved”, so others may join this discussion as well.

I’m just running one of these models myself, trying to replicate lavaan results in brms.

Typically in a CLPM you model the covariance among the outcomes and among the predictors. Setting set_rescor(TRUE) addresses the former, but do we need to do anything to capture the latter?

CLPMs assume equal spacing between waves, but that isn’t always the case, and I suspect it isn’t necessary in a multilevel setting. Is it possible to extend the syntax to account for unequal spacing between waves? Maybe by including time as a covariate?

If you have covariates, you could include:

bf(x1 ~ covariate + (1 |id1|subject))
bf(y1 ~ covariate + (1 |id1|subject))

Giving:

f1a <- bf(x1 ~ covariate + (1 |id1|subject))
f1b <- bf(y1 ~ covariate + (1 |id1|subject))
f2a <- bf(x2 ~ x1 + y1 + covariate + (1 |id1|subject))
f2b <- bf(y2 ~ x1 + y1 + covariate + (1 |id1|subject))
f3a <- bf(x3 ~ x2 + y2 + covariate + (1 |id1|subject))
f3b <- bf(y3 ~ x2 + y2 + covariate + (1 |id1|subject))

model <- brm(f1a + f1b + f2a + f2b + f3a + f3b + set_rescor(TRUE), data = data)

If I’m not mistaken, this should take care of the correlations for the initial predictors. The remaining predictors are outcomes too, so they should already be accounted for.


I guess we could indeed “weight” the autoregressive predictors according to their distance, or let the model work it out itself (using an interaction or something similar), i.e. how the autoregressive effects change with the distance between time points.
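As a minimal sketch of that idea in the wide-format model above: suppose each person has spacing variables gap12 and gap23 (hypothetical names, not from the thread, giving the time elapsed between waves 1–2 and 2–3). Letting them interact with the lagged predictors would allow the autoregressive and cross-lagged effects to vary with the spacing:

# gap12/gap23 are assumed per-person columns for the spacing between waves;
# (x1 + y1) * gap12 expands to the main effects plus their interactions with gap12
f2a <- bf(x2 ~ (x1 + y1) * gap12 + covariate + (1 |id1| subject))
f2b <- bf(y2 ~ (x1 + y1) * gap12 + covariate + (1 |id1| subject))
f3a <- bf(x3 ~ (x2 + y2) * gap23 + covariate + (1 |id1| subject))
f3b <- bf(y3 ~ (x2 + y2) * gap23 + covariate + (1 |id1| subject))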


So normally, I think, one would fit a model like this using the “long” data in which one row represents one person at one time (so there are subjects * times rows in total).

You could generate the lagged variables prior to sending to brms, like this

library(dplyr)

long_data <- long_data %>%
  group_by(id) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(
    y_lag = lag(y),  # y at the person's previous time point (NA at their first wave)
    x_lag = lag(x)
  ) %>%
  ungroup()

Assuming the long data had columns id, time, x, y, and covariate to begin with.

So in this case, we could adapt the formulas to this:

f1 <- bf(y ~ y_lag + x_lag + covariates + (y_lag | id))
f2 <- bf(x ~ x_lag + y_lag + covariates + (x_lag | id))

model <- brm(f1 + f2 + set_rescor(TRUE), data = long_data)

Is that right? Note that I add a random slope for the autoregressive parameter because that’s what that research group normally does, and it comes closest to having a time-series model at the subject level while still applying shrinkage to those subject-level estimates.

Yes, this looks reasonable to me. What to do with the covariate exactly depends on the research question I guess, but for a main effect of a covariate that does not vary across time within the same person, your approach looks good.

Hey, I was working on this a while ago; I actually just got my revise-and-resubmit last week. Just to add some of the challenges I ran into:

Typically with these models you want to model the residual correlations on the predictors and the residuals of their lagged effects.

You might try something as follows to get around this…

brm(
  bf(y_lag ~ 1) +
    bf(x_lag ~ 1) +
    bf(y ~ y_lag + x_lag + covariates + (y_lag | id)) +
    bf(x ~ x_lag + y_lag + covariates + (x_lag | id)) +
    set_rescor(TRUE),
  data = long_data
)

The problem here is that you then model residual correlations of y_lag with y, etc., which you might not want. To my knowledge there is no way to choose which residual correlations you want to model in brms.

I drew this in a state-of-the-art SEM plotter, MS Paint. The red arrow is the bit I think we would struggle with, but everything else is quite easily achievable.

[MS Paint path diagram of the model]

Hopefully this is of some interest to you; gotta start paying back the help I get on here!


@paul.buerkner, in line with your comments here, could we fit a model with correlated residuals without the problem @haututu mentions as follows?

brm(
  bf(y_lag ~ 1 + (1 |a| obs)) +
    bf(x_lag ~ 1 + (1 |a| obs)) +
    bf(y ~ y_lag + x_lag + covariates + (y_lag | id) + (1 |b| obs)) +
    bf(x ~ x_lag + y_lag + covariates + (x_lag | id) + (1 |b| obs)) +
    set_rescor(FALSE),
  data = long_data
)

Where obs is the observation/row number?

@jackbailey The problem with this approach when using normal response distributions is that we have two residual SDs battling for the same information: first the distributional parameter sigma, and second the SD of obs. This will likely yield a model that samples badly.

Adding to what Paul said: I was sceptical that you could get this to converge, and alas, I tried my best but couldn’t get it to sample decently.

Oh well. I guess we’ll just have to wait for brms 3.0 ;)

Is there any recommendation on this at the moment?

I would like to see if I can run a bivariate random-intercept CLPM in brms, with the additional twist of adding an extra random intercept to the y variable.

I’m not sure if this is still relevant but I just came across this question so I figured I’d try to answer.

Adding another random intercept shouldn’t be a problem, since brms allows you to add nested and crossed random effects: (1 | fac1) + (1 | fac2) is crossed and (1 | fac1/fac2) is nested.
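For instance, a minimal sketch in the long-format notation used earlier in this thread, assuming a second grouping factor called group2 (a hypothetical name) that enters only the y submodel:

# group2 is an assumed extra grouping factor, crossed with id
f_y <- bf(y ~ y_lag + x_lag + covariates + (y_lag | id) + (1 | group2))
f_x <- bf(x ~ x_lag + y_lag + covariates + (x_lag | id))

model <- brm(f_y + f_x + set_rescor(TRUE), data = long_data)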

But I think the model in your diagram is a little bit different from the rest of the models in this thread, and I’m not sure you can fit it in brms.

The random-intercept cross-lagged panel model in your diagram breaks up each observation of x and y into within-subject parts (w_{y,i,t} and w_{x,i,t}) and between-subject parts (\mu_{y,i} and \mu_{x,i}) and uses the within-subject parts to predict each other. In other words, it asks “does a participant’s deviation from their average value of x at time t-1 predict their deviation from their average value of y at time t?” This corresponds to the equations below, where i indexes participants, t indexes time, \mu denotes the random intercepts, and w denotes the within-subject parts (the Ws in your diagram):

\displaylines{y_{i,t} = \beta_{y,0} + \mu_{y,i} + w_{y,i,t}\\ w_{y,i,t} = \beta_{y,1}*w_{y,i,t-1} + \beta_{y,2}*w_{x,i,t-1} + \epsilon_{y,i,t}\\ x_{i,t} = \beta_{x,0} + \mu_{x,i} + w_{x,i,t}\\ w_{x,i,t} = \beta_{x,1}*w_{x,i,t-1} + \beta_{x,2}*w_{y,i,t-1} + \epsilon_{x,i,t}}

See also Equations 3a to 3d in “A critique of the cross-lagged panel model” (Hamaker et al., 2015).

I might be wrong about the syntax, but it looks to me like the brms models described above predict time-point-specific deviations of x and y from the raw scores of x and y at time t-1; that is, they break up the outcome of each sub-model into between- and within-subject parts, but not the predictors. In other words, I think those models ask “does a participant’s value of x at time t-1 predict their deviation from their average value of y at time t?” In practice, I don’t know if this will make much of a difference, but I’m fairly sure these are different models.
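One observed-score way to push the predictors toward that within-person interpretation is to person-mean centre the lagged variables before fitting. This is only an approximation to the latent decomposition in the diagram, and the new column names (x_lag_w, y_lag_w) are mine, not from the thread:

library(dplyr)

long_data <- long_data %>%
  group_by(id) %>%
  mutate(
    x_lag_w = x_lag - mean(x_lag, na.rm = TRUE),  # within-person deviation of lagged x
    y_lag_w = y_lag - mean(y_lag, na.rm = TRUE)   # within-person deviation of lagged y
  ) %>%
  ungroup()

# then, for example:
# bf(y ~ y_lag_w + x_lag_w + covariates + (y_lag_w | id))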

I don’t know of an easy way to fit a model like the one in that diagram in brms, but I think it should be pretty straightforward in blavaan.
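For what it’s worth, here is a rough sketch of what the random-intercept CLPM could look like in blavaan for three waves of wide-format data (x1–x3, y1–y3). Treat it as an outline under those naming assumptions rather than a complete specification (constraints on the wave-1 terms, the covariances with the random intercepts, and priors are omitted):

library(blavaan)

ri_clpm <- '
  # between-person parts (random intercepts)
  RIx =~ 1*x1 + 1*x2 + 1*x3
  RIy =~ 1*y1 + 1*y2 + 1*y3

  # within-person parts at each wave
  wx1 =~ 1*x1
  wx2 =~ 1*x2
  wx3 =~ 1*x3
  wy1 =~ 1*y1
  wy2 =~ 1*y2
  wy3 =~ 1*y3

  # autoregressive and cross-lagged paths among the within-person parts
  wx2 ~ wx1 + wy1
  wy2 ~ wy1 + wx1
  wx3 ~ wx2 + wy2
  wy3 ~ wy2 + wx2

  # wave-specific (residual) covariances between the within-person parts
  wx1 ~~ wy1
  wx2 ~~ wy2
  wx3 ~~ wy3

  # fix observed residual variances to zero so the within parts absorb them
  x1 ~~ 0*x1
  x2 ~~ 0*x2
  x3 ~~ 0*x3
  y1 ~~ 0*y1
  y2 ~~ 0*y2
  y3 ~~ 0*y3
'

# wide_data is a placeholder name for the wide-format data frame
fit <- bsem(ri_clpm, data = wide_data)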