Unstructured error covariance matrix in a multilevel growth model

  • Operating System: MacOS
  • brms Version: 2.15.0

At the time of this writing, Paul’s list on GitHub issue #403 (see here) indicates brms can fit models with unstructured errors. This is one variant of an array of error structures (e.g., compound symmetry) and Paul has directed people to learn about these capabilities by executing help("autocor-terms"). Although that does lead one to helpful documentation for structures such as compound symmetry (see cosy()) and AR1 (see ar()), it’s not clear, to me, how one would fit a multilevel growth model with an unstructured error matrix.

Use case

In Chapter 7, Singer and Willett (2003) apply a variety of error structures to a growth model. The structural part of their baseline model is:

\begin{align*} \text{opp}_{ij} & = \pi_{0i} + \pi_{1i} \text{time}_{ij} \\ \pi_{0i} & = \gamma_{00} + \gamma_{01} (\text{cog}_i - \overline{\text{cog}}) \\ \pi_{1i} & = \gamma_{10} + \gamma_{11} (\text{cog}_i - \overline{\text{cog}}), \end{align*}

where \text{opp}_{ij} is test score for the i^\text{th} person on the j^\text{th} occasion and \text{cog}_i is the sole time-invariant covariate, which is each person’s baseline level on some measure of cognitive skill. The notation (\text{cog}_i - \overline{\text{cog}}) is meant to indicate \text{cog}_i has been mean-centered. The data were collected over four waves and \text{time}_{ij} is coded in integers 0 through 3.

Baseline model: The conventional multilevel growth model.

Singer and Willett’s first step (pp. 244–246) was to fit a conventional multilevel growth model. Using default priors, we can fit this in brms like so:

# load
library(brms)

# download the data
opposites <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/opposites_pp.txt", header = TRUE, sep = ",") 

fit.standard <-
  brm(data = opposites, 
      family = gaussian,
      opp ~ 0 + Intercept + time + ccog + time:ccog + (1 + time | id))

If you’re curious, here’s the summary:

print(fit.standard)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: opp ~ 0 + Intercept + time + ccog + time:ccog + (1 + time | id) 
   Data: opposites (Number of observations: 140) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~id (Number of levels: 35) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          36.22      4.92    27.98    46.85 1.00     1255     1829
sd(time)               10.69      1.82     7.48    14.72 1.00     1159     1408
cor(Intercept,time)    -0.44      0.16    -0.71    -0.09 1.00     1520     1888

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept   164.51      6.50   151.65   177.32 1.01      890     1254
time         26.95      2.14    22.78    31.16 1.00     1803     2512
ccog         -0.10      0.53    -1.17     0.97 1.00     1105     1533
time:ccog     0.43      0.17     0.09     0.76 1.00     1563     1954

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    12.92      1.17    10.93    15.48 1.00     1414     2063

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The data, if you’re concerned, are downloaded directly from the UCLA institute for digital research and education (see here and perhaps more importantly here).

The model under question: The conditional growth model with an unstructured error covariance matrix.

I’m looking to fit the alternative to this baseline model which contains an unstructured error covariance matrix. Who knows how?

I was asked to further clarify my desired model. If we rewrite the model from above in the compact format AND include the stochastic elements, we have:

\begin{align*} \text{opp}_{ij} & = [\gamma_{00} + \gamma_{10} \text{time}_{ij} + \gamma_{01} (\text{cog}_i - \overline{\text{cog}}) + \gamma_{11} (\text{cog}_i - \overline{\text{cog}}) \times \text{time}_{ij}] \\ & \;\;\; + [\zeta_{0i} + \zeta_{1i} \text{time}_{ij} + \epsilon_{ij}], \end{align*}

where we’ve divided up the structural and stochastic components with brackets. Following Singer and Willett (p. 247), we might think of the terms of the stochastic portion as a composite residual, r_{ij} = [\epsilon_{ij} + \zeta_{0i} + \zeta_{1i} \text{time}_{ij}]. What they then showed in some detail, which I’m going to speed past (though see pages 247 through 250), you can express the entire \mathbf r matrix as

\begin{align*} \mathbf r & \sim \operatorname{N} \begin{pmatrix} \mathbf 0, \begin{bmatrix} \mathbf{\Sigma_r} & \mathbf 0 & \mathbf 0 & \dots & \mathbf 0 \\ \mathbf 0 & \mathbf{\Sigma_r} & \mathbf 0 & \dots & \mathbf 0 \\ \mathbf 0 & \mathbf 0 & \mathbf{\Sigma_r} & \dots & \mathbf 0 \\ \vdots & \vdots & \vdots & \ddots & \mathbf 0 \\ \mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf{\Sigma_r} \end{bmatrix} \end{pmatrix}, \end{align*}

where each \mathbf{\Sigma_r} element captures the variance/covariance matrix within each person. In this particular data set, there are four time points, which makes \mathbf{\Sigma_r} a 4 \times 4 symmetric matrix. The unconstrained \mathbf{\Sigma_r} matrix I’m looking for is composed of 10 parameters (4 variances and 6 covariances), which you might express as

\begin{align*} \mathbf{\Sigma_r} & = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}. \end{align*}
1 Like

Thanks for the clarification. The code below will give you the residual matrix you are interested in, but it will only have fixed intercepts and slopes for ccog at each time point, not group-level (e.g., random) intercepts and slopes. It also does not have the interaction of ccog with time, although it does have the separate slopes at each time point instead (sort of like simple slopes once you decompose the interaction). You need to put the data in wide format first.
My mathematical abilities are too limited to say whether one has to choose between the random effects and the unstructured residual matrix (e.g., they may be representing the same variability in different ways, but I’m not sure). If there is a way to combine them, I don’t know what it is.

data1 ← read.csv(“opposites.csv”)
data2 ← subset(data1, select=-c(wave))
library(tidyr)
data3 ← spread(data2, time, opp)
colnames(data3) ← c(“id”,“cog”,“ccog”, “t0”,“t1”,“t2”,“t3”)
library(brms)
b1 ← brm(cbind(t0, t1, t2, t3) ~ ccog , data = data3, cores=10)
summary(b1)

1 Like

Yep, I can confirm this method produces the desired variance/covariance method, but it loses the desired linear model. Here are the results:

# load 
library(tidyverse)

# download and wrangle the data
data1 <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/opposites_pp.txt", header = TRUE, sep = ",")
data2 <- subset(data1, select = -c(wave))
data3 <- spread(data2, time, opp)
colnames(data3) <- c("id", "cog", "ccog", "t0", "t1", "t2", "t3")

# fit the model
fit.unstructured <-
  brm(data = data3, 
      family = gaussian,
      bf(mvbind(t0, t1, t2, t3) ~ ccog) + 
        set_rescor(TRUE))
 Family: MV(gaussian, gaussian, gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: t0 ~ ccog 
         t1 ~ ccog 
         t2 ~ ccog 
         t3 ~ ccog 
   Data: data3 (Number of observations: 35) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
t0_Intercept   164.50      5.99   152.54   176.02 1.00     2626     2525
t1_Intercept   192.15      5.32   181.51   201.90 1.00     2315     2526
t2_Intercept   216.84      5.50   206.03   227.29 1.00     2380     2851
t3_Intercept   246.27      5.66   235.30   257.53 1.00     2787     2737
t0_ccog         -0.01      0.50    -0.99     1.02 1.00     2704     3082
t1_ccog          0.17      0.45    -0.73     1.07 1.00     2314     2921
t2_ccog          0.72      0.47    -0.22     1.63 1.00     2148     2562
t3_ccog          1.25      0.48     0.30     2.16 1.00     2675     2492

Family Specific Parameters: 
         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_t0    36.26      4.31    29.05    45.71 1.00     3077     2876
sigma_t1    32.46      3.64    26.04    40.25 1.00     2526     2603
sigma_t2    33.66      3.82    26.90    41.97 1.00     2596     2893
sigma_t3    34.47      4.09    27.53    43.35 1.00     3142     2892

Residual Correlations: 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(t0,t1)     0.74      0.08     0.56     0.86 1.00     2638     3053
rescor(t0,t2)     0.65      0.10     0.43     0.80 1.00     2606     2792
rescor(t1,t2)     0.79      0.07     0.63     0.89 1.00     2898     3058
rescor(t0,t3)     0.33      0.14     0.02     0.58 1.00     2493     2706
rescor(t1,t3)     0.63      0.10     0.40     0.79 1.00     3344     3004
rescor(t2,t3)     0.72      0.08     0.52     0.85 1.00     3499     2813

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Who knows how to fit this?

\begin{align*} \text{opp}_{ij} & \sim \operatorname{Normal}(\mu_{ij}, \mathbf r) \\ \mu_{ij} & = \gamma_{00} + \gamma_{10} \text{time}_{ij} + \gamma_{01} (\text{cog}_i - \overline{\text{cog}}) + \gamma_{11} (\text{cog}_i - \overline{\text{cog}}) \times \text{time}_{ij} \\ \mathbf r & = \begin{bmatrix} \mathbf{\Sigma_r} & \mathbf 0 & \mathbf 0 & \dots & \mathbf 0 \\ \mathbf 0 & \mathbf{\Sigma_r} & \mathbf 0 & \dots & \mathbf 0 \\ \mathbf 0 & \mathbf 0 & \mathbf{\Sigma_r} & \dots & \mathbf 0 \\ \vdots & \vdots & \vdots & \ddots & \mathbf 0 \\ \mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf{\Sigma_r} \end{bmatrix}, \text{where} \\ \mathbf{\Sigma_r} & = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}. \end{align*}

yes, I said it loses the linear model :) I will be interested to see if someone knows how (or can confirm it is not possible) to have both an unstructured random effects matrix (the G matrix in Singer and Willett) and an unstructured residual matrix (the R matrix) at the same time. If nothing else, it’s going to be a pretty unwieldy model needing a lot of data and likely prone to over-fitting.

1 Like

so, I verified it’s possible to have both unstructured G and R matrices in principle by fitting such a model with nlme. I guess now we’re waiting for someone to figure out how to specify the model in brms.

1 Like

For those interested, here’s how the IDRE folks fit the model with nlme: Applied Longitudinal Data Analysis, Chapter 7 | R Textbook Examples

1 Like

I had the same idea as you to specify unstructured covariances via multivariate models. but apparently it doesn’t do the job in call cases. so we need a new structure in brms that operates like ar(), cozy() and friends. please feel free to suggest name and arguments for this function in the github issue.

1 Like

Thanks for your thoughts, @paul. I’ll do that.

that would be fantastic! Thanks for all you do.

My pleasure. Let me tell you how happy it makes me every time I see people helping each other with brms related questions.

1 Like